You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Old revision: imported>Labslogbot (legoktm Synchronized wmf-config/InitialiseSettings-labs.php: labs only (duration: 00m 12s) (logmsgbot))

== 2015-07-18 ==
* 20:58 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: labs only (duration: 00m 12s)
* 20:44 YuviPanda: restarted etherpad
* 18:56 akosiaris: reinstall labsdb1004
* 16:36 paravoid: Ganglia is up :)
* 16:09 Krenair: Ganglia seems down
* 15:42 Krenair: Doing T44180
* 05:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 18 05:28:25 UTC 2015 (duration 28m 24s)
* 02:34 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-18 02:34:29+00:00
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 07m 19s)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 18 02:07:38 UTC 2015 (duration 7m 37s)
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-18 02:03:29+00:00
* 00:49 ejegg: restored recurring globalcollect batch size of 250
* 00:09 ejegg: updated civicrm from 78de1b9b74934984af3099afe9192fa53011bdaa to 292ad137f6b3ffc818a3bd617ca4f335931091f3

New revision: imported>Stashbot (inflatador: running puppet-merge for https://gerrit.wikimedia.org/r/755810)

== 2022-01-20 ==
* 22:40 inflatador: running puppet-merge for https://gerrit.wikimedia.org/r/755810
* 22:27 urandom: rolling restart of Cassandra, aqs-next -- [[phab:T298516|T298516]]
* 21:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1008.eqiad.wmnet with OS buster
* 20:58 jhathaway: rebooting mx1001 to test new kernel
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:37 urandom: upgrading Cassandra to 3.11.11, aqs1010 -- [[phab:T298516|T298516]]
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:36 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.18  refs [[phab:T293959|T293959]]
* 20:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:31 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/DiscussionTools/includes/HeadingItem.php: Backport: [[gerrit:755684{{!}}Prevent assertion failure caused by empty headings (T299583)]] (duration: 00m 50s)
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:38 bd808@deploy1002: Synchronized wmf-config/wikitech.php: wikitech: Remove password clear on block (duration: 00m 50s)
* 19:19 jhathaway: rebooting mx1001 to test new kernel
* 19:17 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main
* 19:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:14 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main
* 19:13 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main
* 19:11 cjming: end of UTC evening backport & config window
* 19:10 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main
* 19:10 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:08 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:07 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755745{{!}}Disable language alert for pilot wikis except thwiki, viwiki. (T295555)]] (duration: 00m 51s)
* 19:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:29 taavi@deploy1002: Synchronized php-1.38.0-wmf.18/skins/Vector/includes/Hooks.php: Backport: [[gerrit:755682{{!}}Do not try to make watchlist collapsible on wikis where watchlist is disabled (T299671)]] (duration: 00m 50s)
* 18:27 ppchelko@deploy1002: Synchronized w/tmp_settings_bench.php: Config: gerrit 755741 enhancements for the settings benchmark entrypoint (duration: 00m 51s)
* 18:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2022.codfw.wmnet
* 18:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2022.codfw.wmnet with OS buster
* 18:17 mutante: running puppet on cp403*
* 17:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2022.codfw.wmnet with OS buster
* 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2021.codfw.wmnet
* 17:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2021.codfw.wmnet with OS buster
* 17:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1008.eqiad.wmnet with OS buster
* 17:18 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: Backport: [[gerrit:755678{{!}}Revert "Make Block objects aware of which wiki they belong to"]] (duration: 00m 55s)
* 17:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
* 17:15 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1008.eqiad.wmnet with OS buster
* 17:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
* 17:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2021.codfw.wmnet with OS buster
* 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:04 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference
* 17:03 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2020.codfw.wmnet with OS buster
* 17:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:55 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2020.codfw.wmnet with OS buster
* 16:55 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2019.codfw.wmnet with OS buster
* 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:50 ppchelko@deploy1002: Synchronized w/tmp_settings_bench.php: Config: gerrit 755399 add temporary entrypoint for settings benchmark (duration: 00m 50s)
* 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS buster
* 16:48 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2019.codfw.wmnet with OS buster
* 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS buster
* 16:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2018.codfw.wmnet
* 16:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2018.codfw.wmnet with OS buster
* 15:57 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2018.codfw.wmnet with OS buster
* 15:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 15:46 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 15:43 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 15:31 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 15:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 15:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 15:20 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main
* 15:16 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
* 15:14 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main
* 15:13 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 15:12 moritzm: enabled hardware virtualisation in BIOS for ganeti1028 [[phab:T293909|T293909]]
* 15:11 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main
* 15:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
* 15:05 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 15:05 moritzm: enabled hardware virtualisation in BIOS for ganeti1027 [[phab:T293909|T293909]]
* 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
* 14:58 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 14:57 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 14:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
* 14:56 moritzm: enabled hardware virtualisation in BIOS for ganeti1026 [[phab:T293909|T293909]]
* 14:55 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 11s)
* 14:55 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
* 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 14:33 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 14:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 14:20 moritzm: enabled hardware virtualisation in BIOS for ganeti1023 [[phab:T283036|T283036]]
* 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
* 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
* 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
* 14:03 moritzm: enabled hardware virtualisation in BIOS for ganeti1024 [[phab:T283036|T283036]]
* 13:55 marostegui: Power off es1022 for onsite maintenance [[phab:T299123|T299123]]
* 13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
* 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1024.eqiad.wmnet with reason: Change hw virt setting in BIOS
* 13:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1024.eqiad.wmnet with reason: Change hw virt setting in BIOS
* 13:51 moritzm: enabled hardware virtualisation in BIOS for ganeti1025 [[phab:T293909|T293909]]
* 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
* 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1025.eqiad.wmnet with reason: Change KVM setting in BIOS
* 13:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1025.eqiad.wmnet with reason: Change KVM setting in BIOS
* 13:13 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/CentralNotice/includes/: Backport: [[gerrit:755670{{!}}Replace remaining usages of IDatabase::fetchObject()/::numRows() (T286694)]] (duration: 00m 50s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:03 Lucas_WMDE: UTC morning backport window done
* 13:02 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/includes/deferred/LinksUpdate/LinksUpdate.php: Backport: [[gerrit:755668{{!}}Fix deprecation warning from LinksUpdate::getImages() (T299472)]] (duration: 00m 50s)
* 13:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:01 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/maintenance/: Backport: [[gerrit:755667{{!}}Replace remaining usages of IDatabase::fetchObject() (T299471)]] (2/2) (duration: 00m 50s)
* 13:00 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: Backport: [[gerrit:755667{{!}}Replace remaining usages of IDatabase::fetchObject() (T299471)]] (1/2) (duration: 00m 56s)
* 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755322{{!}}Enable usage tracking for statements in Waray Wikipedia (T296383)]] (expecting some gradual increase of wbc_entity_usage rows on warwiki) (duration: 00m 51s)
* 12:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18943 and previous config saved to /var/cache/conftool/dbconfig/20220120-121520-marostegui.json
* 12:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync on production
* 12:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply on staging
* 12:10 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply on production
* 12:09 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: sync on production
* 12:08 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply on staging
* 12:08 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply on production
* 12:07 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: sync on staging
* 12:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
* 12:06 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
* 12:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging
* 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
* 12:05 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
* 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging
* 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
* 12:05 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
* 12:04 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging
* 12:04 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
* 12:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18942 and previous config saved to /var/cache/conftool/dbconfig/20220120-120015-marostegui.json
* 11:49 moritzm: add ganeti1024 to Ganeti eqiad cluster [[phab:T283036|T283036]]
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18941 and previous config saved to /var/cache/conftool/dbconfig/20220120-114510-marostegui.json
* 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
* 11:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18940 and previous config saved to /var/cache/conftool/dbconfig/20220120-113006-marostegui.json
* 11:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18939 and previous config saved to /var/cache/conftool/dbconfig/20220120-112854-marostegui.json
* 11:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18938 and previous config saved to /var/cache/conftool/dbconfig/20220120-112846-marostegui.json
* 11:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:24 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync on production
* 11:23 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply on staging
* 11:23 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply on production
* 11:22 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 11:22 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 11:21 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 03s)
* 11:21 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 11:19 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: sync on production
* 11:18 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 11:18 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 11:18 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply on staging
* 11:18 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply on production
* 11:16 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: sync on staging
* 11:13 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
* 11:13 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18937 and previous config saved to /var/cache/conftool/dbconfig/20220120-111341-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18936 and previous config saved to /var/cache/conftool/dbconfig/20220120-105837-marostegui.json
* 10:52 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 10:52 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1018.eqiad.wmnet with OS buster
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18935 and previous config saved to /var/cache/conftool/dbconfig/20220120-104332-marostegui.json
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18934 and previous config saved to /var/cache/conftool/dbconfig/20220120-104220-marostegui.json
* 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18933 and previous config saved to /var/cache/conftool/dbconfig/20220120-104206-marostegui.json
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18932 and previous config saved to /var/cache/conftool/dbconfig/20220120-102702-marostegui.json
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18931 and previous config saved to /var/cache/conftool/dbconfig/20220120-101157-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18930 and previous config saved to /var/cache/conftool/dbconfig/20220120-095652-marostegui.json
* 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster
* 09:49 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1018.eqiad.wmnet with OS buster
* 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18929 and previous config saved to /var/cache/conftool/dbconfig/20220120-092232-marostegui.json
* 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18928 and previous config saved to /var/cache/conftool/dbconfig/20220120-092225-marostegui.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18927 and previous config saved to /var/cache/conftool/dbconfig/20220120-091127-root.json
* 09:09 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 09:08 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 09:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18926 and previous config saved to /var/cache/conftool/dbconfig/20220120-090720-marostegui.json
* 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18925 and previous config saved to /var/cache/conftool/dbconfig/20220120-085623-root.json
* 08:55 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
* 08:52 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18924 and previous config saved to /var/cache/conftool/dbconfig/20220120-085215-marostegui.json
* 08:52 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:51 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:51 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 08:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 08:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18923 and previous config saved to /var/cache/conftool/dbconfig/20220120-084120-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18922 and previous config saved to /var/cache/conftool/dbconfig/20220120-083711-marostegui.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18921 and previous config saved to /var/cache/conftool/dbconfig/20220120-083558-marostegui.json
* 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18920 and previous config saved to /var/cache/conftool/dbconfig/20220120-083520-marostegui.json
* 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18919 and previous config saved to /var/cache/conftool/dbconfig/20220120-082616-root.json
* 08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18918 and previous config saved to /var/cache/conftool/dbconfig/20220120-082015-marostegui.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 for on-site maintenance [[phab:T299123|T299123]]', diff saved to https://phabricator.wikimedia.org/P18917 and previous config saved to /var/cache/conftool/dbconfig/20220120-081809-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18916 and previous config saved to /var/cache/conftool/dbconfig/20220120-081112-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18915 and previous config saved to /var/cache/conftool/dbconfig/20220120-080510-marostegui.json
* 07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1128.eqiad.wmnet with OS bullseye
* 07:57 marostegui: Stop mysql on db1117 to clone db1128 [[phab:T299344|T299344]]
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18913 and previous config saved to /var/cache/conftool/dbconfig/20220120-075609-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18912 and previous config saved to /var/cache/conftool/dbconfig/20220120-075005-marostegui.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18911 and previous config saved to /var/cache/conftool/dbconfig/20220120-074753-marostegui.json
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18910 and previous config saved to /var/cache/conftool/dbconfig/20220120-074746-marostegui.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18909 and previous config saved to /var/cache/conftool/dbconfig/20220120-074105-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18908 and previous config saved to /var/cache/conftool/dbconfig/20220120-073241-marostegui.json
* 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1128.eqiad.wmnet with OS bullseye
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18907 and previous config saved to /var/cache/conftool/dbconfig/20220120-072558-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18906 and previous config saved to /var/cache/conftool/dbconfig/20220120-071736-marostegui.json
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18905 and previous config saved to /var/cache/conftool/dbconfig/20220120-071054-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18904 and previous config saved to /var/cache/conftool/dbconfig/20220120-070231-marostegui.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18903 and previous config saved to /var/cache/conftool/dbconfig/20220120-070119-marostegui.json
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18902 and previous config saved to /var/cache/conftool/dbconfig/20220120-070052-marostegui.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18901 and previous config saved to /var/cache/conftool/dbconfig/20220120-065551-root.json
* 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1180.eqiad.wmnet with OS bullseye
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18900 and previous config saved to /var/cache/conftool/dbconfig/20220120-064547-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18899 and previous config saved to /var/cache/conftool/dbconfig/20220120-063042-marostegui.json
* 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1180.eqiad.wmnet with OS bullseye
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18898 and previous config saved to /var/cache/conftool/dbconfig/20220120-061538-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 [[phab:T299479|T299479]]', diff saved to https://phabricator.wikimedia.org/P18897 and previous config saved to /var/cache/conftool/dbconfig/20220120-061529-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18896 and previous config saved to /var/cache/conftool/dbconfig/20220120-061407-marostegui.json
* 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
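The db1180 entries above return the host to service in stages after its reimage: 1, 5, 10, 20, 25, 40, 50, 60 and 75%, roughly every 15 minutes. The Python sketch below shows one way to pull those steps out of raw SAL lines. It is a hedged illustration only. The regular expression and the `repool_steps` helper are hypothetical and assume nothing beyond the line format visible in this log. They are not part of dbctl or conftool.

```python
import re
from datetime import time

# Matches SAL lines such as:
# "* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: ..."
# Assumption: this pattern is inferred from the entries in this log, not from a dbctl API.
REPOOL_RE = re.compile(
    r"^\* (\d{2}):(\d{2}) .*?dbctl commit \(dc=all\): '(\S+) \(re\)pooling @ (\d+)%"
)


def repool_steps(lines, host):
    """Return chronologically sorted (time, percent) pairs for one host."""
    steps = []
    for line in lines:
        m = REPOOL_RE.match(line)
        if m and m.group(3) == host:
            hh, mm, _, pct = m.groups()
            steps.append((time(int(hh), int(mm)), int(pct)))
    # The SAL lists newest entries first. All entries here share one UTC day,
    # so sorting by time of day restores chronological order.
    steps.sort()
    return steps


# Entries taken from the 2022-01-20 section above (newest first, as in the SAL).
SAMPLE = [
    f"* {hhmm} marostegui@cumin1001: dbctl commit (dc=all): "
    f"'db1180 (re)pooling @ {pct}%: repooling after reimage'"
    for hhmm, pct in [
        ("08:56", 75), ("08:41", 60), ("08:26", 50), ("08:11", 40), ("07:56", 25),
        ("07:41", 20), ("07:26", 10), ("07:10", 5), ("06:55", 1),
    ]
]
# A plain "Repooling after maintenance" commit does not match the pattern and is ignored.
SAMPLE.append(
    "* 08:37 marostegui@cumin1001: dbctl commit (dc=all): "
    "'Repooling after maintenance db1170:3317'"
)

if __name__ == "__main__":
    for t, pct in repool_steps(SAMPLE, "db1180"):
        print(f"{t:%H:%M} {pct:>3}%")
```

The sketch sorts after collecting because the log is reverse-chronological. A multi-day version would need the section date as well as the time of day.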


== 2022-01-19 ==
* 23:36 mutante: deploy1002 - checked freshly generated cert in /etc/helmfile-defaults/private/main_services/miscweb/eqiad.yaml  with 'openssl x509 -noout -text -in .. {{!}} grep DNS'. now has static-bz on it. ([[phab:T281538|T281538]])
* 23:35 mutante: puppetmaster1001 - revoked puppet cert miscweb.discovery.wmnet; updated kube_services.crts.yaml to include static-bugzilla.wikimedia.org, removed miscweb.discovery.wmnet.crt and .csr.pem, used cergen to check and regenerate cert, committed in private repo, ran puppet on deploy1001 - checked cert in /etc/helmfile-defaults/private/main_services/miscweb/eqiad.yaml  with 'openssl x509
* 21:43 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 26s)
* 21:42 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 20:52 Krinkle: depool mw1340 (api_appserver) for performance and php-apcu testing
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:09 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.18  refs [[phab:T293959|T293959]] (duration: 00m 49s)
* 20:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.18  refs [[phab:T293959|T293959]]
* 20:04 jhathaway: rebooting mx1001 to debug conntrack
* 19:52 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/tests/phpunit/structure/SettingsTest.php: {{Gerrit|ed5e634772d2821c6f61903f7341eef4f2fc4337}}: First pass on creating config-schema.yaml (duration: 00m 49s)
* 19:49 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: {{Gerrit|ed5e634772d2821c6f61903f7341eef4f2fc4337}}: First pass on creating config-schema.yaml (duration: 01m 02s)
* 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1009.eqiad.wmnet
* 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1008.eqiad.wmnet
* 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1007.eqiad.wmnet
* 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2006.codfw.wmnet
* 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2005.codfw.wmnet
* 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2004.codfw.wmnet
* 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2016.codfw.wmnet
* 19:31 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2016.codfw.wmnet with OS buster
* 19:17 cjming@deploy1002: Synchronized wmf-config/config: Config: [[gerrit:755038{{!}}Update config for pilot wikis: (T298519)]] (duration: 00m 49s)
* 19:13 cjming@deploy1002: Synchronized wmf-config/config: message (duration: 00m 50s)
* 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:12 cjming@deploy1002: Synchronized wmf-config/config/foundationwiki.yaml: Config: [[gerrit:755038{{!}}Update config for pilot wikis: (T298519)]] (duration: 00m 49s)
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:11 cjming@deploy1002: Synchronized wmf-config/config/viwiki.yaml: Config: [[gerrit:755038{{!}}Update config for pilot wikis: (T298519)]] (duration: 00m 49s)
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:10 cjming@deploy1002: Synchronized wmf-config/config/ptwikinews.yaml: Config: [[gerrit:755038{{!}}Update config for pilot wikis: (T298519)]] (duration: 00m 50s)
* 19:09 cjming@deploy1002: Synchronized dblists/desktop-improvements.dblist: Config: [[gerrit:755038{{!}}Update config for pilot wikis: (T298519)]] (duration: 01m 09s)
* 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18893 and previous config saved to /var/cache/conftool/dbconfig/20220119-190137-ladsgroup.json
* 18:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
* 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18892 and previous config saved to /var/cache/conftool/dbconfig/20220119-184632-ladsgroup.json
* 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18891 and previous config saved to /var/cache/conftool/dbconfig/20220119-183128-ladsgroup.json
* 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18890 and previous config saved to /var/cache/conftool/dbconfig/20220119-181623-ladsgroup.json
* 18:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1110.eqiad.wmnet
* 18:10 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2016.codfw.wmnet with OS buster
* 18:09 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db1110.eqiad.wmnet
* 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18889 and previous config saved to /var/cache/conftool/dbconfig/20220119-180840-ladsgroup.json
* 18:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 18:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 18:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18888 and previous config saved to /var/cache/conftool/dbconfig/20220119-180154-ladsgroup.json
* 17:58 herron: beginning logstash apifeatureusage switchover [[phab:T297239|T297239]]
* 17:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:54 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
* 17:52 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2016.codfw.wmnet with OS buster
* 17:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:50 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:575390{{!}}[wikitech] Drop the cloudadmin user group, no longer used and empty (T237890)]] (duration: 00m 50s)
* 17:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:47 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754999{{!}}Disable UserMerge (T216089)]] (duration: 00m 54s)
* 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18887 and previous config saved to /var/cache/conftool/dbconfig/20220119-174650-ladsgroup.json
* 17:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:42 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754998{{!}}Drop CentralAuthUserMerge log channel (T216089)]] (duration: 01m 05s)
* 17:36 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
* 17:35 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2016.codfw.wmnet with OS buster
* 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18886 and previous config saved to /var/cache/conftool/dbconfig/20220119-173145-ladsgroup.json
* 17:26 _joe_: powercycling contint1001 via ipmi, [[phab:T299542|T299542]]
* 17:25 cmjohnson1: updating firmware, ganeti1018 [[phab:T299527|T299527]]
* 17:19 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
* 17:18 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2016.codfw.wmnet with OS buster
* 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18885 and previous config saved to /var/cache/conftool/dbconfig/20220119-171640-ladsgroup.json
* 16:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
* 16:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2015.codfw.wmnet
* 16:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2015.codfw.wmnet with OS buster
* 16:48 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:44 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:36 hashar: marking contint1001.wikimedia.org as offline in Jenkins since it is dramatically overloaded [[phab:T299542|T299542]]
* 16:33 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18883 and previous config saved to /var/cache/conftool/dbconfig/20220119-162717-marostegui.json
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18882 and previous config saved to /var/cache/conftool/dbconfig/20220119-161212-marostegui.json
* 16:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2015.codfw.wmnet with OS buster
* 16:00 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase201[134].codfw.wmnet
* 15:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2014.codfw.wmnet with OS buster
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18881 and previous config saved to /var/cache/conftool/dbconfig/20220119-155706-marostegui.json
* 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:48 moritzm: installing tiff security updates on stretch
* 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18879 and previous config saved to /var/cache/conftool/dbconfig/20220119-154201-marostegui.json
* 15:40 mmandere: cp5005,cp4025: upgrade varnish to 6.0.9 [[phab:T298758|T298758]]
* 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18878 and previous config saved to /var/cache/conftool/dbconfig/20220119-154046-marostegui.json
* 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 15:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18877 and previous config saved to /var/cache/conftool/dbconfig/20220119-154039-marostegui.json
* 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18876 and previous config saved to /var/cache/conftool/dbconfig/20220119-152534-marostegui.json
* 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
* 15:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
* 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2014.codfw.wmnet with OS buster
* 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18875 and previous config saved to /var/cache/conftool/dbconfig/20220119-151029-marostegui.json
* 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2013.codfw.wmnet with OS buster
* 15:07 jbond: updating lldp parent fact
* 15:01 moritzm: migrate primary/secondary instances off ganeti1022
* 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1018.eqiad.wmnet with OS buster
* 14:57 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18873 and previous config saved to /var/cache/conftool/dbconfig/20220119-145525-marostegui.json
* 14:55 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18872 and previous config saved to /var/cache/conftool/dbconfig/20220119-145410-marostegui.json
* 14:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18871 and previous config saved to /var/cache/conftool/dbconfig/20220119-145402-marostegui.json
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P18870 and previous config saved to /var/cache/conftool/dbconfig/20220119-143858-marostegui.json
* 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster
* 14:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:33 jayme: disabled insecure API on all k8s masters - [[phab:T290967|T290967]]
* 14:33 mmandere: esams: upgrade varnish to 6.0.9 [[phab:T298758|T298758]]
* 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 14:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2013.codfw.wmnet with OS buster
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P18869 and previous config saved to /var/cache/conftool/dbconfig/20220119-142353-marostegui.json
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18868 and previous config saved to /var/cache/conftool/dbconfig/20220119-140848-marostegui.json
* 14:04 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db1100.eqiad.wmnet
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18867 and previous config saved to /var/cache/conftool/dbconfig/20220119-140433-marostegui.json
* 14:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 14:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18866 and previous config saved to /var/cache/conftool/dbconfig/20220119-140419-marostegui.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P18865 and previous config saved to /var/cache/conftool/dbconfig/20220119-134915-marostegui.json
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:36 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db1100.eqiad.wmnet
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18864 and previous config saved to /var/cache/conftool/dbconfig/20220119-133514-ladsgroup.json
* 13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P18863 and previous config saved to /var/cache/conftool/dbconfig/20220119-133410-marostegui.json
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:26 hashar: Restarting Gerrit
* 13:24 hashar@deploy1002: Finished deploy [gerrit/gerrit@a340940]: Gerrit upgrade from 3.3.6 to 3.3.9 on gerrit1001 # [[phab:T299451|T299451]] (duration: 00m 08s)
* 13:24 hashar@deploy1002: Started deploy [gerrit/gerrit@a340940]: Gerrit upgrade from 3.3.6 to 3.3.9 on gerrit1001 # [[phab:T299451|T299451]]
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:22 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.16 (duration: 01m 32s)
* 13:20 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.12 (duration: 01m 43s)
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:19 hashar: Cleaning all branch with `scap clean --delete 1.38.0-wmf.12` apparently missed in previous train  # [[phab:T293958|T293958]] [[phab:T293959|T293959]]
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18862 and previous config saved to /var/cache/conftool/dbconfig/20220119-131905-marostegui.json
* 13:18 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.13 (duration: 03m 11s)
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18861 and previous config saved to /var/cache/conftool/dbconfig/20220119-131750-marostegui.json
* 13:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 13:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18860 and previous config saved to /var/cache/conftool/dbconfig/20220119-131743-marostegui.json
* 13:16 hashar: Cleaning all branch with `scap clean --delete 1.38.0-wmf.13` apparently missed in previous train  # [[phab:T293958|T293958]] [[phab:T293959|T293959]]
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:13 Lucas_WMDE: UTC morning backport+config window done
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:08 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport: [[gerrit:753487{{!}}Revert "Undo update to the way the search interface is set"]] (part 2) (duration: 29m 08s)
* 13:05 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1001:~$ sudo -u www-data rm /tmp/URL*.urlupload_ # save space
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P18859 and previous config saved to /var/cache/conftool/dbconfig/20220119-130238-marostegui.json
* 13:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1128 from dbctl [[phab:T299344|T299344]]', diff saved to https://phabricator.wikimedia.org/P18858 and previous config saved to /var/cache/conftool/dbconfig/20220119-125658-marostegui.json
* 12:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1155.eqiad.wmnet with OS bullseye
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P18857 and previous config saved to /var/cache/conftool/dbconfig/20220119-124733-marostegui.json
* 12:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:38 lucaswerkmeister-wmde@deploy1002: Started scap: Backport: [[gerrit:753487{{!}}Revert "Undo update to the way the search interface is set"]] (part 2)
* 12:38 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/MediaSearch/extension.json: Backport: [[gerrit:753487{{!}}Revert "Undo update to the way the search interface is set"]] (part 1) (duration: 01m 34s)
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18856 and previous config saved to /var/cache/conftool/dbconfig/20220119-123229-marostegui.json
* 12:31 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/Flow/modules/flow/ui/widgets/mw.flow.ui.TopicMenuSelectWidget.js: Backport: [[gerrit:754921{{!}}Fix TopicMenuSelectWidget after OOUI change (T299473)]] (duration: 01m 08s)
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18855 and previous config saved to /var/cache/conftool/dbconfig/20220119-123114-marostegui.json
* 12:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18854 and previous config saved to /var/cache/conftool/dbconfig/20220119-123106-marostegui.json
* 12:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase201[12].codfw.wmnet
* 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1155.eqiad.wmnet with OS bullseye
* 12:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2012.codfw.wmnet with OS buster
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P18853 and previous config saved to /var/cache/conftool/dbconfig/20220119-121602-marostegui.json
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P18852 and previous config saved to /var/cache/conftool/dbconfig/20220119-120057-marostegui.json
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18851 and previous config saved to /var/cache/conftool/dbconfig/20220119-114949-root.json
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18850 and previous config saved to /var/cache/conftool/dbconfig/20220119-114944-root.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18849 and previous config saved to /var/cache/conftool/dbconfig/20220119-114552-marostegui.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18848 and previous config saved to /var/cache/conftool/dbconfig/20220119-114237-marostegui.json
* 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18847 and previous config saved to /var/cache/conftool/dbconfig/20220119-114154-marostegui.json
* 11:38 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2012.codfw.wmnet with OS buster
* 11:35 moritzm: rebalance ganeti group D in codfw after adding ganeti2026 [[phab:T282603|T282603]]
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18846 and previous config saved to /var/cache/conftool/dbconfig/20220119-113445-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18845 and previous config saved to /var/cache/conftool/dbconfig/20220119-113440-root.json
* 11:32 oblivian@deploy1002: Finished deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001 (duration: 18m 27s)
* 11:28 godog: bounce superset on an-tool1005 - [[phab:T299383|T299383]]
* 11:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2011.codfw.wmnet with OS buster
* 11:28 godog: bounce superset on an-tool1010 - [[phab:T299383|T299383]]
* 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18844 and previous config saved to /var/cache/conftool/dbconfig/20220119-112649-marostegui.json
* 11:26 godog: bounce navtiming on webperf1001 - [[phab:T299383|T299383]]
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18843 and previous config saved to /var/cache/conftool/dbconfig/20220119-111942-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18842 and previous config saved to /var/cache/conftool/dbconfig/20220119-111937-root.json
* 11:15 moritzm: add ganeti2026 to Ganeti codfw cluster [[phab:T282603|T282603]]
* 11:14 oblivian@deploy1002: Started deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001
* 11:12 oblivian@deploy1002: Finished deploy [docker-pkg/deploy@536f77a]: redeploy of 3.0.2, in preparation for deployment on build2001 (duration: 01m 00s)
* 11:12 filippo@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:754879{{!}}Revert "ProductionServices: use graphite2003 for statsd" (T299383)]] (duration: 02m 09s)
* 11:11 oblivian@deploy1002: Started deploy [docker-pkg/deploy@536f77a]: redeploy of 3.0.2, in preparation for deployment on build2001
* 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18840 and previous config saved to /var/cache/conftool/dbconfig/20220119-111144-marostegui.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18839 and previous config saved to /var/cache/conftool/dbconfig/20220119-110438-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18838 and previous config saved to /var/cache/conftool/dbconfig/20220119-110433-root.json
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 10:58 godog: flip graphite back to eqiad - [[phab:T299383|T299383]]
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18837 and previous config saved to /var/cache/conftool/dbconfig/20220119-105640-marostegui.json
* 10:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18836 and previous config saved to /var/cache/conftool/dbconfig/20220119-105523-marostegui.json
* 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18835 and previous config saved to /var/cache/conftool/dbconfig/20220119-104934-root.json
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18834 and previous config saved to /var/cache/conftool/dbconfig/20220119-104929-root.json
* 10:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.3.0 - ayounsi@cumin1001
* 10:42 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.3.0 - ayounsi@cumin1001
* 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18833 and previous config saved to /var/cache/conftool/dbconfig/20220119-104109-marostegui.json
* 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2011.codfw.wmnet with OS buster
* 10:40 ayounsi@deploy1002: Finished deploy [homer/deploy@d1fbc5c]: Homer release v0.3.0 (duration: 01m 26s)
* 10:39 ayounsi@deploy1002: Started deploy [homer/deploy@d1fbc5c]: Homer release v0.3.0
* 10:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2010.codfw.wmnet
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18832 and previous config saved to /var/cache/conftool/dbconfig/20220119-103431-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18831 and previous config saved to /var/cache/conftool/dbconfig/20220119-103425-root.json
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18830 and previous config saved to /var/cache/conftool/dbconfig/20220119-102604-marostegui.json
* 10:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 10:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
* 10:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply on production
* 10:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18829 and previous config saved to /var/cache/conftool/dbconfig/20220119-101927-root.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18828 and previous config saved to /var/cache/conftool/dbconfig/20220119-101922-root.json
* 10:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
* 10:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply on production
* 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
* 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply on production
* 10:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply on staging
* 10:15 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync on production
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18827 and previous config saved to /var/cache/conftool/dbconfig/20220119-101100-marostegui.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18826 and previous config saved to /var/cache/conftool/dbconfig/20220119-100424-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18825 and previous config saved to /var/cache/conftool/dbconfig/20220119-100418-root.json
* 10:03 hashar: Upgraded gerrit-replica.wikimedia.org from 3.3.6 to 3.3.9
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18824 and previous config saved to /var/cache/conftool/dbconfig/20220119-095555-marostegui.json
* 09:54 hashar@deploy1002: Finished deploy [gerrit/gerrit@a340940]: Gerrit to 3.3.9 on gerrit 2001 # [[phab:T299451|T299451]] (duration: 00m 09s)
* 09:54 hashar@deploy1002: Started deploy [gerrit/gerrit@a340940]: Gerrit to 3.3.9 on gerrit 2001 # [[phab:T299451|T299451]]
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18823 and previous config saved to /var/cache/conftool/dbconfig/20220119-095428-marostegui.json
* 09:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 09:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18822 and previous config saved to /var/cache/conftool/dbconfig/20220119-095421-marostegui.json
* 09:49 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 09:49 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18821 and previous config saved to /var/cache/conftool/dbconfig/20220119-094920-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18820 and previous config saved to /var/cache/conftool/dbconfig/20220119-094914-root.json
* 09:48 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply on staging
* 09:48 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply on production
* 09:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync on production
* 09:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply on staging
* 09:47 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply on production
* 09:44 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync on staging
* 09:43 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply on production
* 09:43 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply on staging
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18819 and previous config saved to /var/cache/conftool/dbconfig/20220119-093915-marostegui.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18818 and previous config saved to /var/cache/conftool/dbconfig/20220119-093416-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18817 and previous config saved to /var/cache/conftool/dbconfig/20220119-093411-root.json
* 09:32 XioNoX: enable v6 BGP to HE in eqiad for testing
* 09:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1098.eqiad.wmnet with OS bullseye
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18816 and previous config saved to /var/cache/conftool/dbconfig/20220119-092410-marostegui.json
* 09:20 moritzm: migrate primary/secondary instances off ganeti1018
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18813 and previous config saved to /var/cache/conftool/dbconfig/20220119-090905-marostegui.json
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18812 and previous config saved to /var/cache/conftool/dbconfig/20220119-090839-marostegui.json
* 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18811 and previous config saved to /var/cache/conftool/dbconfig/20220119-090832-marostegui.json
* 09:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1098.eqiad.wmnet with OS bullseye
* 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2129.codfw.wmnet with OS bullseye
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 (s6,s7) for Bullseye reimage [[phab:T299479|T299479]]', diff saved to https://phabricator.wikimedia.org/P18809 and previous config saved to /var/cache/conftool/dbconfig/20220119-085927-marostegui.json
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18808 and previous config saved to /var/cache/conftool/dbconfig/20220119-085327-marostegui.json
* 08:50 XioNoX: disable v6 BGP to HE in eqiad for testing
* 08:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync on production
* 08:45 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply on staging
* 08:45 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply on production
* 08:42 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync on production
* 08:40 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply on staging
* 08:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply on production
* 08:40 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync on staging
* 08:39 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 08:39 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18807 and previous config saved to /var/cache/conftool/dbconfig/20220119-083822-marostegui.json
* 08:35 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 08:35 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 08:34 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 08:34 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 08:34 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 08:34 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 08:33 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 08:33 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 08:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2076.codfw.wmnet with OS bullseye
* 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2129.codfw.wmnet with OS bullseye
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2114.codfw.wmnet with OS bullseye
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18806 and previous config saved to /var/cache/conftool/dbconfig/20220119-082318-marostegui.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18805 and previous config saved to /var/cache/conftool/dbconfig/20220119-081650-marostegui.json
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18804 and previous config saved to /var/cache/conftool/dbconfig/20220119-081643-marostegui.json
* 08:11 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2010.codfw.wmnet with OS buster
* 08:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:10 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18803 and previous config saved to /var/cache/conftool/dbconfig/20220119-080138-marostegui.json
* 07:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 07:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2114.codfw.wmnet with OS bullseye
* 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2076.codfw.wmnet with OS bullseye
* 07:52 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 07:52 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2124.codfw.wmnet with OS bullseye
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2117.codfw.wmnet with OS bullseye
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18802 and previous config saved to /var/cache/conftool/dbconfig/20220119-074633-marostegui.json
* 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2089.codfw.wmnet with OS bullseye
* 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2095.codfw.wmnet with OS bullseye
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18801 and previous config saved to /var/cache/conftool/dbconfig/20220119-073129-marostegui.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18800 and previous config saved to /var/cache/conftool/dbconfig/20220119-072301-marostegui.json
* 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18799 and previous config saved to /var/cache/conftool/dbconfig/20220119-072253-marostegui.json
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2124.codfw.wmnet with OS bullseye
* 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2117.codfw.wmnet with OS bullseye
* 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2089.codfw.wmnet with OS bullseye
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18797 and previous config saved to /var/cache/conftool/dbconfig/20220119-070749-marostegui.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s3 weights [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18796 and previous config saved to /var/cache/conftool/dbconfig/20220119-065318-marostegui.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18795 and previous config saved to /var/cache/conftool/dbconfig/20220119-065244-marostegui.json
* 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2095.codfw.wmnet with OS bullseye
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18794 and previous config saved to /var/cache/conftool/dbconfig/20220119-063739-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18793 and previous config saved to /var/cache/conftool/dbconfig/20220119-063613-marostegui.json
* 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18792 and previous config saved to /var/cache/conftool/dbconfig/20220119-063605-marostegui.json
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18791 and previous config saved to /var/cache/conftool/dbconfig/20220119-062100-marostegui.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18790 and previous config saved to /var/cache/conftool/dbconfig/20220119-060555-marostegui.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18789 and previous config saved to /var/cache/conftool/dbconfig/20220119-055051-marostegui.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18788 and previous config saved to /var/cache/conftool/dbconfig/20220119-054924-marostegui.json
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 05:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 05:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 01:07 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753192{{!}}DiscussionTools: Use bullet indentation on ruwiki (T259864)]] (duration: 00m 53s)
* 01:05 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753543{{!}}[wmf-config] Deploy the cawiki test safety survey to production. (T296657)]] (duration: 00m 53s)
* 01:02 catrope@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/DiscussionTools: Backport: [[gerrit:754915{{!}}Enable wikis to customize the syntax used for replies (T259864)]] and [[gerrit:754916{{!}}Ensure the marker appears in a reasonable place when replying with a bullet (T259864)]] (duration: 00m 53s)
* 01:00 catrope@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/AbuseFilter/: Backport: [[gerrit:754917{{!}}Don't use array keys for OOUI (T299463)]] and [[gerrit:754918{{!}}Don't use array keys for OOUI in AbuseFilterViewDiff (T299463)]] (duration: 00m 54s)
* 00:49 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754054{{!}}Change TheWikipediaLibrary editcount (T288070)]] (duration: 00m 53s)
* 00:38 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:752308{{!}}Use namespaced CentralAuthUser (T298840)]] (duration: 00m 54s)
* 00:35 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754914{{!}}Revert "commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist"]] (duration: 00m 54s)
* 00:33 WFan: re-enable the disabled jobs for civicrm upgrade
* 00:30 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755026{{!}}azwiki: Change alias Q to QA for the draft namespace (T299332)]] (duration: 00m 53s)
* 00:08 WFan: Upgrade CiviCrm from gerrit #755044
* 00:07 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755018{{!}}fawiki: Exempt userspaces from being indexed by search engines (T299363)]] (duration: 00m 54s)
* 00:00 WFan: disabling jobs for civiCrm upgrade


== 2015-07-16 ==
* 21:27 ori: bounced nutcracker on mw1139 as well. hashar noticed flood of errors from these hosts on https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki-errors . lack of monitoring / alerts is troubling.
* 21:26 ori: bounced nutcracker on mw1128 and mw1134
* 20:50 mutante: iegreview tool - short maintenance downtime
* 19:39 YuviPanda: imported aspell-id from ubuntu to jessie-wikimedia - needed by ores, simple package that I am not sure why it is not in jessie
* 19:20 logmsgbot: twentyafterfour Synchronized php-1.26wmf14/includes/db/LoadMonitor.php: Deploying Hotfix for T105373 (duration: 00m 13s)
* 18:40 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf14
* 18:26 ejegg: changed batch size from 250 to 1 in RGC jenkins job
* 18:22 ejegg: updated civicrm from 24e0fc854433ea4982e94a0fd2f8bdad8f8dcad7 to fa724dd2e2e69545d81015c943cb7f52cf6de8e1
* 16:56 Jeff_Green: authdns update to rename lutetium.wm.o
* 16:08 hashar_: kept nodepool stopped on labnodepool1001.eqiad.wmnet because it spams the cron log
* 15:57 logmsgbot: demon Synchronized multiversion/MWMultiVersion.php: prod no-op, beta change (duration: 00m 13s)
* 15:54 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/224975/ (duration: 00m 12s)
* 15:27 logmsgbot: thcipriani Synchronized php-1.26wmf14/extensions/Math/MathMathML.php: SWAT: Fix: Undefined variable passed hook [[gerrit:225058]] (duration: 00m 12s)
* 15:03 ejegg: updated payments from 4ca95d55a9745c05ccfbb16ee6f23a6f75328824 to ebb1a9e52172a4793cf5feb33220b4d7edfcad70
* 12:21 dcausse: es1.6 upgrade: all done
* 11:32 dcausse: restarted gmond on elastic1024
* 11:06 mobrovac: citoid deploying ff90869
* 10:56 dcausse: es1.6 upgrade: upgrade elastic1031
* 10:25 mobrovac: citoid rolled back to ffbaf6d
* 10:10 mobrovac: citoid deploying 5aeb0fc
* 10:05 dcausse: es1.6 upgrade: upgrade elastic1030
* 09:38 dcausse: es1.6 upgrade: upgrade elastic1029
* 08:42 dcausse: es1.6 upgrade: upgrade elastic1028
* 07:31 dcausse: es1.6 upgrade: upgrade elastic1027
* 07:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 16 07:22:49 UTC 2015 (duration 22m 48s)
* 05:53 dcausse: es1.6 upgrade: upgrade elastic1026
* 05:31 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 05:24 logmsgbot: krenair Synchronized php-1.26wmf14/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/225008/ (duration: 00m 13s)
* 04:38 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/225006/ (duration: 00m 13s)
* 03:54 manybubbles: es1.6 upgrade: upgrade elastic1025
* 03:19 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-16 03:19:37+00:00
* 03:13 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 10m 23s)
* 02:46 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-16 02:46:03+00:00
* 02:43 manybubbles: es1.6 upgrade: upgrade elastic1024
* 02:39 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 10m 50s)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 16 02:07:55 UTC 2015 (duration 7m 54s)
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-16 02:03:31+00:00
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-16 02:03:30+00:00
* 01:41 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/214981/ (duration: 00m 12s)
* 01:22 manybubbles: es1.6 upgrade: upgrade elastic1023


== 2022-01-18 ==
* 23:11 jhathaway: rebooting mx1001 to revert to the old kernel
* 22:59 sbassett: Deployed security patch for [[phab:T298434|T298434]] to 1.38.0-wmf.18
* 22:57 sbassett: Deployed security patch for [[phab:T298434|T298434]] to 1.38.0-wmf.17
* 21:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.18 refs [[phab:T293959|T293959]]
* 21:29 jhuneidi@deploy1002: Finished scap: testwikis to 1.38.0-wmf.18 refs [[phab:T293959|T293959]] (duration: 38m 31s)
* 21:26 jhathaway: rebooting mx1001 to test new kernel
* 20:50 jhuneidi@deploy1002: Started scap: testwikis to 1.38.0-wmf.18 refs [[phab:T293959|T293959]]
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0ff5874469b717cba38ed7cff0669754517a3553}}: pwnwiki: Deploy Growth features to newcomers ([[phab:T298115|T298115]]) (duration: 02m 14s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:57 dcausse: restarting blazegraph on wdqs1007 (jvm stuck for 13 hours)
* 17:37 hashar: restarted zuul on contint2001
* 17:16 moritzm: installing gmp security updates
* 16:53 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 16:53 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
* 16:52 hashar: contint2001: restarted ferm service
* 16:49 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
* 16:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
* 16:47 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2010.codfw.wmnet with OS buster
* 16:45 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 16:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
* 16:14 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
* 16:13 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
* 16:11 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
* 16:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:09 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:07 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
* 16:03 moritzm: installing xen security updates on buster (client-side libraries)
* 15:59 hashar: Shutting down CI for maintenance on contint2001  # [[phab:T283582|T283582]]
* 15:54 godog: update kartotherian certs on maps hosts and roll-reload nginx - [[phab:T297604|T297604]]
* 15:54 moritzm: installing libssh2 security updates on stretch
* 15:50 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
* 15:50 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 15:47 andrewbogott: resizing the wikitech-static host for [[phab:T298052|T298052]]
* 15:45 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 02s)
* 15:45 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 15:35 godog: regenerate kartotherian certs via cergen - [[phab:T297604|T297604]]
* 14:33 kormat: Deploying wmfmariadbpy 0.8 [[phab:T299406|T299406]]
* 14:33 kormat: uploaded wmfmariadbpy 0.8 to apt.wm.o
* 14:31 moritzm: installing rsync security updates on stretch
* 14:28 moritzm: installing xorg-server security updates on stretch
* 14:10 moritzm: installing vim security updates on stretch
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18780 and previous config saved to /var/cache/conftool/dbconfig/20220118-140540-marostegui.json
* 13:55 XioNoX: update grafana-plugins on grafana hosts - [[phab:T251184|T251184]]
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P18779 and previous config saved to /var/cache/conftool/dbconfig/20220118-135036-marostegui.json
* 13:46 XioNoX: add grafana-plugins 0.3 (with worldmap plugin) to reprepo
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P18778 and previous config saved to /var/cache/conftool/dbconfig/20220118-133531-marostegui.json
* 13:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:26 Lucas_WMDE: UTC morning backport window done
* 13:24 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:754605{{!}}Monitoring: Add '.Save' to distinguish from '.Click' events (T286366)]] (duration: 00m 54s)
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18777 and previous config saved to /var/cache/conftool/dbconfig/20220118-132026-marostegui.json
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:14 moritzm: installing python-babel security updates on buster
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18776 and previous config saved to /var/cache/conftool/dbconfig/20220118-131215-marostegui.json
* 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18775 and previous config saved to /var/cache/conftool/dbconfig/20220118-131208-marostegui.json
* 13:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: update requirements - ayounsi@cumin1001
* 13:05 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: update requirements - ayounsi@cumin1001
* 13:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:04 ayounsi@deploy1002: Finished deploy [homer/deploy@0f02386]: update requirements (duration: 01m 27s)
* 13:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:02 ayounsi@deploy1002: Started deploy [homer/deploy@0f02386]: update requirements
* 12:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753969{{!}}fawiki: Add flow-delete right to eliminators (T299223)]] (duration: 00m 51s)
* 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P18774 and previous config saved to /var/cache/conftool/dbconfig/20220118-125703-marostegui.json
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:52 moritzm: installing ghostscript security updates for stretch
* 12:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754613{{!}}azwiki: Add draft namespace (T299332)]] (duration: 00m 51s)
* 12:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P18773 and previous config saved to /var/cache/conftool/dbconfig/20220118-124159-marostegui.json
* 12:36 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/GrowthExperiments/modules/ext.growthExperiments.PostEdit/index.js: Backport: [[gerrit:754129{{!}}Post-edit dialog: Reload page upon dialog closing for structured tasks (T299188)]] (duration: 00m 51s)
* 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754612{{!}}commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist (T299247)]] (duration: 00m 51s)
* 12:27 moritzm: imported docker-report bullseye rebuild to apt.wikimedia.org [[phab:T298463|T298463]]
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18772 and previous config saved to /var/cache/conftool/dbconfig/20220118-122654-marostegui.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18771 and previous config saved to /var/cache/conftool/dbconfig/20220118-122546-marostegui.json
* 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18770 and previous config saved to /var/cache/conftool/dbconfig/20220118-122538-marostegui.json
* 12:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P18769 and previous config saved to /var/cache/conftool/dbconfig/20220118-121034-marostegui.json
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P18768 and previous config saved to /var/cache/conftool/dbconfig/20220118-115529-marostegui.json
* 11:46 hashar: Rolled back Quibble 1.3.0 jobs due to php configuration files with at least releng/quibble-buster73:1.3.0 # [[phab:T299389|T299389]]
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18767 and previous config saved to /var/cache/conftool/dbconfig/20220118-114024-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18766 and previous config saved to /var/cache/conftool/dbconfig/20220118-113916-marostegui.json
* 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:28 Amir1: mwscript findBadBlobs.php --wiki=dewiki --revisions {{Gerrit|5730218}} --mark "[[phab:T299387|T299387]]"
* 11:06 moritzm: running gnt-cluster renew-crypto --new-node-certificates for ganeti/eqiad cluster following 2.16 update
* 11:06 mmandere: start rolling upgrade to varnish 6.0.9 [[phab:T298758|T298758]]
* 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1117.eqiad.wmnet with OS bullseye
* 10:46 moritzm: gnt-cluster upgrade --to 2.16 for ganeti/eqiad cluster
* 10:31 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1117.eqiad.wmnet with OS bullseye
* 10:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:01 moritzm: running gnt-cluster renew-crypto --new-cluster-certificate --new-rapi-certificate --new-spice-certificate for ganeti/eqiad cluster
* 10:00 marostegui: Move pc1014 to pc3 [[phab:T299046|T299046]]
* 09:59 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1014 to master in pc2 [[phab:T299046|T299046]] (duration: 00m 50s)
* 09:50 taavi: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "QuiteUnusual" "MarcGarver" # [[phab:T298707|T298707]]
* 09:50 moritzm: installing ganeti 2.16.0-1~bpo9+1+wmf1 on ganeti/eqiad servers [[phab:T296721|T296721]]
* 09:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:41 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:752344{{!}}Enable temporary global user groups on production (T153815)]] (duration: 00m 51s)
* 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:32 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/includes: Backport: [[gerrit:754602{{!}}page: Use MainObjectStash instead of 'db-replicated' cache (T272512)]] (duration: 00m 56s)
* 09:31 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/Linter/extension.json: Backport: [[gerrit:754145{{!}}Disable "inline-media-caption" category (T297443)]] (duration: 00m 51s)
* 09:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:06 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/includes/watcheditem/WatchedItemStore.php: Backport: [[gerrit:754599{{!}}watcheditem: Try getting the cached version in resetNotificationTimestamp]] (duration: 00m 51s)
* 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1012.eqiad.wmnet with OS bullseye
* 08:55 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for build2001.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
* 08:55 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for build2001.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on build2001.codfw.wmnet with reason: reinstallation
* 08:42 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on build2001.codfw.wmnet with reason: reinstallation
* 08:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:37 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/ProofreadPage/includes/Page/PageContentHandler.php: Backport: [[gerrit:754598{{!}}Use fillParserOutputInternal instead of getParserOutput. (T292300)]] (duration: 00m 51s)
* 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1012.eqiad.wmnet with OS bullseye
* 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:30 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc2 [[phab:T299046|T299046]] (duration: 00m 51s)
* 08:20 Amir1: cleaning up commons linter errors [[phab:T298782|T298782]]
* 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:12 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/Linter/includes/RecordLintJob.php: Backport: [[gerrit:754144{{!}}Drop 'inline-media-caption' lint requests (T297443 T299302)]] (duration: 00m 52s)
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1014.eqiad.wmnet with OS bullseye
* 07:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
* 06:34 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1014.eqiad.wmnet with OS bullseye
* 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
* 06:13 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1014.eqiad.wmnet with OS bullseye
* 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
* 05:59 kart_: Update apertium to 2022-01-18-052631-production ([[phab:T218184|T218184]], [[phab:T202276|T202276]], [[phab:T218184|T218184]], [[phab:T270061|T270061]], [[phab:T248653|T248653]], [[phab:T248293|T248293]], [[phab:T248812|T248812]], [[phab:T248654|T248654]])
* 05:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: sync on production
* 05:54 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply on staging
* 05:54 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply on production
* 05:54 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply on staging
* 05:54 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply on production
* 05:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: sync on production
* 05:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply on staging
* 05:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply on production
* 05:49 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: sync on staging
* 05:49 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply on production
* 05:49 kartik@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply on staging
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18764 and previous config saved to /var/cache/conftool/dbconfig/20220118-054659-marostegui.json
* 02:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn


== 2015-07-15 ==
== 2022-01-17 ==
* 23:36 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221885/ (duration: 00m 13s)
* 23:27 jynus: forced session revocation on phab for a user [[phab:T299315|T299315]]
* 23:22 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/209840/ (duration: 00m 12s)
* 20:48 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@27a4f7a]: (no justification provided) (duration: 00m 02s)
* 23:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/194075/ (duration: 00m 12s)
* 20:48 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@27a4f7a]: (no justification provided)
* 23:10 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224799/ (duration: 00m 13s)
* 18:47 krinkle@deploy1002: Finished deploy [integration/docroot@1621c26]: (no justification provided) (duration: 01m 14s)
* 23:09 logmsgbot: krenair Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 13s)
* 18:46 krinkle@deploy1002: Started deploy [integration/docroot@1621c26]: (no justification provided)
* 23:06 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 12s)
* 16:30 moritzm: installing python-virtualenv bugfix updates from bullseye 11.2 point release
* 22:23 csteipp: deploy patch for T105305 to wmf13/14
* 16:21 moritzm: installing wget bugfix updates from bullseye 11.2 point release
* 22:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223843/ (duration: 00m 12s)
* 16:13 moritzm: installing freeipmi bugfix updates from bullseye 11.2 point release
* 21:59 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222584/ (duration: 00m 13s)
* 16:02 moritzm: installing curl bugfix updates from bullseye 11.2 point release
* 21:54 manybubbles: es1.6 upgrade: upgrade elastic1022
* 15:54 mutante: mw1414,mw1415,mw1416,mw1417,mw1418,mw1447,mw1448,mw1449,mw1450,mw1437,mw1438 (all canaries eqiad) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* ([[phab:T294378|T294378]])
* 21:37 manybubbles: es1.6 upgrade: upgrade elastic1021
* 15:46 mutante: parse2001, parse2002, wtp1025, wtp1026 (all parsoid canaries) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* ([[phab:T294378|T294378]])
* 21:09 logmsgbot: twentyafterfour Synchronized php-1.26wmf14: Really Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef this time (duration: 01m 32s)
* 15:40 mutante: mw2278, mw2279, mw2374, mw2376 (API and jobrunner canaries codfw) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* ([[phab:T294378|T294378]])
* 20:41 bblack: restarted salt-master service on palladium
* 15:34 mutante: mw2271, mw2272, mw2251, mw2252 (appserver and API canaries codfw) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* ([[phab:T294378|T294378]])
* 20:33 bblack: globally cleaning up dangling symlinks left in /etc/certs from before Id7d2447 via salted 'find /etc/ssl/certs -type l -xtype l|xargs rm'
* 15:01 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-airflow1003.eqiad.wmnet
* 20:30 logmsgbot: twentyafterfour Synchronized php-1.26wmf14: Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef (revert Count API module instantiations and Hook runs) (duration: 01m 48s)
* 14:58 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM an-airflow1003.eqiad.wmnet
* 20:20 manybubbles: es1.6 upgrade: upgrade elastic1020
* 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2132.codfw.wmnet with OS bullseye
* 20:18 RoanKattouw: Running FlowCreateMentionTemplate.php on all Flow wikis
* 14:50 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-airflow1002.eqiad.wmnet
* 20:06 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf14
* 14:48 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM an-airflow1002.eqiad.wmnet
* 19:50 ejegg: updated civicrm from e29cc5f20b5069afcaff794e628596c1f70d69a3 to 24e0fc854433ea4982e94a0fd2f8bdad8f8dcad7
* 14:45 moritzm: imported cassandra 3.11.11 to component/cassandradev for stretch-wikimedia and buster-wikimedia [[phab:T298805|T298805]]
* 19:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224408/ (duration: 00m 12s)
* 14:41 moritzm: systemctl reset-failed ifup@ens5.service on an-airflow1001 [[phab:T273026|T273026]]
* 19:01 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 13s)
* 14:39 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-airflow1001.eqiad.wmnet
* 19:00 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 12s)
* 14:37 hnowlan: removing restbase2009 from cassandra configs
* 18:58 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
* 14:30 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM an-airflow1001.eqiad.wmnet
* 18:57 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
* 14:16 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2132.codfw.wmnet with OS bullseye
* 18:40 ejegg: updated civicrm from f4219bc8eca5e4db633da07b6ac9e2505cfbae16 to e29cc5f20b5069afcaff794e628596c1f70d69a3
* 14:15 marostegui: Reimage db2132 to Bullseye [[phab:T299344|T299344]]
* 18:39 logmsgbot: krenair Synchronized wmf-config/throttle.php: throttle labswiki account creations from hackathon at 500 (duration: 00m 12s)
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18762 and previous config saved to /var/cache/conftool/dbconfig/20220117-134520-marostegui.json
* 18:39 logmsgbot: twentyafterfour Finished scap: group0 to 1.26wmf14 (duration: 32m 34s)
* 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1151.eqiad.wmnet with OS bullseye
* 18:21 manybubbles: es1.6 upgrade: upgrading elastic1019
* 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1151.eqiad.wmnet with OS bullseye
* 18:20 Jeff_Green: authdns-update shifting to service-oriented hostnames for fundraising cluster
* 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2142.codfw.wmnet with OS bullseye
* 18:06 logmsgbot: twentyafterfour Started scap: group0 to 1.26wmf14
* 11:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2142.codfw.wmnet with OS bullseye
* 17:55 ejegg: updated civicrm from 6560cefa8d7e68e35e30b310d6691ab57798a4c9 to f4219bc8eca5e4db633da07b6ac9e2505cfbae16
* 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafkamon1002.eqiad.wmnet
* 17:34 Jeff_Green: authdns-update to remove boron.wm.o
* 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kafkamon1002.eqiad.wmnet
* 17:22 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: partially revert https://gerrit.wikimedia.org/r/#/c/224420/1/wmf-config/CommonSettings.php - doesn't quite work (duration: 00m 13s)
* 11:08 moritzm: switching kubetcd1006 to DRBD-backed storage (required for ganeti update)
* 17:17 Jeff_Green: authdns-update to remove aluminium, also lanthanum by preexisting commit
* 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: switch to drbd storage
* 16:45 andrewbogott: rebooting labvirt1005
* 11:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: switch to drbd storage
* 16:43 mutante: accepting unaccepted salt keys for ganeti VMs planet, bromine, krypton
* 11:00 moritzm: systemctl reset-failed ifup@ens5.service on kubetcd1005 [[phab:T273026|T273026]]
* 16:39 mutante: krypton - signing puppet cert, initial run
* 10:56 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
* 16:26 andrewbogott: woo, first try!
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18761 and previous config saved to /var/cache/conftool/dbconfig/20220117-104801-marostegui.json
* 16:23 andrewbogott: trying to kill labvirt1005 via repeated instance suspend/resume
* 10:47 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
* 16:04 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s)
* 10:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1152.eqiad.wmnet with OS bullseye
* 16:03 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s)
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18760 and previous config saved to /var/cache/conftool/dbconfig/20220117-104459-marostegui.json
* 16:01 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224808/ (duration: 00m 12s)
* 10:44 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
* 15:58 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222581/ (duration: 00m 11s)
* 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1153.eqiad.wmnet with OS bullseye
* 15:35 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 11s)
* 10:42 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
* 15:29 logmsgbot: krenair Synchronized docroot/noc/createTxtFileSymlinks.sh: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
* 10:32 moritzm: switching kubetcd1005 to DRBD-backed storage (required for ganeti update)
* 15:27 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
* 10:31 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync on staging
* 15:20 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 11s)
* 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: switch to drbd storage
* 14:33 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
* 10:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: switch to drbd storage
* 14:22 legoktm: sync failed on mw1090.eqiad.wmnet, read only filesystem
* 10:30 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply on production
* 14:20 logmsgbot: legoktm Synchronized php-1.26wmf13/extensions/CentralAuth/includes/CentralAuthPlugin.php: Add log entry for $wgCentralAuthStrict failures if SULMigration is enabled (duration: 00m 13s)
* 10:30 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply on staging
* 13:55 dcausse: es1.6 upgrade: upgrade elastic1018
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P18759 and previous config saved to /var/cache/conftool/dbconfig/20220117-102954-marostegui.json
* 13:24 springle: entry below not mw1216 fault, but r/o filesystem error on mw1090
* 10:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1152.eqiad.wmnet with OS bullseye
* 13:15 springle: sync-common on mw1216 after sync-file from tin failed non-zero exit status 12
* 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1153.eqiad.wmnet with OS bullseye
* 13:12 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1022 T105879 (duration: 00m 12s)
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P18758 and previous config saved to /var/cache/conftool/dbconfig/20220117-101450-marostegui.json
* 11:43 dcausse: es1.6 upgrade: upgrade elastic1017
* 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2144.codfw.wmnet with OS bullseye
* 08:27 dcausse: es1.6 upgrade: upgrade elastic1016
* 10:04 moritzm: switching kubetcd1004 to DRBD-backed storage (required for ganeti update)
* 06:31 dcausse: es1.6 upgrade: upgrade elastic1015
* 10:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: switch to drbd storage
* 05:40 dcausse: es1.6 upgrade: upgrade elastic1014
* 10:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: switch to drbd storage
* 05:10 springle: db1030 busy removing table partitioning
* 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2143.codfw.wmnet with OS bullseye
* 04:28 manybubbles: es1.6 upgrade: lowered the shard transfer settings back to our normal rate. going to bed.
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18757 and previous config saved to /var/cache/conftool/dbconfig/20220117-095945-marostegui.json
* 04:12 manybubbles: es1.6 upgrade: upgrade elastic1013
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18756 and previous config saved to /var/cache/conftool/dbconfig/20220117-095837-marostegui.json
* 03:49 springle: upgrade db1030 trusty
* 09:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 03:29 manybubbles: es1.6 upgrade: upgrade elastic1012
* 09:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 03:14 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-15 03:14:21+00:00
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18755 and previous config saved to /var/cache/conftool/dbconfig/20220117-095830-marostegui.json
* 03:10 logmsgbot: reedy Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 13m 32s)
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P18754 and previous config saved to /var/cache/conftool/dbconfig/20220117-094325-marostegui.json
* 03:03 manybubbles: es1.6 upgrade: raised limits on shard migration rate - should speed up the restart. we should lower it before we do restarts during Europe's morning
* 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2144.codfw.wmnet with OS bullseye
* 02:10 Reedy: Running LU manually to see what's wrong with it
* 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2143.codfw.wmnet with OS bullseye
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 15 02:07:48 UTC 2015 (duration 7m 47s)
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P18753 and previous config saved to /var/cache/conftool/dbconfig/20220117-092820-marostegui.json
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-15 02:02:55+00:00
* 09:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1017.eqiad.wmnet with OS bullseye
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18752 and previous config saved to /var/cache/conftool/dbconfig/20220117-091316-marostegui.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18751 and previous config saved to /var/cache/conftool/dbconfig/20220117-091308-marostegui.json
* 09:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18750 and previous config saved to /var/cache/conftool/dbconfig/20220117-091300-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P18749 and previous config saved to /var/cache/conftool/dbconfig/20220117-085756-marostegui.json
* 08:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1017.eqiad.wmnet with OS bullseye
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P18748 and previous config saved to /var/cache/conftool/dbconfig/20220117-084251-marostegui.json
* 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema1003.eqiad.wmnet
* 08:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM schema1003.eqiad.wmnet
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18747 and previous config saved to /var/cache/conftool/dbconfig/20220117-082746-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18746 and previous config saved to /var/cache/conftool/dbconfig/20220117-082638-marostegui.json
* 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema1004.eqiad.wmnet
* 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM schema1004.eqiad.wmnet
* 06:59 elukey: `systemctl reset-failed ifup@ens5.service` on an-test-client1001 and kafka-test1010
* 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1016.eqiad.wmnet with OS bullseye
* 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1016.eqiad.wmnet with OS bullseye


== 2015-07-14 ==
== 2022-01-16 ==
* 23:46 manybubbles: es1.6 upgrade: upgraded elastic1011
* 08:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 23:22 bblack: updating nginx to 1.9.3-1+wmf1 on cp*
* 08:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
* 23:17 bblack: reprepro: nginx for jessie-wikimedia/main bumped to 1.9.3-1+wmf1
* 08:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply on production
* 22:22 ejegg: updated civicrm from 04efc7d5c7bbb068f907125f2184692aee676123 to 6560cefa8d7e68e35e30b310d6691ab57798a4c9
* 08:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 21:29 Reedy: mw1090 fs is ro
* 08:17 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
* 21:28 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Fix testwiki
* 08:17 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply on production
* 21:05 _joe|AFK: depooling mw1090, ext4 errors in syslog, filesystem mounted read-only
* 21:01 logmsgbot: twentyafterfour Synchronized wmf-config/CommonSettings.php: revert LCStoreStaticArray (duration: 00m 12s)
* 20:59 logmsgbot: twentyafterfour Finished scap: testwiki to 1.26wmf14 and rebuild localization cache (duration: 72m 45s)
* 20:42 bblack: undoing LCStoreStaticArray because appservers look unhealthy, using ori's command: 'salt -G deployment_target:scap/scap cmd.run "rm /etc/lcstore"'
* 19:46 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf14 and rebuild localization cache
* 19:23 manybubbles: es1.6 step iforget: upgrade elasticsearch on elastic1010
* 17:41 mutante: terbium:  /usr/local/bin/foreachwiki extensions/Echo/maintenance/processEchoEmailBatch.php
* 17:10 dcausse: es1.6 step 10: upgrade elastic1009
* 16:23 mutante: bromine - apt-get upgrade
* 15:08 logmsgbot: manybubbles Synchronized php-1.26wmf13/extensions/UniversalLanguageSelector/: SWAT add some hooks to extension.json (duration: 00m 13s)
* 14:34 gwicke: started RESTBase revision thin-out script for html and data-parsoid on wikimedia domains
* 14:01 dcausse: es1.6 step 9: upgrade elastic1008
* 12:48 _joe_: reimaging mw1155
* 12:17 ori: Logging a message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log.
* 11:28 dcausse: es1.6 step 8: upgrade elastic1007
* 11:25 _joe_: repooling mw1154 with HHVM
* 10:12 _joe_: stopped poolcounter on mw1154
* 10:06 _joe_: reimaging mw1154
* 07:49 dcausse: es1.6 step 7: upgrade elastic1006
* 07:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 07:09:10 UTC 2015 (duration 9m 9s)
* 06:48 dcausse: es1.6 step 6: upgrade elastic1005
* 06:41 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I9c9bf0f4: Use LCStoreStaticArray unconditionally (duration: 03m 02s)
* 05:26 ori: Cleaned up now-unused hhbc files from /run/hhvm/cache on job runners
* 04:58 ori: Enabling LCStoreStaticArray in production. May be reverted by running: 'salt -G deployment_target:scap/scap cmd.run "rm /etc/lcstore"' on palladium.
* 04:48 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Follow-up for Ieb62ee050e: allow LCStoreStaticArray in server mode (duration: 00m 13s)
* 02:35 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-14 02:35:21+00:00
* 02:31 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 07m 27s)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 02:07:32 UTC 2015 (duration 7m 30s)
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-14 02:02:33+00:00
* 01:22 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)


== 2015-07-13 ==
== 2022-01-15 ==
* 23:22 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/VisualEditor: SWAT (duration: 00m 11s)
* 08:55 legoktm: finished running recountCategories on s4 wikis ([[phab:T299244|T299244]])
* 23:11 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Add title to Parsoid exception logging (duration: 00m 12s)
* 07:58 legoktm: finished running recountCategories on s7 wikis ([[phab:T299244|T299244]])
* 22:45 logmsgbot: legoktm Synchronized wmf-config: Revert "Set $wgCentralAuthStrict = true;" (duration: 00m 13s)
* 07:51 legoktm: finished running recountCategories on s2 wikis ([[phab:T299244|T299244]])
* 22:41 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 13s)
* 06:41 legoktm: finished running recountCategories on s3 wikis ([[phab:T299244|T299244]])
* 22:41 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
* 06:21 legoktm: finished running recountCategories on s6 wikis ([[phab:T299244|T299244]])
* 22:16 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/User.php: Add 'AuthPluginStrict' log to identify users who are unable to authenticate (duration: 00m 13s)
* 06:19 legoktm: finished running recountCategories on s5 wikis ([[phab:T299244|T299244]])
* 22:15 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert "Revert "Revert Count API module instantiations and Hook runs"" (duration: 00m 12s)
* 06:18 legoktm: finished running recountCategories on s8 wikis ([[phab:T299244|T299244]])
* 22:15 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/Hooks.php: Revert "Revert "Revert Count API module instantiations and Hook runs"" (duration: 00m 13s)
* 06:14 legoktm: running recountCategories on s3 wikis
* 22:13 ejegg: updated payments from ec34ebf61e5962f66b807abdcb519ff323d41e8e to 4ca95d55a9745c05ccfbb16ee6f23a6f75328824
* 05:20 legoktm: started recountCategories.php --wiki=enwiki --mode pages ([[phab:T299244|T299244]])
* 22:00 manybubbles: es1.6 step 4: upgrade elastic1003
* 03:05 legoktm: started refreshLinks --dfn-only via systemd units for s7-s8 ([[phab:T299244|T299244]])
* 21:54 ori: Debugging metric issue on graphite1001, brief stats drop possible
* 03:01 legoktm: started refreshLinks --dfn-only via systemd units for s2-s6 ([[phab:T299244|T299244]])
* 21:32 legoktm: renaming ~3k users who were originally missed for SULF
* 02:55 legoktm: started mwscript refreshLinks.php --wiki=commonswiki --dfn-only ([[phab:T299244|T299244]])
* 21:08 logmsgbot: ori Synchronized php-1.26wmf13/includes/Hooks.php: (no message) (duration: 00m 12s)
* 02:54 legoktm: started mwscript refreshLinks.php --wiki=enwiki --dfn-only ([[phab:T299244|T299244]])
* 21:08 logmsgbot: ori Synchronized php-1.26wmf13/includes/api/ApiMain.php: (no message) (duration: 00m 13s)
* 02:52 legoktm: started mwscript refreshLinks.php --wiki=enwiki --dfn-only
* 20:42 logmsgbot: ori Synchronized php-1.26wmf13/includes/api/ApiMain.php: f9c89d2814: Revert "Revert Count API module instantiations and Hook runs" (duration: 00m 13s)
* 01:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:30 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Ieb62ee05: Temporary hack to facilitate migration of l10n cache implementations (duration: 00m 11s)
* 01:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:42 hoo: Updated Wikidata's property suggester with data from today's json dump
* 01:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:24 manybubbles_: es1.6 step 3: upgrade elastic1002
* 01:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:08 legoktm: running populateContentModel.php --table=page on all small wikis
* 01:04 legoktm: starting recountCategories.php --mode pages --wiki enwiki on mwmaint1002
* 19:01 andrewbogott: two of two
* 01:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:01 mutante: morebots - are you 1.7.11 ?
* 00:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:01 andrewbogott: one of two
* 00:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:52 legoktm: running populateContentModel.php --table=page on testwiki
* 00:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:29 manybubbles_: es1.6 step 2: shut down extra instance of elasticsearch on elastic1021
* 00:58 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 17:39 andrewbogott: this is the second test log of three
* 00:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:39 andrewbogott: this is the first test log of three
* 00:52 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 00m 52s)
* 17:36 mutante: included adminbot_1.7.11 in APT repo
* 00:51 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 16:31 andrewbogott: wikidata-dev updated local puppet and rebooting property-suggester
* 00:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:08 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/224087/ (duration: 00m 12s)
* 00:46 jforrester@deploy1002: Finished scap: Revert "LinksUpdate refactor" and follow-ups for [[phab:T299244|T299244]] re. [[phab:T293958|T293958]] (duration: 03m 58s)
* 16:07 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224087/ (duration: 00m 12s)
* 00:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:11 manybubbles_: all done SWATing.
* 00:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:09 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT enable footer contact link on ukwiki (duration: 00m 11s)
* 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:55 manybubbles_: after upgrading elasticsearch its init script no longer shuts down the old version of elasticsearch. so you have to manually kill it. that means the upgrade instructions will be "special" this time around. hopefully this is a one time thing.
* 00:42 jforrester@deploy1002: Started scap: Revert "LinksUpdate refactor" and follow-ups for [[phab:T299244|T299244]] re. [[phab:T293958|T293958]]
* 14:45 manybubbles_: es1.6 step 1: upgrade elasticsearch on elastic1001 -starting
* 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:45 manybubbles_: es1.6 step 0: successfully synced new versions of plugins
* 00:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:30 manybubbles_: es1.6 step 0: sync new versions of plugins
* 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:30 manybubbles_: starting the elasticsearch 1.6.0 upgrade
* 00:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:13 bblack: updating nginx/bind on cp*
* 00:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "all/group1 wikis to 1.38.0-wmf.17"
* 13:07 bblack: updating openssl on cp*
* 13:02 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/Cite/extension.json: https://gerrit.wikimedia.org/r/#/c/224407/ - unbreak VE mobile, https://phabricator.wikimedia.org/T105686 (duration: 00m 12s)
* 10:58 mobrovac: restbase deploying 6dec79d
* 10:22 logmsgbot: ori Synchronized php-1.26wmf13/maintenance/rebuildLocalisationCache.php: 117f60a171: rebuildLocalisationCache: don't limit memory usage (duration: 00m 12s)
* 08:52 godog: bounce graphite-web on graphite1001
* 08:51 godog: bounce carbon daemons on graphite1001
* 08:50 godog: upgrade graphite to 0.9.13 on graphite1001 and bounce one instance of carbon/cache
* 07:29 logmsgbot: ori Synchronized php-1.26wmf13/includes/cache/LCStoreStaticArray.php: I3f63594a4: Fix variable name (follows Ib2c5856d) (duration: 00m 11s)
* 06:25 logmsgbot: LocalisationUpdate failed: git pull of core failed
* 06:24 ori: Experimenting with altering the localisation cache implementation for testwiki, operations/mediawiki-config on tin will have a local hack for a little bit
* 05:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 13 05:07:32 UTC 2015 (duration 7m 31s)
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 13 02:25:58 UTC 2015 (duration 25m 57s)
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-13 02:23:43+00:00
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 16s)
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-13 02:10:25+00:00
* 02:10 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 01:47 springle: restarted labsdb1002 mysqld while troubleshooting replication


== 2015-07-12 ==
== 2022-01-14 ==
* 14:59 bblack: upgraded most packages on sodium
* 23:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS stretch
* 14:48 bblack: upgraded apache2 to 2.2.22-1ubuntu1.9 on: antimony argon caesium fluorine helium iodine logstash1001 logstash1003 magnesium neon netmon1001 rhodium stat1001 ytterbium
* 22:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 04:49 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 04:49:08 UTC 2015 (duration 49m 7s)
* 18:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:26:52+00:00
* 18:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 02:25:33 UTC 2015 (duration 25m 32s)
* 17:44 bblack: drmrs asw: removed native-vlan-id from config on secondary (x-rack) interfaces of lvses to debug network issue
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 12s)
* 17:26 bblack: reboot lvs600[23]
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:10:00+00:00
* 16:55 bblack: reboot lvs6001
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 16:30 bblack: rebooting cp60xx where x is 6, 7, 8, 14, 15, 16 (downtimed)
* 16:15 dancy@deploy1002: Synchronized README: Testing php-fpm restart (duration: 03m 18s)
* 16:04 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 15:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 15:39 bblack: lvs6001 + all services downtimed
* 15:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: dc=drmrs
* 15:00 bblack: silenced site=drmrs in alertmanager for one month, I think
* 15:00 bblack: silenced site=drmrs in alertmanager, I think
* 13:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bullseye
* 13:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bullseye
* 12:53 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 12:51 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1024.eqiad.wmnet with OS buster
* 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1024.eqiad.wmnet with OS buster
* 12:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 12:18 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 11:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 11:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1023.eqiad.wmnet with OS buster
* 11:18 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS buster
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org
* 11:00 moritzm: systemctl reset-failed ifup@ens5.service on archiva1002 [[phab:T273026|T273026]]
* 10:56 moritzm: rebooting archiva1002 (running archiva.wikimedia.org)
* 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org
* 10:55 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 10:50 moritzm: systemctl reset-failed ifup@ens5.service on an-test-ui1001 [[phab:T273026|T273026]]
* 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-ui1001.eqiad.wmnet
* 10:42 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-ui1001.eqiad.wmnet
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-presto1001.eqiad.wmnet
* 10:17 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-presto1001.eqiad.wmnet
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 10:05 moritzm: rebooting matomo1002 (running piwik.wikimedia.org)
* 10:04 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-druid1001.eqiad.wmnet
* 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-druid1001.eqiad.wmnet
* 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt1001.wikimedia.org
* 09:35 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM apt1001.wikimedia.org
* 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install1003.wikimedia.org
* 09:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install1003.wikimedia.org
* 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-client1001.eqiad.wmnet
* 09:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-client1001.eqiad.wmnet
* 09:11 marostegui: Move pc1014 from pc1 to pc2 [[phab:T299046|T299046]]
* 09:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2013.codfw.wmnet with OS bullseye
* 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1009.eqiad.wmnet
* 09:01 moritzm: rebooting an-tool1009 (running hue.wikimedia.org)
* 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1009.eqiad.wmnet
* 09:00 moritzm: systemctl reset-failed ifup@ens5.service on an-tool1005 [[phab:T273026|T273026]]
* 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1008.eqiad.wmnet
* 08:58 moritzm: rebooting an-tool1008 (running yarn.wikimedia.org)
* 08:58 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1008.eqiad.wmnet
* 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1007.eqiad.wmnet
* 08:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1007.eqiad.wmnet
* 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1005.eqiad.wmnet
* 08:51 moritzm: rebooting an-tool1007 (running turnilo.wikimedia.org)
* 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1005.eqiad.wmnet
* 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cuminunpriv1001.eqiad.wmnet
* 08:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cuminunpriv1001.eqiad.wmnet
* 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2013.codfw.wmnet with OS bullseye
* 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2012.codfw.wmnet with OS bullseye
* 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2012.codfw.wmnet with OS bullseye
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18735 and previous config saved to /var/cache/conftool/dbconfig/20220114-063554-marostegui.json
* 06:15 marostegui: Failover m5 proxy from dbproxy1017 to dbproxy1021 [[phab:T298586|T298586]]
* 05:16 legoktm: manually restarted discard_held_messages service on lists1001, failed with a spurious sqlalchemy issue about packets being out of order
* 00:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:23 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 00:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:15 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 01m 06s)
* 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:13 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 00:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:09 dduvall@deploy1002: Synchronized php-1.38.0-wmf.17/includes/content/WikitextContentHandler.php: Backport: [[gerrit:753828{{!}}In WikitextContentHandler always use getFreshParser() (T299149)]] (duration: 01m 07s)


== 2015-07-11 ==
== 2022-01-13 ==
* 19:48 jynus: stopping labsdb1002 after table corruption has been detected
* 22:40 WFan: Updating payment-wiki, revision changed from {{Gerrit|8497eae9}} to {{Gerrit|5cc9d5e0}}
* 19:37 urandom: from restbase1002, starting revision culling process (node thin_out_key_rev_value_data.js `hostname -i` local_group_wikimedia_T_parsoid_html 2>&1 | tee >(gzip -c > local_group_wikimedia_T_parsoid_html.log.`date +%s`.gz))
* 22:18 dzahn@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=miscweb
* 19:33 urandom: restbase: setting gc_grace_seconds to 604800 (1 week) on local_group_wikipedia_T_parsoid_html.data
* 22:00 dzahn@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=miscweb
* 04:55 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 04:55:56 UTC 2015 (duration 55m 55s)
* 21:48 mutante: running puppet on cp-ulsfo
* 04:21 bd808: Logstash cluster upgrade complete! Kibana working again
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 04:21 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1006
* 21:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 04:12 bd808: rebooting logstash1006
* 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 04:06 bd808: logstash1005 fully recovered all shards
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 03:21 logmsgbot: mattflaschen Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Bump Flow to encode page name when sending to Parsoid (duration: 00m 13s)
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:28:18+00:00
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:25 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 07s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 02:25:19 UTC 2015 (duration 25m 18s)
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:09 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:09:45+00:00
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 35s)
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:46 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1005; replicas recovering now
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:34 bd808: rebooting logstash1005
* 20:31 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.17"
* 00:30 bd808: logstash1004 fully recovered all shards
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:29 dduvall: rolling back wmf.17 from group1 due to a large increase in "Parser state cleared while parsing" across commons and group1 wikipedias ([[phab:T293958|T293958]], [[phab:T299149|T299149]])
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:17 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 01m 06s)
* 20:16 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:07 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 19:42 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2051.codfw.wmnet with OS stretch
* 19:40 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main
* 19:40 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753634{{!}}Enable ArticlePlaceholder on dagwiki (T298349)]] (duration: 01m 13s)
* 19:37 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main
* 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:25 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main
* 19:23 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:19 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:747993{{!}}Add event stream config for ios.notification_interaction (T290920)]] (duration: 01m 13s)
* 19:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:15 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:747991{{!}}Add event stream config for android.customize_toolbar_interaction (T297818)]] (duration: 01m 12s)
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:07 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753793{{!}}Enable skin migration mode on the beta cluster]] (duration: 01m 14s)
* 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 17:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 17:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1005.eqiad.wmnet with reason: requires resync after planet sync
* 17:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1005.eqiad.wmnet with reason: requires resync after planet sync
* 17:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:34 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:33 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:29 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:29 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:28 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 17:22 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 17:11 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 17:07 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 17:01 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 16:27 moritzm: import maps-deduped-tilelist 0.0.5 to buster-wikimedia/main [[phab:T297408|T297408]]
* 16:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cuminunpriv1001.eqiad.wmnet
* 16:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cuminunpriv1001.eqiad.wmnet
* 15:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 15:50 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aphlict1001.eqiad.wmnet
* 15:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aphlict1001.eqiad.wmnet
* 15:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM flowspec1001.eqiad.wmnet
* 15:40 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM flowspec1001.eqiad.wmnet
* 15:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 15:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1004.wikimedia.org
* 15:26 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1004.wikimedia.org
* 15:23 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1003.wikimedia.org
* 15:21 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2009.codfw.wmnet with OS buster
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1003.wikimedia.org
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM seaborgium.wikimedia.org
* 15:15 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM seaborgium.wikimedia.org
* 15:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader1002.wikimedia.org
* 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader1002.wikimedia.org
* 14:56 mmandere: cp3053: upgrade varnish to 6.0.9-1wm1 [[phab:T298758|T298758]]
* 14:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 14:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp1001.wikimedia.org
* 14:47 moritzm: systemctl reset-failed ifup@ens5.service on idp1001 [[phab:T273026|T273026]]
* 14:39 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp1001.wikimedia.org
* 14:15 moritzm: switch ml-etcd1003 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1003.eqiad.wmnet with reason: switch to drbd storage
* 14:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1003.eqiad.wmnet with reason: switch to drbd storage
* 13:53 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet
* 13:49 moritzm: switch ml-etcd1002 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1002.eqiad.wmnet with reason: switch to drbd storage
* 13:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1002.eqiad.wmnet with reason: switch to drbd storage
* 13:45 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet
* 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader1001.wikimedia.org
* 13:33 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader1001.wikimedia.org
* 13:23 moritzm: switch ml-etcd1001 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1001.eqiad.wmnet with reason: switch to drbd storage
* 13:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1001.eqiad.wmnet with reason: switch to drbd storage
* 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cloudbackup1001-dev.eqiad.wmnet
* 13:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1001-dev.eqiad.wmnet
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18731 and previous config saved to /var/cache/conftool/dbconfig/20220113-124307-root.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18730 and previous config saved to /var/cache/conftool/dbconfig/20220113-124300-marostegui.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Remove all special groups from s3 codfw [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18729 and previous config saved to /var/cache/conftool/dbconfig/20220113-124140-marostegui.json
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1021', diff saved to https://phabricator.wikimedia.org/P18728 and previous config saved to /var/cache/conftool/dbconfig/20220113-123744-marostegui.json
* 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cloudbackup1002-dev.eqiad.wmnet
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18727 and previous config saved to /var/cache/conftool/dbconfig/20220113-122803-root.json
* 12:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1002-dev.eqiad.wmnet
* 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-corp1001.wikimedia.org
* 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-corp1001.wikimedia.org
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18726 and previous config saved to /var/cache/conftool/dbconfig/20220113-121300-root.json
* 12:03 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM eventlog1003.eqiad.wmnet
* 11:59 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM eventlog1003.eqiad.wmnet
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18725 and previous config saved to /var/cache/conftool/dbconfig/20220113-115756-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18724 and previous config saved to /var/cache/conftool/dbconfig/20220113-114252-root.json
* 11:34 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18723 and previous config saved to /var/cache/conftool/dbconfig/20220113-112749-root.json
* 11:26 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
* 11:26 _joe_: update scap everywhere [[phab:T298986|T298986]]
* 11:25 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15]: scap testing (duration: 00m 09s)
* 11:25 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15]: scap testing
* 11:24 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15]: (no justification provided) (duration: 00m 09s)
* 11:23 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15]: (no justification provided)
* 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testreduce1001.eqiad.wmnet
* 11:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2022.codfw.wmnet with OS bullseye
* 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testreduce1001.eqiad.wmnet
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18722 and previous config saved to /var/cache/conftool/dbconfig/20220113-111245-root.json
* 11:11 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
* 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox1001.wikimedia.org
* 11:08 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
* 11:03 moritzm: rebooting netbox1001 (running netbox.wikimedia.org)
* 11:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox1001.wikimedia.org
* 11:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1001.eqiad.wmnet with OS buster
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netboxdb1001.eqiad.wmnet
* 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netboxdb1001.eqiad.wmnet
* 10:58 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18721 and previous config saved to /var/cache/conftool/dbconfig/20220113-105741-root.json
* 10:56 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
* 10:52 hashar: Restarting Jenkins CI for plugins update [[phab:T298691|T298691]]
* 10:47 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM search-loader1001.eqiad.wmnet
* 10:45 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
* 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM search-loader1001.eqiad.wmnet
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2022.codfw.wmnet with OS bullseye
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18720 and previous config saved to /var/cache/conftool/dbconfig/20220113-104238-root.json
* 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc1001.wikimedia.org
* 10:29 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1001.eqiad.wmnet with OS buster
* 10:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM irc1001.wikimedia.org
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18719 and previous config saved to /var/cache/conftool/dbconfig/20220113-102734-root.json
* 10:27 moritzm: systemctl reset-failed ifup@ens5.service on lists1001 [[phab:T273026|T273026]]
* 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM grafana1002.eqiad.wmnet
* 10:10 moritzm: rebooting grafana1002 (running grafana.wikimedia.org)
* 10:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM grafana1002.eqiad.wmnet
* 10:09 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 10:02 mmandere: cp3052: upgrade varnish to 6.0.9-1wm1 [[phab:T298758|T298758]]
* 10:02 joal@deploy1002: Finished deploy [analytics/refinery@94ec386]: Hotfix analytics deploy [analytics/refinery@94ec386] (duration: 21m 47s)
* 10:02 elukey: run kafka preferred-replica-election on kafka-main1001 to force a rebalance of partition leaders (after kafka-main1002's reimage)
* 10:00 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet
* 09:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1002.eqiad.wmnet with OS buster
* 09:56 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet
* 09:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:40 joal@deploy1002: Started deploy [analytics/refinery@94ec386]: Hotfix analytics deploy [analytics/refinery@94ec386]
* 09:40 joal@deploy1002: Finished deploy [analytics/refinery@94ec386] (thin): Hotfix analytics deploy THIN [analytics/refinery@94ec386] (duration: 00m 07s)
* 09:40 joal@deploy1002: Started deploy [analytics/refinery@94ec386] (thin): Hotfix analytics deploy THIN [analytics/refinery@94ec386]
* 09:39 joal@deploy1002: Finished deploy [analytics/refinery@94ec386] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@94ec386] (duration: 06m 59s)
* 09:35 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:32 joal@deploy1002: Started deploy [analytics/refinery@94ec386] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@94ec386]
* 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:30 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:26 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1002.eqiad.wmnet with OS buster
* 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:24 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM xhgui1001.eqiad.wmnet
* 09:14 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM xhgui1001.eqiad.wmnet
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM lists1001.wikimedia.org
* 09:02 moritzm: rebooting lists1001 (running lists.wikimedia.org) to pick up new KVM setting
* 09:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM lists1001.wikimedia.org
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022, give weight to es1021 [[phab:T295965|T295965]] ', diff saved to https://phabricator.wikimedia.org/P18718 and previous config saved to /var/cache/conftool/dbconfig/20220113-085906-marostegui.json
* 08:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1003.eqiad.wmnet with OS buster
* 08:39 elukey: ipmi mc reset cold for kafka-main1002, mgmt interface not reachable via ssh
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges group from s7 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18717 and previous config saved to /var/cache/conftool/dbconfig/20220113-083923-marostegui.json
* 08:28 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:753505{{!}}Take LogicException into consideration (T299111)]] (duration: 01m 28s)
* 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:21 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:753504{{!}}Take LogicException into consideration (T299111)]] (duration: 01m 28s)
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1003.eqiad.wmnet with OS buster
* 08:06 marostegui: Change innodb_checksum_algorithm=full_crc32 on eqiad sanitarium hosts (db1154, db1155) [[phab:T287244|T287244]]
* 08:02 elukey: ipmi mc reset cold for kafka-main1003, mgmt interface not reachable via ssh
* 07:57 elukey: stop kafka* on kafka-main1003 as prep-step for reimage to buster
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked group from s7 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18715 and previous config saved to /var/cache/conftool/dbconfig/20220113-075012-marostegui.json
* 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1015.eqiad.wmnet with OS bullseye
* 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1015.eqiad.wmnet with OS bullseye
* 06:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 06:41 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/includes/export/WikiExporter.php: Backport: [[gerrit:753501{{!}}export: Remove ignoring rev_page_id index (T163532)]] (duration: 01m 28s)
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18714 and previous config saved to /var/cache/conftool/dbconfig/20220113-064113-root.json
* 06:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 06:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 06:38 marostegui: Failover m3 proxy from dbproxy1016 to dbproxy1020 [[phab:T298586|T298586]]
* 06:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 06:26 marostegui: Remove rev_page_id from frwiki,jawiki,ruwiki and labswiki from db1096 (s6) [[phab:T285149|T285149]]
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18713 and previous config saved to /var/cache/conftool/dbconfig/20220113-062609-root.json
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18712 and previous config saved to /var/cache/conftool/dbconfig/20220113-061105-root.json
* 06:05 tstarling@deploy1002: Synchronized php-1.38.0-wmf.17/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 01m 27s)
* 05:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 05:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 05:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18711 and previous config saved to /var/cache/conftool/dbconfig/20220113-055602-root.json
* 05:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 05:53 tstarling@deploy1002: Synchronized php-1.38.0-wmf.17/tests/phpunit/unit/includes/libs/rdbms/database/DatabaseSQLTest.php: (no justification provided) (duration: 01m 32s)
* 05:00 TimStarling: doing [[phab:T299095|T299095]] restorations on s3 wikis
* 04:30 TimStarling: on mwmaint1002: inserting 11565 rows into itwiki.pagelinks for [[phab:T299095|T299095]]
* 03:33 TimStarling: on mwmaint1002: inserting 1714288 rows into wikidatawiki.pagelinks for [[phab:T299095|T299095]]
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:30 TimStarling: on mwmaint1002: inserting 4221344 rows into commonswiki.pagelinks to clean up from [[phab:T299095|T299095]]
* 02:29 tstarling@deploy1002: Synchronized php-1.38.0-wmf.16/maintenance/sql.php: batch size (duration: 01m 28s)
* 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:31 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:752751{{!}}Enable CirrusSearch on it/en Wikivoyage]] (duration: 01m 28s)
* 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:24 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:752760{{!}}Skip vector-2022 skin in config, not Vector skin (T298923)]] (duration: 01m 29s)
* 00:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:11 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753584{{!}}Enable Disambiguator notifications on all wikis (T293319)]] (duration: 01m 28s)
* 00:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn


== 2015-07-10 ==
* 22:51 mutante: tendril: very short maintenance downtime
* 20:10 bd808: `service elasticsearch start` not starting on logstash1004; investigating
* 20:07 bd808: ran apt-get upgrade on logstash1004
* 19:52 mutante: adminbot - built and imported 1.7.10 into APT repo
* 19:43 bd808: rebooting logstash1004
* 19:40 bd808: Kibana seems to be broken by mixed 1.6.0/1.3.9 cluster
* 19:32 bd808: kibana not seeing indices after upgrading elasticsearch to 1.6.0; investigating
* 19:26 bd808: Upgraded logstash1003 to elasticsearch 1.6.0
* 19:22 bd808: Upgraded logstash1002 to elasticsearch 1.6.0
* 19:19 bd808: Upgraded logstash1001 to elasticsearch 1.6.0
* 19:10 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/lib/ve/src/ce/nodes/ve.ce.TableNode.js: https://gerrit.wikimedia.org/r/#/c/224122/ (duration: 00m 12s)
* 18:11 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 120'
* 18:00 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 90'
* 17:49 gwicke: rolling restart of the cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/224114/
* 17:32 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: prevent race condition on writing settings (duration: 00m 13s)
* 17:26 moritzm: installed python security updates on mc*
* 17:25 Coren: rebooting labstore2001 (experiments with the new raid setup caused the mapper table to fill)
* 16:35 mobrovac: restbase deploying hotfix for T105509
* 15:29 mobrovac: restbase restarted restbase on restbase1004
* 15:25 godog: bounce cassandra on restbase1004
* 13:43 godog: bounce cassandra on restbase1004
* 13:37 _joe_: temporarily repooled mw1031
* 12:40 godog: bounce cassandra on restbase1004
* 07:43 godog: reimage ms-be2013 T105213
* 04:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 10 04:36:49 UTC 2015 (duration 36m 48s)
* 04:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037; repool db1030 (revert below) (duration: 00m 12s)
* 04:28 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)
* 03:14 mutante: re-enabling puppet on tools-exec-1213, working around adminbot package install fail
* 02:59 elee: please log this with the year
* 02:53 andrewbogott: testing the log by logging a test
* 01:50 gwicke: bounced cassandra on restbase1004
* 01:38 jgage: cassandra restarted on restbase1004
* 00:39 urandom: starting restbase1004
* 00:35 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/modules/ve-mw/ui/inspectors/ve.ui.MWLinkAnnotationInspector.js: https://gerrit.wikimedia.org/r/#/c/223983/ (duration: 00m 12s)
* 00:15 hoo: Updated WikibaseQualityConstraints data on wikidata (wikidatawiki.wbqc_constraints)


== 2022-01-12 ==
* 23:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:29 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.17
* 23:07 jhathaway: rebooting mx1001 to get old kernel
* 22:48 cwhite: end eqiad opensearch upgrade [[phab:T288621|T288621]]
* 21:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18709 and previous config saved to /var/cache/conftool/dbconfig/20220112-214258-marostegui.json
* 21:28 mbsantos: mbsantos@maps1009.eqiad.wmnet: start imposm-initial-import  - full planet re-import ([[phab:T299049|T299049]])
* 21:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P18708 and previous config saved to /var/cache/conftool/dbconfig/20220112-212753-marostegui.json
* 21:19 ryankemper: [WDQS] [[phab:T299098|T299098]] depooled `wdqs2003` so dc-ops can take a look at the PS2 failure
* 21:18 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@988b7d2] (duration: 06m 57s)
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P18707 and previous config saved to /var/cache/conftool/dbconfig/20220112-211248-marostegui.json
* 21:11 joal@deploy1002: Started deploy [analytics/refinery@988b7d2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@988b7d2]
* 21:11 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2] (thin): Regular analytics weekly train THIN [analytics/refinery@988b7d2] (duration: 00m 07s)
* 21:11 joal@deploy1002: Started deploy [analytics/refinery@988b7d2] (thin): Regular analytics weekly train THIN [analytics/refinery@988b7d2]
* 21:10 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2]: Regular analytics weekly train [analytics/refinery@988b7d2] (duration: 24m 20s)
* 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18706 and previous config saved to /var/cache/conftool/dbconfig/20220112-205744-marostegui.json
* 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18705 and previous config saved to /var/cache/conftool/dbconfig/20220112-205636-marostegui.json
* 20:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 20:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18704 and previous config saved to /var/cache/conftool/dbconfig/20220112-205629-marostegui.json
* 20:46 joal@deploy1002: Started deploy [analytics/refinery@988b7d2]: Regular analytics weekly train [analytics/refinery@988b7d2]
* 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18703 and previous config saved to /var/cache/conftool/dbconfig/20220112-204124-marostegui.json
* 20:36 dduvall: 1.38.0-wmf.17 rolled back from group1 due to large spike in db read-only errors and slow queries ([[phab:T293958|T293958]])
* 20:33 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.38.0-wmf.17
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18702 and previous config saved to /var/cache/conftool/dbconfig/20220112-202619-marostegui.json
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:21 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 01m 21s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:19 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 20:19 jgleeson: updated payments from {{Gerrit|939cb4bc}} to {{Gerrit|8497eae9}}
* 20:17 mutante: applying firewall change on phabricator (VCS, git-ssh), second attempt, first codfw-only
* 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18701 and previous config saved to /var/cache/conftool/dbconfig/20220112-201114-marostegui.json
* 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18700 and previous config saved to /var/cache/conftool/dbconfig/20220112-200806-marostegui.json
* 20:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18699 and previous config saved to /var/cache/conftool/dbconfig/20220112-200759-marostegui.json
* 19:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P18698 and previous config saved to /var/cache/conftool/dbconfig/20220112-195254-marostegui.json
* 19:52 hashar: Restarting CI Jenkins once more to apply the Gearman plugin update [[phab:T298691|T298691]]
* 19:44 hashar: Clearing /srv partition on integration-castor03
* 19:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P18697 and previous config saved to /var/cache/conftool/dbconfig/20220112-193749-marostegui.json
* 19:34 hashar: Upgrading CI Jenkins and Gearman plugin [[phab:T298691|T298691]]
* 19:29 mutante: wdqs2003 - one power supply failed so it's not redundant anymore, says Icinga
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:25 cwhite: begin eqiad opensearch upgrade [[phab:T288621|T288621]]
* 19:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18696 and previous config saved to /var/cache/conftool/dbconfig/20220112-192244-marostegui.json
* 19:22 mutante: deneb - for some reason the "package builder clean up build directory"-service fails [[phab:T287222|T287222]]
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:21 cjming: end of UTC evening backport & config window
* 19:21 mutante: [deneb:~] $ sudo systemctl start  package_builder_Clean_up_build_directory.service
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:19 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753187{{!}}Add new vector skin key to RelatedArticlesFooterAllowedSkins. (T298916)]] (duration: 01m 21s)
* 19:18 mutante: pybal-test2002 - apt-get clean after icinga alert about disk space running out
* 19:17 mutante: zookeeper-test1002 - CRITICAL - degraded: The following units failed: ifup@ens5.service - for this issue see [[phab:T273026|T273026]] ([[phab:T268074|T268074]])
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:14 mutante: elastic10180 - one power supply seemingly failed - see icinga IPMI alert - [Status = Critical, PS Redundancy = Critical] [[phab:T294805|T294805]]
* 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18695 and previous config saved to /var/cache/conftool/dbconfig/20220112-191436-marostegui.json
* 19:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 19:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18694 and previous config saved to /var/cache/conftool/dbconfig/20220112-191428-marostegui.json
* 19:13 cjming@deploy1002: Synchronized php-1.38.0-wmf.17/includes/export/WikiExporter.php: Backport: [[gerrit:753085{{!}}Partial revert of I1a691f01cd82e60bf41207d32501edb4b9835e37 to unbreak dumps (T299020)]] (duration: 01m 22s)
* 19:12 mutante: mirror1001 - CRITICAL - degraded: The following units failed: update-ubuntu-mirror.service - [[phab:T286898|T286898]]
* 19:09 hashar: Upgraded releases Jenkins from 2.319.1 to 2.319.2 # [[phab:T298691|T298691]]
* 19:06 moritzm: imported jenkins 2.319.2 to thirdparty/ci for buster-wikimedia
* 19:05 mutante: [mwmaint1002:~] $ sudo systemctl status mediawiki_job_updatequerypages_mostlinked_s3@13.service (running fine but had failed for unknown reason last time it was supposed to run automatically)
* 18:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P18693 and previous config saved to /var/cache/conftool/dbconfig/20220112-185923-marostegui.json
* 18:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 18:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P18692 and previous config saved to /var/cache/conftool/dbconfig/20220112-184418-marostegui.json
* 18:40 mutante: phab1001 - temp disabling puppet - deployed firewall change on phab2001 - debugging - no impact
* 18:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18691 and previous config saved to /var/cache/conftool/dbconfig/20220112-182913-marostegui.json
* 18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18690 and previous config saved to /var/cache/conftool/dbconfig/20220112-182806-marostegui.json
* 18:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18689 and previous config saved to /var/cache/conftool/dbconfig/20220112-182725-marostegui.json
* 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P18688 and previous config saved to /var/cache/conftool/dbconfig/20220112-181220-marostegui.json
* 17:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P18687 and previous config saved to /var/cache/conftool/dbconfig/20220112-175715-marostegui.json
* 17:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18686 and previous config saved to /var/cache/conftool/dbconfig/20220112-174211-marostegui.json
* 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18685 and previous config saved to /var/cache/conftool/dbconfig/20220112-174103-marostegui.json
* 17:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 17:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18684 and previous config saved to /var/cache/conftool/dbconfig/20220112-174056-marostegui.json
* 17:38 _joe_: deploying scap 4.1.1 to the restbase canaries [[phab:T298986|T298986]]
* 17:34 _joe_: deploying scap 4.1.1 to the mediawiki canaries [[phab:T298986|T298986]]
* 17:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1169.eqiad.wmnet with OS bullseye
* 17:27 dancy@deploy1002: Started scap: testing
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18683 and previous config saved to /var/cache/conftool/dbconfig/20220112-172551-marostegui.json
* 17:25 dancy@deploy1002: Started scap: testing
* 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18682 and previous config saved to /var/cache/conftool/dbconfig/20220112-171047-marostegui.json
* 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:06 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 21s)
* 17:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
* 16:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter1005.eqiad.wmnet
* 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18681 and previous config saved to /var/cache/conftool/dbconfig/20220112-165542-marostegui.json
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18680 and previous config saved to /var/cache/conftool/dbconfig/20220112-165434-marostegui.json
* 16:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter1005.eqiad.wmnet
* 16:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 16:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 16:53 hnowlan: Decommissioning cassandra instance restbase2009-c via nodetool
* 16:48 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
* 16:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:46 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 21s)
* 16:45 elukey: elukey@prometheus2004:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
* 16:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:44 elukey: elukey@prometheus2003:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
* 16:40 elukey: elukey@prometheus1004:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
* 16:39 elukey: elukey@prometheus1003:~$ sudo apt-get remove linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64 linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64
* 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P18678 and previous config saved to /var/cache/conftool/dbconfig/20220112-163919-marostegui.json
* 16:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mx1001.wikimedia.org
* 16:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter1004.eqiad.wmnet
* 16:35 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mx1001.wikimedia.org
* 16:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:31 akosiaris@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter1004.eqiad.wmnet
* 16:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:25 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 16s)
* 16:25 elukey: stop kafka* on kafka-main1003 to allow dcops maintenance (nic/bios upgrades) - [[phab:T298867|T298867]]
* 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P18677 and previous config saved to /var/cache/conftool/dbconfig/20220112-162414-marostegui.json
* 16:20 moritzm: switch kubestagetcd1006 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 16:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: switch to DRBD disk storage
* 16:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: switch to DRBD disk storage
* 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18676 and previous config saved to /var/cache/conftool/dbconfig/20220112-160910-marostegui.json
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18675 and previous config saved to /var/cache/conftool/dbconfig/20220112-160802-marostegui.json
* 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 16:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18674 and previous config saved to /var/cache/conftool/dbconfig/20220112-160755-marostegui.json
* 16:02 elukey: stop kafka* on kafka-main1002 to allow dcops maintenance (nic/bios upgrades) - [[phab:T298867|T298867]]
* 15:57 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 15:56 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 15:56 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P18673 and previous config saved to /var/cache/conftool/dbconfig/20220112-155250-marostegui.json
* 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P18672 and previous config saved to /var/cache/conftool/dbconfig/20220112-153745-marostegui.json
* 15:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18671 and previous config saved to /var/cache/conftool/dbconfig/20220112-152240-marostegui.json
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18670 and previous config saved to /var/cache/conftool/dbconfig/20220112-152133-marostegui.json
* 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18669 and previous config saved to /var/cache/conftool/dbconfig/20220112-152121-marostegui.json
* 15:14 elukey: stop kafka* on kafka-main1001 to allow dcops maintenance (nic/bios upgrades) - [[phab:T298867|T298867]]
* 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P18668 and previous config saved to /var/cache/conftool/dbconfig/20220112-150616-marostegui.json
* 14:59 moritzm: switch kubestagetcd1005 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 14:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 14:54 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply on main
* 14:54 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P18667 and previous config saved to /var/cache/conftool/dbconfig/20220112-145111-marostegui.json
* 14:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 14:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 14:40 jelto: remove helm2 from deployment_server [[phab:T251305|T251305]] https://gerrit.wikimedia.org/r/c/operations/puppet/+/753026
* 14:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: sync on staging
* 14:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply on production
* 14:37 jelto@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply on staging
* 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow1002.eqiad.wmnet
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18666 and previous config saved to /var/cache/conftool/dbconfig/20220112-143606-marostegui.json
* 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18665 and previous config saved to /var/cache/conftool/dbconfig/20220112-143258-marostegui.json
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18664 and previous config saved to /var/cache/conftool/dbconfig/20220112-143241-marostegui.json
* 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow1002.eqiad.wmnet
* 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:23 moritzm: switch kubestagetcd1004 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P18663 and previous config saved to /var/cache/conftool/dbconfig/20220112-141736-marostegui.json
* 14:17 ladsgroup@deploy1002: Synchronized wmf-config: Config: [[gerrit:702421{{!}}Merge db-codfw.php and db-eqiad.php into db-production.php (T260297)]], Part III (duration: 01m 07s)
* 14:15 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:702421{{!}}Merge db-codfw.php and db-eqiad.php into db-production.php (T260297)]], Part II (duration: 01m 08s)
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf1002.eqiad.wmnet
* 14:14 ladsgroup@deploy1002: Synchronized wmf-config/db-production.php: Config: [[gerrit:702421{{!}}Merge db-codfw.php and db-eqiad.php into db-production.php (T260297)]], Part I (duration: 01m 07s)
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf1002.eqiad.wmnet
* 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf1001.eqiad.wmnet
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P18662 and previous config saved to /var/cache/conftool/dbconfig/20220112-140232-marostegui.json
* 14:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf1001.eqiad.wmnet
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18661 and previous config saved to /var/cache/conftool/dbconfig/20220112-135858-marostegui.json
* 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18659 and previous config saved to /var/cache/conftool/dbconfig/20220112-134727-marostegui.json
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18658 and previous config saved to /var/cache/conftool/dbconfig/20220112-134620-marostegui.json
* 13:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 13:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18657 and previous config saved to /var/cache/conftool/dbconfig/20220112-134103-root.json
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:30 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753441{{!}}Disable flaggedrevs stable template inclusion in ruwikisource (T226054)]] (duration: 01m 08s)
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18656 and previous config saved to /var/cache/conftool/dbconfig/20220112-132600-root.json
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:23 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1004.eqiad.wmnet
* 13:20 urbanecm@deploy1002: Finished scap: {{Gerrit|4b1e241}}: Undo update to the way the search interface is set (duration: 19m 19s)
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard1002.eqiad.wmnet
* 13:18 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1004.eqiad.wmnet
* 13:14 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard1002.eqiad.wmnet
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1003.eqiad.wmnet
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18655 and previous config saved to /var/cache/conftool/dbconfig/20220112-131056-root.json
* 13:08 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1003.eqiad.wmnet
* 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM debmonitor1002.eqiad.wmnet
* 13:01 urbanecm@deploy1002: Started scap: {{Gerrit|4b1e241}}: Undo update to the way the search interface is set
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18654 and previous config saved to /var/cache/conftool/dbconfig/20220112-130050-marostegui.json
* 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM debmonitor1002.eqiad.wmnet
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18653 and previous config saved to /var/cache/conftool/dbconfig/20220112-125552-root.json
* 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM failoid1002.eqiad.wmnet
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist group from s7 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18652 and previous config saved to /var/cache/conftool/dbconfig/20220112-125402-marostegui.json
* 12:52 awight: EU deployment reopened :-)
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P18651 and previous config saved to /var/cache/conftool/dbconfig/20220112-125208-marostegui.json
* 12:51 awight: EU deployment complete
* 12:50 awight@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/TemplateData: Backport: [[gerrit:752775{{!}}Allow aliases to be integers in addition to strings (T298795)]] (duration: 01m 07s)
* 12:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM failoid1002.eqiad.wmnet
* 12:48 Amir1: removing orphan lint error reports in all wikis ([[phab:T298782|T298782]])
* 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18650 and previous config saved to /var/cache/conftool/dbconfig/20220112-124514-marostegui.json
* 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P18649 and previous config saved to /var/cache/conftool/dbconfig/20220112-123010-marostegui.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18648 and previous config saved to /var/cache/conftool/dbconfig/20220112-122742-marostegui.json
* 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P18647 and previous config saved to /var/cache/conftool/dbconfig/20220112-121505-marostegui.json
* 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cfe389afce8037121f8e8b672f4fdf2458a068dd}}: fawiki: Add extendedmover usergroup ([[phab:T299038|T299038]]) (duration: 01m 08s)
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc1002.eqiad.wmnet
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18646 and previous config saved to /var/cache/conftool/dbconfig/20220112-120931-marostegui.json
* 12:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc1002.eqiad.wmnet
* 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc1001.eqiad.wmnet
* 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc1001.eqiad.wmnet
* 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM releases1002.eqiad.wmnet
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18645 and previous config saved to /var/cache/conftool/dbconfig/20220112-120000-marostegui.json
* 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM releases1002.eqiad.wmnet
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18644 and previous config saved to /var/cache/conftool/dbconfig/20220112-115259-marostegui.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18643 and previous config saved to /var/cache/conftool/dbconfig/20220112-115031-marostegui.json
* 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18642 and previous config saved to /var/cache/conftool/dbconfig/20220112-115024-marostegui.json
* 11:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 11:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P18641 and previous config saved to /var/cache/conftool/dbconfig/20220112-113518-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18640 and previous config saved to /var/cache/conftool/dbconfig/20220112-113119-marostegui.json
* 11:21 elukey: move kafka-jumbo nodes to fixed kafka uid/gid - [[phab:T296990|T296990]]
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P18639 and previous config saved to /var/cache/conftool/dbconfig/20220112-112013-marostegui.json
* 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18638 and previous config saved to /var/cache/conftool/dbconfig/20220112-110508-marostegui.json
* 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dborch1001.wikimedia.org
* 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dborch1001.wikimedia.org
* 10:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:59 moritzm: rebalance ganeti/codfw row B (all nodes reimaged to Buster)
* 10:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18637 and previous config saved to /var/cache/conftool/dbconfig/20220112-105650-marostegui.json
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18636 and previous config saved to /var/cache/conftool/dbconfig/20220112-105540-marostegui.json
* 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18635 and previous config saved to /var/cache/conftool/dbconfig/20220112-105532-marostegui.json
* 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dbmonitor1002.wikimedia.org
* 10:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: sync on main
* 10:50 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply on main
* 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dbmonitor1002.wikimedia.org
* 10:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply on main
* 10:50 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply on main
* 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:48 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 10:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:47 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 10:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply on main
* 10:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
* 10:41 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P18634 and previous config saved to /var/cache/conftool/dbconfig/20220112-104028-marostegui.json
* 10:39 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: sync on main
* 10:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply on main
* 10:37 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1128 in s1 with minimal weight [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18633 and previous config saved to /var/cache/conftool/dbconfig/20220112-103619-marostegui.json
* 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: sync on main
* 10:32 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply on main
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P18632 and previous config saved to /var/cache/conftool/dbconfig/20220112-103144-marostegui.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1128 in s1 with minimal weight [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18631 and previous config saved to /var/cache/conftool/dbconfig/20220112-102938-marostegui.json
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P18630 and previous config saved to /var/cache/conftool/dbconfig/20220112-102523-marostegui.json
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18629 and previous config saved to /var/cache/conftool/dbconfig/20220112-101018-marostegui.json
* 10:08 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
* 10:06 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
* 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:57 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1014 to master in pc1 (duration: 01m 07s)
* 09:54 hnowlan: Decommissioning cassandra instance restbase2009-b via nodetool
* 09:53 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab-runner1001.eqiad.wmnet
* 09:51 moritzm: reverting kubetcd2006 back to "plain" storage
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to plain disk storage
* 09:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to plain disk storage
* 09:51 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab-runner1001.eqiad.wmnet
* 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1011.eqiad.wmnet with OS bullseye
* 09:21 moritzm: reverting kubetcd2005 back to "plain" storage
* 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: switch to plain disk storage
* 09:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: switch to plain disk storage
* 09:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1011.eqiad.wmnet with OS bullseye
* 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18628 and previous config saved to /var/cache/conftool/dbconfig/20220112-090959-marostegui.json
* 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc1 (duration: 01m 08s)
* 09:05 marostegui: Reset replication on pc1014
* 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18627 and previous config saved to /var/cache/conftool/dbconfig/20220112-085024-marostegui.json
* 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM miscweb1002.eqiad.wmnet
* 08:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM miscweb1002.eqiad.wmnet
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P18626 and previous config saved to /var/cache/conftool/dbconfig/20220112-083520-marostegui.json
* 08:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug1002.eqiad.wmnet
* 08:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug1002.eqiad.wmnet
* 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug1001.eqiad.wmnet
* 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug1001.eqiad.wmnet
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P18625 and previous config saved to /var/cache/conftool/dbconfig/20220112-082015-marostegui.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18624 and previous config saved to /var/cache/conftool/dbconfig/20220112-080510-marostegui.json
* 08:00 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 07:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 07:57 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: sync on main
* 07:56 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply on main
* 07:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync on main
* 07:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply on main
* 07:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: sync on main
* 07:46 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply on main
* 07:44 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: sync on main
* 07:41 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply on main
* 07:41 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 07:40 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 07:40 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
* 07:37 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
* 07:37 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: sync on main
* 07:37 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply on main
* 07:29 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: sync on main
* 07:28 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply on main
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18623 and previous config saved to /var/cache/conftool/dbconfig/20220112-072826-marostegui.json
* 07:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18622 and previous config saved to /var/cache/conftool/dbconfig/20220112-071003-marostegui.json
* 07:02 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1169.eqiad.wmnet with OS bullseye
* 06:58 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: sync on main
* 06:58 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply on main
* 06:58 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 06:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 06:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
* 06:55 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
* 06:55 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: sync on main
* 06:55 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply on main
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P18621 and previous config saved to /var/cache/conftool/dbconfig/20220112-065458-marostegui.json
* 06:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: sync on main
* 06:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
* 06:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: sync on main
* 06:50 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply on main
* 06:49 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: sync on main
* 06:48 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply on main
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P18620 and previous config saved to /var/cache/conftool/dbconfig/20220112-063953-marostegui.json
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
* 06:36 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1169.eqiad.wmnet with OS bullseye
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18619 and previous config saved to /var/cache/conftool/dbconfig/20220112-062449-marostegui.json
* 06:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18618 and previous config saved to /var/cache/conftool/dbconfig/20220112-060923-marostegui.json
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 for Bullseye reimage [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18617 and previous config saved to /var/cache/conftool/dbconfig/20220112-060803-marostegui.json
* 06:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 00:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:09 urbanecm: UTC late evening B&C done
* 00:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|24a26392a3e36aa3a46445eb1f87e808b57b19c8}}: Enable Disambiguator notifications for French Wikipedia ([[phab:T293319|T293319]]) (duration: 01m 08s)
* 00:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:03 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)


== 2022-01-11 ==
* 23:56 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:05 dduvall@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.js: Backport: [[gerrit:753071{{!}}Watchlist API update: Call correct method (T298999)]] (duration: 02m 40s)
* 23:04 dduvall: syncing backport to fix VE regression that followed testwiki/group0 deployment (cc [[phab:T293958|T293958]])
* 21:29 mutante: mw1418 - apt-get remove --purge fonts*; apt-get remove --purge xfonts*; running puppet - nothing gets reinstalled and with --purge it means 'dpkg -l {{!}} grep fonts' is actually empty, not full of "rc" still - [[phab:T294378|T294378]]
* 21:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18615 and previous config saved to /var/cache/conftool/dbconfig/20220111-211134-marostegui.json
* 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P18614 and previous config saved to /var/cache/conftool/dbconfig/20220111-205629-marostegui.json
* 20:56 mutante: mw1418 (lowest numbered canary appserver that we use for httpbb hourly tests on cumin1001) - apt-get autoremove - removed font* and python3* packages - reason: [[phab:T294378|T294378]]
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:42 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1009.eqiad.wmnet
* 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P18613 and previous config saved to /var/cache/conftool/dbconfig/20220111-204124-marostegui.json
* 20:38 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1009.eqiad.wmnet
* 20:38 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 20:36 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1008.eqiad.wmnet
* 20:32 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1008.eqiad.wmnet
* 20:31 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1007.eqiad.wmnet
* 20:31 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1032.eqiad.wmnet
* 20:27 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1007.eqiad.wmnet
* 20:27 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1032.eqiad.wmnet
* 20:26 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1031.eqiad.wmnet
* 20:26 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1030.eqiad.wmnet
* 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18612 and previous config saved to /var/cache/conftool/dbconfig/20220111-202620-marostegui.json
* 20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18611 and previous config saved to /var/cache/conftool/dbconfig/20220111-202513-marostegui.json
* 20:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 20:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18610 and previous config saved to /var/cache/conftool/dbconfig/20220111-202505-marostegui.json
* 20:23 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1031.eqiad.wmnet
* 20:23 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1030.eqiad.wmnet
* 20:17 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1024.eqiad.wmnet
* 20:17 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1025.eqiad.wmnet
* 20:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P18609 and previous config saved to /var/cache/conftool/dbconfig/20220111-201000-marostegui.json
* 20:09 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1025.eqiad.wmnet
* 20:08 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1024.eqiad.wmnet
* 20:01 dduvall@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 39m 38s)
* 19:59 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1023.eqiad.wmnet
* 19:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P18608 and previous config saved to /var/cache/conftool/dbconfig/20220111-195456-marostegui.json
* 19:53 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1023.eqiad.wmnet
* 19:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18607 and previous config saved to /var/cache/conftool/dbconfig/20220111-193951-marostegui.json
* 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18606 and previous config saved to /var/cache/conftool/dbconfig/20220111-193844-marostegui.json
* 19:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 19:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18605 and previous config saved to /var/cache/conftool/dbconfig/20220111-193836-marostegui.json
* 19:30 sukhe: upload pdns-recursor_4.6.0-1wm1 to apt.wm.o (buster) - [[phab:T252132|T252132]]
* 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P18604 and previous config saved to /var/cache/conftool/dbconfig/20220111-192331-marostegui.json
* 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

== July 9 ==
* 23:41 legoktm: deployed patch for T105413
* 23:07 gwicke: bounced cassandra on restbase1004
* 23:02 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: TitleBlacklist: Don't block account auto-creation (duration: 00m 13s)
* 22:09 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-eqiad.php: I don't think we want to keep poolcounter running on an imagescaler (duration: 00m 12s)
* 21:30 logmsgbot: tgr Synchronized php-1.26wmf13/extensions/OAuth/api/MWOAuthAPI.setup.php: no canonical redirects for requests with OAuth headers (duration: 00m 12s)
* 21:05 tgr: backporting https://gerrit.wikimedia.org/r/#/c/223952/- fixes OAuth which is broken for 1.26wmf13
* 20:47 gwicke: temporarily disabled puppet on cassandra nodes while tweaking settings
* 19:53 legoktm: manually fixing global merge of Yuvipanda->YuviPanda (T104686)
* 19:04 gwicke: bounced cassandra on restbase1004
* 18:29 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf13
* 17:54 gwicke: bounced restbase on restbase1005
* 17:32 ori: installed poolcounter on mw1154
* 17:31 logmsgbot: ori Synchronized wmf-config/PoolCounterSettings-eqiad.php: (no message) (duration: 00m 12s)
* 17:22 cmjohnson1: shutting down helium for a few minutes to move within the same row
* 16:53 gwicke: bounced cassandra on restbase1004
* 16:48 godog: reboot ms-be2013 T105213
* 16:38 gwicke: bounced cassandra on restbase1006
* 16:07 _joe_: repooling mw1152
* 15:57 godog: restart cassandra on restbase1002
* 15:34 gwicke: bounced cassandra on restbase1004
* 15:24 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223739/ (duration: 00m 12s)
* 15:23 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223737/ (duration: 00m 12s)
* 15:23 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223742/ (duration: 00m 12s)
* 15:09 gwicke: bounced cassandra on restbase1004
* 14:44 gwicke: re-enabled compaction throttling (60mb/s) on cassandra nodes
* 14:44 bblack: reprepro: jessie-wikimedia/backports openssl pkg, 1.0.2c-1 => 1.0.2d-1~wmf1
* 14:29 _joe_: reimaging mw1152 for wiping any leftover local hacks. Depooling, scheduling downtime
* 14:28 moritzm: installed python-django security updates on labmon, netmon and californium
* 14:24 godog: really upgrade python-django on graphite2001
* 13:48 mobrovac: restbase cassandra rolling restart to apply https://gerrit.wikimedia.org/r/223774
* 13:02 godog: upgrade python-django on graphite1001 and graphite2001 following  http://www.ubuntu.com/usn/usn-2671-1/
* 11:34 godog: restart cassandra on restbase1001
* 11:22 logmsgbot: krinkle Synchronized php-1.26wmf13/resources/src/mediawiki/mediawiki.util.js: T105265 (duration: 00m 11s)
* 11:21 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/GlobalFunctions.php: T105265 (duration: 00m 12s)
* 11:09 mobrovac: restbase deploying https://gerrit.wikimedia.org/r/#/c/223297/ which bumps the back-end module version ( https://github.com/wikimedia/restbase-mod-table-cassandra/pull/117 )
* 10:53 mobrovac: restbase started thinner 15 days for wikimedia group
* 10:37 mark: Shutdown AMS-IX route server BGP sessions on cr1-esams
* 07:48 logmsgbot: oblivian Synchronized php-1.26wmf13/thumb.php: Re-add fix for thumb.php 404s on HHVM (duration: 00m 13s)
* 06:27 twentyafterfour: restarted apache2 on iridium to fix phab exception
* 06:15 springle: db1037 is repartitioning tables; it will lag intermittently for a day
* 06:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 06:05:30 UTC 2015 (duration 5m 29s)
* 05:23 gwicke: dynamically limited cassandra compaction throughput to 80mb/s; please review https://gerrit.wikimedia.org/r/#/c/223722/ to make this permanent
* 03:01 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 03:01:13+00:00
* 02:58 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 05m 29s)
* 02:42 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:42:56+00:00
* 02:40 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 02:40:16 UTC 2015 (duration 40m 15s)
* 02:36 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 10m 32s)
* 02:28 twentyafterfour: restarted phd
* 02:28 twentyafterfour: moved phd log to free disk space on iridium
* 02:24 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 02:24:00+00:00
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 02:17 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:17:02+00:00
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 47s)
* 02:00 springle: pkg upgrade and restart db1037
* 01:49 gwicke: switched remaining cassandra nodes to JDK8
* 01:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037 (duration: 00m 11s)
* 01:07 mutante: uranium - deleted apache logs older than 90 days
* 00:45 RoanKattouw: Running populateContentModel.php --wiki=cawiki --table=revision --ns=5
* 00:20 RoanKattouw: Ran populateContentModel.php --table=revision for odd-numbered namespaces on officewiki for T105245
* 19:21 dduvall@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 19:17 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum1002.eqiad.wmnet
* 19:13 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum1002.eqiad.wmnet
* 19:13 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum1001.eqiad.wmnet
* 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P18603 and previous config saved to /var/cache/conftool/dbconfig/20220111-190827-marostegui.json
* 19:05 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum1001.eqiad.wmnet
* 19:05 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh1002.wikimedia.org
* 19:04 dduvall@deploy1002: Pruned MediaWiki: 1.38.0-wmf.9 (duration: 15m 51s)
* 19:01 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh1002.wikimedia.org
* 19:00 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh1001.wikimedia.org
* 18:58 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh1001.wikimedia.org
* 18:57 ebernhardson: clear wcqs.jnl and aliases.map for all wcqs instances [[phab:T296470|T296470]]
* 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18602 and previous config saved to /var/cache/conftool/dbconfig/20220111-185322-marostegui.json
* 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18601 and previous config saved to /var/cache/conftool/dbconfig/20220111-185215-marostegui.json
* 18:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 18:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18600 and previous config saved to /var/cache/conftool/dbconfig/20220111-185208-marostegui.json
* 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:41 _joe_: also ran apt-get autoremove on mwdebug1002
* 18:41 _joe_: installed scap 4.1.1 on mwdebug1002 [[phab:T298986|T298986]], ran scap pull successfully
* 18:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P18599 and previous config saved to /var/cache/conftool/dbconfig/20220111-183703-marostegui.json
* 18:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-coord1002.eqiad.wmnet with OS buster
* 18:29 _joe_: uploaded scap 4.1.1-1 to apt [[phab:T298986|T298986]]
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P18598 and previous config saved to /var/cache/conftool/dbconfig/20220111-182158-marostegui.json
* 18:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS buster
* 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18597 and previous config saved to /var/cache/conftool/dbconfig/20220111-180653-marostegui.json
* 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18596 and previous config saved to /var/cache/conftool/dbconfig/20220111-180547-marostegui.json
* 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18595 and previous config saved to /var/cache/conftool/dbconfig/20220111-180534-marostegui.json
* 17:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P18594 and previous config saved to /var/cache/conftool/dbconfig/20220111-175029-marostegui.json
* 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2009.codfw.wmnet
* 17:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P18593 and previous config saved to /var/cache/conftool/dbconfig/20220111-173524-marostegui.json
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18592 and previous config saved to /var/cache/conftool/dbconfig/20220111-172019-marostegui.json
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18591 and previous config saved to /var/cache/conftool/dbconfig/20220111-171912-marostegui.json
* 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18590 and previous config saved to /var/cache/conftool/dbconfig/20220111-171905-marostegui.json
* 17:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources (duration: 02m 04s)
* 17:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir1002.eqiad.wmnet
* 17:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources
* 17:10 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources (duration: 03m 33s)
* 17:08 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir1002.eqiad.wmnet
* 17:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir1001.eqiad.wmnet
* 17:07 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources
* 17:06 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:06 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:04 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P18589 and previous config saved to /var/cache/conftool/dbconfig/20220111-170400-marostegui.json
* 17:03 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:03 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir1001.eqiad.wmnet
* 17:03 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:00 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P18588 and previous config saved to /var/cache/conftool/dbconfig/20220111-164856-marostegui.json
* 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18587 and previous config saved to /var/cache/conftool/dbconfig/20220111-163351-marostegui.json
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18586 and previous config saved to /var/cache/conftool/dbconfig/20220111-163244-marostegui.json
* 16:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 16:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18585 and previous config saved to /var/cache/conftool/dbconfig/20220111-163237-marostegui.json
* 16:29 arturo: aborrero@apt1001:~ $ sudo -i reprepro clearvanished
* 16:23 arturo: aborrero@apt1001:~ $ sudo -i reprepro --noskipold --component thirdparty/kubeadm-k8s-1-21 update buster-wikimedia
* 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P18584 and previous config saved to /var/cache/conftool/dbconfig/20220111-161732-marostegui.json
* 16:03 cwhite: begin rolling restart of opensearch in codfw - jvm upgrade
* 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P18583 and previous config saved to /var/cache/conftool/dbconfig/20220111-160227-marostegui.json
* 15:59 vgutierrez: re-enable puppet on acme-chief clients after acmechief1001 reboot - [[phab:T294120|T294120]]
* 15:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief1001.eqiad.wmnet
* 15:56 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief1001.eqiad.wmnet
* 15:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2009.codfw.wmnet with reason: Decommissioning - hnowlan
* 15:56 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2009.codfw.wmnet with reason: Decommissioning - hnowlan
* 15:55 vgutierrez: disable puppet on acme-chief clients for acmechief1001 reboot - [[phab:T294120|T294120]]
* 15:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief-test1001.eqiad.wmnet
* 15:51 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1049 to resolve issue with full heap
* 15:47 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief-test1001.eqiad.wmnet
* 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18582 and previous config saved to /var/cache/conftool/dbconfig/20220111-154722-marostegui.json
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18580 and previous config saved to /var/cache/conftool/dbconfig/20220111-154615-marostegui.json
* 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18579 and previous config saved to /var/cache/conftool/dbconfig/20220111-154608-marostegui.json
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P18578 and previous config saved to /var/cache/conftool/dbconfig/20220111-153103-marostegui.json
* 15:30 hnowlan: Decommissioning cassandra instance restbase2009-a via nodetool
* 15:22 arnoldokoth: systemctl reset-failed ifup@ens5.service on otrs1001 [[phab:T273026|T273026]]
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P18577 and previous config saved to /var/cache/conftool/dbconfig/20220111-151558-marostegui.json
* 15:10 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM otrs1001.eqiad.wmnet
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki1001.eqiad.wmnet
* 15:04 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM rpki1001.eqiad.wmnet
* 15:02 aokoth@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM otrs1001.eqiad.wmnet
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18576 and previous config saved to /var/cache/conftool/dbconfig/20220111-150054-marostegui.json
* 15:00 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM etherpad1002.eqiad.wmnet
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18575 and previous config saved to /var/cache/conftool/dbconfig/20220111-145947-marostegui.json
* 14:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 14:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18574 and previous config saved to /var/cache/conftool/dbconfig/20220111-145939-marostegui.json
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM zookeeper-test1002.eqiad.wmnet
* 14:56 aokoth@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM etherpad1002.eqiad.wmnet
* 14:48 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM zookeeper-test1002.eqiad.wmnet
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ping1002.eqiad.wmnet
* 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ping1002.eqiad.wmnet
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P18573 and previous config saved to /var/cache/conftool/dbconfig/20220111-144435-marostegui.json
* 14:38 XioNoX: disable ping-offload in eqiad
* 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:35 marostegui: Upgrade pc1014 mysql
* 14:33 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:751949{{!}}Clean up nova-network remains]] (2/2) (duration: 02m 40s)
* 14:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:31 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:751949{{!}}Clean up nova-network remains]] (1/2) (duration: 02m 49s)
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P18572 and previous config saved to /var/cache/conftool/dbconfig/20220111-142930-marostegui.json
* 14:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:25 taavi@deploy1002: Synchronized wmf-config/reverse-proxy.php: Config: [[gerrit:751952{{!}}reverse-proxy: add drmrs ranges (T282787)]] (duration: 01m 36s)
* 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1021.eqiad.wmnet with OS bullseye
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18571 and previous config saved to /var/cache/conftool/dbconfig/20220111-141425-marostegui.json
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18570 and previous config saved to /var/cache/conftool/dbconfig/20220111-141318-marostegui.json
* 14:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 14:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 14:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
* 14:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
* 14:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 14:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18569 and previous config saved to /var/cache/conftool/dbconfig/20220111-141249-marostegui.json
* 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P18568 and previous config saved to /var/cache/conftool/dbconfig/20220111-135744-marostegui.json
* 13:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1021.eqiad.wmnet with OS bullseye
* 13:43 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
* 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P18567 and previous config saved to /var/cache/conftool/dbconfig/20220111-134239-marostegui.json
* 13:36 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
* 13:36 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
* 13:33 moritzm: installing 4.9.290 kernels on stretch systems (no reboots yet)
* 13:29 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18565 and previous config saved to /var/cache/conftool/dbconfig/20220111-132734-marostegui.json
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18564 and previous config saved to /var/cache/conftool/dbconfig/20220111-132627-marostegui.json
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM people1003.eqiad.wmnet
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:07 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM people1003.eqiad.wmnet
* 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM planet1002.eqiad.wmnet
* 12:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM planet1002.eqiad.wmnet
* 12:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18563 and previous config saved to /var/cache/conftool/dbconfig/20220111-122143-marostegui.json
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:15 cparle@deploy1002: Synchronized wmf-config: Config: [[gerrit:752599{{!}}Enable support for references (T230315)]] (duration: 01m 00s)
* 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubetcd2004.codfw.wmnet with reason: switch to plain disk storage
* 12:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubetcd2004.codfw.wmnet with reason: switch to plain disk storage
* 12:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18562 and previous config saved to /var/cache/conftool/dbconfig/20220111-121025-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18561 and previous config saved to /var/cache/conftool/dbconfig/20220111-120638-marostegui.json
* 12:00 moritzm: reverting kubetcd2004.codfw.wmnet back to "plain" storage
* 11:56 moritzm: rebalance ganeti row A (all nodes reimaged to Buster)
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18560 and previous config saved to /var/cache/conftool/dbconfig/20220111-115522-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18559 and previous config saved to /var/cache/conftool/dbconfig/20220111-115133-marostegui.json
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18558 and previous config saved to /var/cache/conftool/dbconfig/20220111-114018-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18557 and previous config saved to /var/cache/conftool/dbconfig/20220111-113628-marostegui.json
* 11:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18556 and previous config saved to /var/cache/conftool/dbconfig/20220111-113216-marostegui.json
* 11:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18555 and previous config saved to /var/cache/conftool/dbconfig/20220111-113208-marostegui.json
* 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18554 and previous config saved to /var/cache/conftool/dbconfig/20220111-112514-root.json
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18553 and previous config saved to /var/cache/conftool/dbconfig/20220111-111704-marostegui.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18551 and previous config saved to /var/cache/conftool/dbconfig/20220111-110159-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18550 and previous config saved to /var/cache/conftool/dbconfig/20220111-104654-marostegui.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18549 and previous config saved to /var/cache/conftool/dbconfig/20220111-103941-marostegui.json
* 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18548 and previous config saved to /var/cache/conftool/dbconfig/20220111-103927-marostegui.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18547 and previous config saved to /var/cache/conftool/dbconfig/20220111-102421-marostegui.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18546 and previous config saved to /var/cache/conftool/dbconfig/20220111-100917-marostegui.json
* 09:58 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
* 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2019.codfw.wmnet with OS buster
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18545 and previous config saved to /var/cache/conftool/dbconfig/20220111-095408-marostegui.json
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18544 and previous config saved to /var/cache/conftool/dbconfig/20220111-095254-marostegui.json
* 09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18543 and previous config saved to /var/cache/conftool/dbconfig/20220111-095246-marostegui.json
* 09:51 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
* 09:40 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster1001.eqiad.wmnet
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18542 and previous config saved to /var/cache/conftool/dbconfig/20220111-093741-marostegui.json
* 09:35 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
* 09:33 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster1001.eqiad.wmnet
* 09:29 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
* 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18541 and previous config saved to /var/cache/conftool/dbconfig/20220111-092706-ladsgroup.json
* 09:25 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2019.codfw.wmnet with OS buster
* 09:23 ema: cp4021 (upload), cp4027 (text): upgrade varnish to 6.0.9-1wm1 [[phab:T298758|T298758]]
* 09:23 hashar: Upgrading Jenkins and Apache on releases1002 & release2002
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18540 and previous config saved to /var/cache/conftool/dbconfig/20220111-092236-marostegui.json
* 09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2078.codfw.wmnet with OS bullseye
* 09:15 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
* 09:13 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
* 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P18539 and previous config saved to /var/cache/conftool/dbconfig/20220111-091201-ladsgroup.json
* 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2023.codfw.wmnet with OS buster
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18538 and previous config saved to /var/cache/conftool/dbconfig/20220111-090732-marostegui.json
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18537 and previous config saved to /var/cache/conftool/dbconfig/20220111-090119-marostegui.json
* 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18536 and previous config saved to /var/cache/conftool/dbconfig/20220111-090111-marostegui.json
* 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P18535 and previous config saved to /var/cache/conftool/dbconfig/20220111-085656-ladsgroup.json
* 08:48 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2078.codfw.wmnet with OS bullseye
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18534 and previous config saved to /var/cache/conftool/dbconfig/20220111-084606-marostegui.json
* 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18533 and previous config saved to /var/cache/conftool/dbconfig/20220111-084151-ladsgroup.json
* 08:40 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2023.codfw.wmnet with OS buster
* 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2124.codfw.wmnet
* 08:33 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2124.codfw.wmnet
* 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18532 and previous config saved to /var/cache/conftool/dbconfig/20220111-083322-ladsgroup.json
* 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18531 and previous config saved to /var/cache/conftool/dbconfig/20220111-083314-ladsgroup.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18530 and previous config saved to /var/cache/conftool/dbconfig/20220111-083102-marostegui.json
* 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1020.eqiad.wmnet with OS bullseye
* 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P18529 and previous config saved to /var/cache/conftool/dbconfig/20220111-081809-ladsgroup.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18528 and previous config saved to /var/cache/conftool/dbconfig/20220111-081557-marostegui.json
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18527 and previous config saved to /var/cache/conftool/dbconfig/20220111-081442-marostegui.json
* 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18526 and previous config saved to /var/cache/conftool/dbconfig/20220111-081400-marostegui.json
* 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P18525 and previous config saved to /var/cache/conftool/dbconfig/20220111-080305-ladsgroup.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18524 and previous config saved to /var/cache/conftool/dbconfig/20220111-075856-marostegui.json
* 07:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1020.eqiad.wmnet with OS bullseye
* 07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18523 and previous config saved to /var/cache/conftool/dbconfig/20220111-074800-ladsgroup.json
* 07:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2117.codfw.wmnet
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18522 and previous config saved to /var/cache/conftool/dbconfig/20220111-074351-marostegui.json
* 07:42 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2117.codfw.wmnet
* 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18521 and previous config saved to /var/cache/conftool/dbconfig/20220111-074202-ladsgroup.json
* 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18520 and previous config saved to /var/cache/conftool/dbconfig/20220111-074154-ladsgroup.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18519 and previous config saved to /var/cache/conftool/dbconfig/20220111-072847-marostegui.json
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P18518 and previous config saved to /var/cache/conftool/dbconfig/20220111-072649-ladsgroup.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18517 and previous config saved to /var/cache/conftool/dbconfig/20220111-071729-marostegui.json
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18516 and previous config saved to /var/cache/conftool/dbconfig/20220111-071721-marostegui.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18515 and previous config saved to /var/cache/conftool/dbconfig/20220111-071254-root.json
* 07:12 taavi: extensions/CentralAuth/maintenance/migrateHiddenLevel.php finished - [[phab:T289068|T289068]]
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P18514 and previous config saved to /var/cache/conftool/dbconfig/20220111-071144-ladsgroup.json
* 07:07 marostegui: Failover m2 proxy from dbproxy1015 to dbproxy1013 [[phab:T298586|T298586]]
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18513 and previous config saved to /var/cache/conftool/dbconfig/20220111-070216-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18512 and previous config saved to /var/cache/conftool/dbconfig/20220111-065750-root.json
* 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18511 and previous config saved to /var/cache/conftool/dbconfig/20220111-065640-ladsgroup.json
* 06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2114.codfw.wmnet
* 06:51 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2114.codfw.wmnet
* 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18510 and previous config saved to /var/cache/conftool/dbconfig/20220111-065118-ladsgroup.json
* 06:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 06:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 06:50 Amir1: upgrading mysql on ['db2114', 'db2117', 'db2124']
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18509 and previous config saved to /var/cache/conftool/dbconfig/20220111-064712-marostegui.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18508 and previous config saved to /var/cache/conftool/dbconfig/20220111-064247-root.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18507 and previous config saved to /var/cache/conftool/dbconfig/20220111-063207-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18506 and previous config saved to /var/cache/conftool/dbconfig/20220111-063052-marostegui.json
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1012.eqiad.wmnet with OS bullseye
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18505 and previous config saved to /var/cache/conftool/dbconfig/20220111-062743-root.json
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2032 after Bullseye reimage [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18504 and previous config saved to /var/cache/conftool/dbconfig/20220111-062620-marostegui.json
* 06:21 taavi: starting extensions/CentralAuth/maintenance/migrateHiddenLevel.php on a mwmaint1002 screen session - [[phab:T289068|T289068]]
* 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1012.eqiad.wmnet with OS bullseye
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18503 and previous config saved to /var/cache/conftool/dbconfig/20220111-054417-marostegui.json
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 02:41 eileen: * revision {{Gerrit|d90542c2}} -> {{Gerrit|2956a622}} (latest)
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:42 eileen: revision {{Gerrit|277989d7}} -> {{Gerrit|d90542c2}} (latest) civicrm
* 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:24 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.16/skins/Vector/resources/skins.vector.js/dropdownMenus.js: {{Gerrit|79b33f2}}: Fix TypeError: document.querySelectorAll(...).forEach is not a function ([[phab:T298910|T298910]]) (duration: 00m 59s)
* 00:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn


== July 8 ==
* 23:07 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow: SWAT (duration: 00m 14s)
* 23:06 bd808: Restarted logstash on logstash1001; no hhvm input seen for last hour
* 22:56 gwicke: finished rolling restart of cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/223495/
* 22:45 mutante: zirconium - stop puppet for role switch
* 22:33 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/changes/EnhancedChangesList.php: Unbreak missing flags in enhanced RC (duration: 00m 12s)
* 22:08 logmsgbot: hoo Synchronized php-1.26wmf13/extensions/Wikidata/: Update Wikibase: Fix JavaScript ULS usage (duration: 00m 20s)
* 21:51 logmsgbot: manybubbles Synchronized php-1.26wmf12/extensions/CirrusSearch/: Stop some fatals in cirrus (duration: 00m 13s)
* 21:41 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert Count API module instantiations and Hook runs (2/2) (duration: 00m 12s)
* 21:40 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/Hooks.php: Revert Count API module instantiations and Hook runs (1/2) (duration: 00m 12s)
* 21:39 logmsgbot: bd808 Synchronized php-1.26wmf13/extensions/CirrusSearch/includes/CirrusSearch.php: Suppress interwiki results when they would break (duration: 00m 12s)
* 21:08 bblack: graphite: wiped /var/log/upstart/statsite* logs, restarted statsite processes
* 20:56 csteipp: deployed patches for T103022 & T103023
* 20:53 csteipp: deployed patch for T94116 for wmf12/wmf13
* 20:30 gwicke: added explicit exit 1 in /etc/init.d/cassandra on restbase1008 to prevent cassandra from starting up there; is puppet restarting it?
* 20:29 subbu: deployed parsoid sha c4cfc527
* 20:15 gwicke: bounced cassandra on restbase1001
* 20:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 20:05:09 UTC 2015 (duration 5m 8s)
* 19:32 gwicke: stopped cassandra on restbase1008
* 19:27 logmsgbot: twentyafterfour Synchronized php-1.26wmf13: deploying UniversalLanguageSelector commit 2e0990ac9879 (duration: 01m 58s)
* 19:26 urandom: restbase rolling restart
* 18:21 jgage: ran 'kafka preferred-replica-election' to promote analytics1021 back to Leader
* 18:05 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf13
* 17:16 moritzm: installed libwmf security updates on various systems
* 17:09 gwicke: bounced cassandra on restbase1004
* 15:25 mutante: handing over adminship of the "test" mailman list to John F. Lewis (was: Thehelpfulone) due to inactivity
* 13:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: raise db1041 load (duration: 00m 13s)
* 12:58 paravoid: manually dpkg -P ferm on potassium
* 12:52 paravoid: rmmod all iptables/netfilter-related modules from potassium
* 11:23 godog: bounce cassandra on restbase1004, heap space
* 11:12 _joe_: mw1153 passed the smoke tests, repooling
* 11:08 godog: bounce cassandra on restbase1004 and restbase1005 'cannot achieve consistency level quorum'
* 10:50 godog: bounce cassandra on restbase1004, death by compaction
* 09:43 ori: _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 09:42 ori: Nuked /var/lib/carbon/whisper/ResourceLoader on graphite[12]001. Data prior to rollout of I55f0c44cd considered bogus.
* 09:42 ori: morebots, are you OK?
* 09:41 godog: bounce nutcracker on silver
* 09:33 _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 09:26 hashar: upgraded plugins on jenkins and restarting it
* 09:06 hashar: Jenkins registering jobs with Zuul
* 08:41 hashar: Jenkins is migrating old build histories. Lot of disk IO happening
* 08:11 hashar: shutdowning Jenkins for upgrade.
* 05:57 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 05:57:10 UTC 2015 (duration 57m 9s)
* 05:46 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1041, warm up (duration: 00m 13s)
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-08 02:31:24+00:00
* 02:16 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-08 02:16:50+00:00
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 48s)


== 2022-01-10 ==
* 22:36 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
* 22:34 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
* 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18502 and previous config saved to /var/cache/conftool/dbconfig/20220110-202728-marostegui.json
* 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18501 and previous config saved to /var/cache/conftool/dbconfig/20220110-201224-marostegui.json
* 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18500 and previous config saved to /var/cache/conftool/dbconfig/20220110-195719-marostegui.json
* 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18499 and previous config saved to /var/cache/conftool/dbconfig/20220110-194214-marostegui.json
* 19:32 ejegg: updated fundraising civicrm from {{Gerrit|3d334f30}} to {{Gerrit|277989d7}}
* 19:29 urbanecm: UTC evening B&C finished
* 19:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8f5ca9af5ef04d1d19759cdf201fc0c7e4ee6fbc}}: Enable TheWikipediaLibrary on most wikis ([[phab:T288070|T288070]]) (duration: 01m 00s)
* 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18497 and previous config saved to /var/cache/conftool/dbconfig/20220110-184154-marostegui.json
* 18:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 18:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18496 and previous config saved to /var/cache/conftool/dbconfig/20220110-184147-marostegui.json
* 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18495 and previous config saved to /var/cache/conftool/dbconfig/20220110-182642-marostegui.json
* 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18494 and previous config saved to /var/cache/conftool/dbconfig/20220110-181137-marostegui.json
* 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18493 and previous config saved to /var/cache/conftool/dbconfig/20220110-175633-marostegui.json
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18492 and previous config saved to /var/cache/conftool/dbconfig/20220110-175503-marostegui.json
* 17:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18491 and previous config saved to /var/cache/conftool/dbconfig/20220110-175455-marostegui.json
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18489 and previous config saved to /var/cache/conftool/dbconfig/20220110-173950-marostegui.json
* 17:34 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1016.eqiad.wmnet
* 17:32 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1016.eqiad.wmnet
* 17:30 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1015.eqiad.wmnet
* 17:28 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1015.eqiad.wmnet
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18488 and previous config saved to /var/cache/conftool/dbconfig/20220110-172446-marostegui.json
* 17:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1006.eqiad.wmnet
* 17:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1006.eqiad.wmnet
* 17:16 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1005.eqiad.wmnet
* 17:14 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1005.eqiad.wmnet
* 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18487 and previous config saved to /var/cache/conftool/dbconfig/20220110-170941-marostegui.json
* 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18486 and previous config saved to /var/cache/conftool/dbconfig/20220110-170811-marostegui.json
* 17:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 17:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18485 and previous config saved to /var/cache/conftool/dbconfig/20220110-170804-marostegui.json
* 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18484 and previous config saved to /var/cache/conftool/dbconfig/20220110-165259-marostegui.json
* 16:52 ema: varnish 6.0.9-1wm1 uploaded to buster-wikimedia - component/varnish6 [[phab:T298758|T298758]]
* 16:47 moritzm: installing 5.10.84 kernels on bullseye hosts (no reboots involved, just installing the new kernels in parallel)
* 16:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18483 and previous config saved to /var/cache/conftool/dbconfig/20220110-163754-marostegui.json
* 16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18482 and previous config saved to /var/cache/conftool/dbconfig/20220110-162249-marostegui.json
* 16:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2023.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 16:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2023.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 16:21 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1004.eqiad.wmnet
* 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18481 and previous config saved to /var/cache/conftool/dbconfig/20220110-162122-marostegui.json
* 16:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 16:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18480 and previous config saved to /var/cache/conftool/dbconfig/20220110-162114-marostegui.json
* 16:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2019.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 16:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2019.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 16:19 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry1004.eqiad.wmnet
* 16:18 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:13 root@cumin1001: START - Cookbook sre.dns.netbox
* 16:09 damilare: process-control config {{Gerrit|ecf09aa0}} -> {{Gerrit|66e69bda}}
* 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18479 and previous config saved to /var/cache/conftool/dbconfig/20220110-160608-marostegui.json
* 16:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM chartmuseum1001.eqiad.wmnet
* 16:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1003.eqiad.wmnet
* 15:57 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry1003.eqiad.wmnet
* 15:56 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM chartmuseum1001.eqiad.wmnet
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18478 and previous config saved to /var/cache/conftool/dbconfig/20220110-155103-marostegui.json
* 15:49 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
* 15:49 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dragonfly-supernode1001.eqiad.wmnet
* 15:45 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM dragonfly-supernode1001.eqiad.wmnet
* 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18476 and previous config saved to /var/cache/conftool/dbconfig/20220110-153559-marostegui.json
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18475 and previous config saved to /var/cache/conftool/dbconfig/20220110-153429-marostegui.json
* 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18474 and previous config saved to /var/cache/conftool/dbconfig/20220110-153421-marostegui.json
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18472 and previous config saved to /var/cache/conftool/dbconfig/20220110-151917-marostegui.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18471 and previous config saved to /var/cache/conftool/dbconfig/20220110-150412-marostegui.json
* 14:55 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb1002.eqiad.wmnet
* 14:51 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 14:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:49 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:752277{{!}}Give priority to PreparedUpdate (T288639)]] (duration: 01m 00s)
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18470 and previous config saved to /var/cache/conftool/dbconfig/20220110-144907-marostegui.json
* 14:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18469 and previous config saved to /var/cache/conftool/dbconfig/20220110-144737-marostegui.json
* 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 14:36 jbond@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb1002.eqiad.wmnet
* 14:32 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test1001.wikimedia.org
* 14:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test1001.wikimedia.org
* 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
* 14:19 jelto: upload wmf-sre-laptop 0.5.3 deb package
* 14:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
* 14:07 jbond: disable puppet fleet wide for puppetdb restart
* 13:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 13:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 13:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 13:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 13:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 13:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 13:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 13:57