You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bullseye)
imported>Stashbot
(ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance)
 
(140 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== 2022-03-24 ==
== 2022-08-13 ==
* 00:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bullseye
* 13:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 00:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bullseye
* 13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 00:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bullseye
* 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32379 and previous config saved to /var/cache/conftool/dbconfig/20220813-133713-ladsgroup.json
* 00:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
* 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32378 and previous config saved to /var/cache/conftool/dbconfig/20220813-132207-ladsgroup.json
* 00:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
* 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32377 and previous config saved to /var/cache/conftool/dbconfig/20220813-130701-ladsgroup.json
* 00:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
* 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32376 and previous config saved to /var/cache/conftool/dbconfig/20220813-125156-ladsgroup.json
* 00:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
* 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
* 00:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage


== 2022-03-23 ==
== 2022-08-12 ==
* 23:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:41 mutante: wikistats-bullseye:~$ /usr/lib/wikistats/update.php wp prefix blk ; /usr/lib/wikistats/update.php wp prefix kcg [[phab:T315121|T315121]]
* 23:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:38 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_initsitestats.timer [[phab:T315121|T315121]]
* 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:14 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 23:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bullseye
* 21:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1071.eqiad.wmnet with OS bullseye
* 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bullseye
* 21:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bullseye
* 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
* 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
* 23:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1071.eqiad.wmnet with OS bullseye
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
* 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
* 23:38 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]]
* 21:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1053.eqiad.wmnet with OS bullseye
* 23:34 brennen: trainsperiment ([[phab:T300203|T300203]]): reverting to 1.39.0-wmf.3 on all wikis for [[phab:T304564|T304564]]; will move forward again after a fix.
* 20:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 23:25 cwhite: remove openjdk-8-jre from codfw logstash nodes [[phab:T301770|T301770]]
* 20:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
* 23:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bullseye
* 20:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
* 22:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
* 20:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1053.eqiad.wmnet with OS bullseye
* 22:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
* 20:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1048.eqiad.wmnet with OS bullseye
* 22:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bullseye
* 19:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
* 22:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bullseye
* 19:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
* 22:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
* 19:42 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1048.eqiad.wmnet with OS bullseye
* 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
* 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32375 and previous config saved to /var/cache/conftool/dbconfig/20220812-193822-ladsgroup.json
* 22:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
* 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 22:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
* 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
* 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32374 and previous config saved to /var/cache/conftool/dbconfig/20220812-193801-ladsgroup.json
* 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bullseye
* 19:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1054.eqiad.wmnet with OS bullseye
* 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bullseye
* 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32373 and previous config saved to /var/cache/conftool/dbconfig/20220812-192255-ladsgroup.json
* 21:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bullseye
* 19:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
* 21:42 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:09 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
* 21:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
* 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32372 and previous config saved to /var/cache/conftool/dbconfig/20220812-190749-ladsgroup.json
* 21:31 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
* 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
* 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:54 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1054.eqiad.wmnet with OS bullseye
* 21:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32371 and previous config saved to /var/cache/conftool/dbconfig/20220812-185243-ladsgroup.json
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1066.eqiad.wmnet with OS bullseye
* 21:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:773331{{!}}Enable split A/B testing on beta cluster (T301584)]] (duration: 00m 50s)
* 18:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
* 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bullseye
* 18:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1066.eqiad.wmnet with OS bullseye
* 21:15 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:772408{{!}}Allow autoconfirmed users to view basic IP information (T303858)]] and [[gerrit:767216{{!}}Enable IPInfo on testwiki (T260598)]] (duration: 00m 50s)
* 18:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1064.eqiad.wmnet with OS bullseye
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
* 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1064.eqiad.wmnet with OS bullseye
* 21:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bullseye
* 17:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts netmon2002.wikimedia.org
* 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon2002.wikimedia.org
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bullseye
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
* 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bullseye
* 20:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bullseye
* 16:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1067.eqiad.wmnet with OS bullseye
* 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2003-dev.wikimedia.org
* 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:21 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bullseye
* 16:16 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2003-dev.wikimedia.org
* 20:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
* 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
* 20:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
* 15:58 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:43 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1067.eqiad.wmnet with OS bullseye
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:31 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 20:40 catrope@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:771448{{!}}DynamicSidebar: remove unused extension (T304006)]] (duration: 00m 49s)
* 15:31 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 20:34 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771447{{!}}DynamicSidebar: remove from InitialiseSettings]] (duration: 00m 51s)
* 15:07 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts netmon1002.wikimedia.org
* 20:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
* 15:07 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon1002.wikimedia.org
* 20:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
* 15:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1061.eqiad.wmnet with OS bullseye
* 20:32 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bullseye
* 14:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
* 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=varnish-fe
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
* 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-tls
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:43 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1061.eqiad.wmnet with OS bullseye
* 20:18 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 14:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1063.eqiad.wmnet with OS bullseye
* 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bullseye
* 14:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
* 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bullseye
* 14:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:47 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1063.eqiad.wmnet with OS bullseye
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 20:13 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771444{{!}}DynamicSidebar: remove from CommonSettings (T304006)]] (duration: 00m 50s)
* 06:01 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic10[8-9][0-9].*
* 20:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771443{{!}}wikitech: Remove DynamicSidebar (T304006)]] (duration: 00m 52s)
* 05:54 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic110.*
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32369 and previous config saved to /var/cache/conftool/dbconfig/20220812-010312-ladsgroup.json
* 20:01 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 01:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 19:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 19:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 19:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32368 and previous config saved to /var/cache/conftool/dbconfig/20220812-010233-ladsgroup.json
* 19:37 brennen: trainsperiment ([[phab:T300203|T300203]]): 1.39.0-wmf.4 on all wikis; logs seem clean - end of train deployment activities for the week, unless bugs emerge
* 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32367 and previous config saved to /var/cache/conftool/dbconfig/20220812-004727-ladsgroup.json
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32366 and previous config saved to /var/cache/conftool/dbconfig/20220812-003221-ladsgroup.json
* 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32365 and previous config saved to /var/cache/conftool/dbconfig/20220812-001715-ladsgroup.json
* 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:23 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 19:23 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bullseye
* 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bullseye
* 19:10 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:09 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]] (duration: 00m 52s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:59 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 18:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
* 18:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
* 18:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
* 18:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
* 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:47 brennen: trainsperiment ([[phab:T300203|T300203]]): 1.39.0-wmf.4 on testwikis; proceeding to groups 0-2 with 15 minute intervals for watching logs
* 18:46 brennen@deploy1002: Pruned MediaWiki: 1.38.0-wmf.26 (duration: 02m 05s)
* 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:42 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]] (duration: 49m 41s)
* 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bullseye
* 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bullseye
* 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:52 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bullseye
* 17:48 brennen: trainsperiment ([[phab:T300203|T300203]]): starting prep for 1.39.0-wmf.4
* 17:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bullseye
* 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1028.eqiad.wmnet with OS bullseye
* 17:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
* 17:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
* 17:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
* 17:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
* 17:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
* 17:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
* 17:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bullseye
* 16:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bullseye
* 16:58 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 16:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS bullseye
* 16:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 16:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1011.eqiad.wmnet with OS bullseye
* 16:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bullseye
* 16:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
* 16:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
* 16:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1011.eqiad.wmnet with OS bullseye
* 16:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
* 16:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
* 15:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bullseye
* 15:39 urbanecm: foreachwikiindblist wikipedia extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments # [[phab:T304052|T304052]]
* 15:38 urbanecm: Created shnwikivoyage and guwwiki
* 15:31 mmandere: pool cp1080 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1080.eqiad.wmnet with OS buster
* 15:27 urbanecm@deploy1002: Synchronized langlist: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 04s)
* 15:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 07s)
* 15:25 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 05s)
* 15:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bullseye
* 15:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 06s)
* 15:23 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating guwwiki ([[phab:T303727|T303727]])
* 15:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:21 urbanecm@deploy1002: Synchronized dblists: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 10s)
* 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:19 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 05s)
* 15:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:12 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shnwikivoyage ([[phab:T302797|T302797]])
* 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:09 urbanecm@deploy1002: Synchronized dblists: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:08 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:05 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
* 15:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
* 15:01 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
* 15:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1030.eqiad.wmnet with OS bullseye
* 14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
* 14:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 14:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bullseye
* 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
* 14:44 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
* 14:44 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS buster
* 14:41 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.3/extensions/WikimediaMaintenance/addWiki.php: {{Gerrit|9a0aed0}}: addWiki: Create GrowthExperiment tables for all new Wikipedias ([[phab:T304052|T304052]]) (duration: 01m 06s)
* 14:38 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1085.eqiad.wmnet
* 14:37 mmandere: depool cp1080 for reimage - [[phab:T290005|T290005]]
* 14:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1030.eqiad.wmnet with OS bullseye
* 14:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:28 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 14:23 bblack: reboot cp1085 (downtimed)
* 14:20 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:19 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wcqs1002.eqiad.wmnet
* 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1029.eqiad.wmnet with OS bullseye
* 14:11 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 14:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1027.eqiad.wmnet with OS bullseye
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:06 mmandere: pool cp1082 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1082.eqiad.wmnet with OS buster
* 14:00 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 13:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
* 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1010.eqiad.wmnet with OS bullseye
* 13:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:48 Lucas_WMDE: UTC afternoon backport window done
* 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:773209{{!}}Enable Wikibase REST API on beta wikidata (T302959)]] (2/2, production no-op) (duration: 01m 05s)
* 13:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:773209{{!}}Enable Wikibase REST API on beta wikidata (T302959)]] (1/2, production no-op) (duration: 01m 07s)
* 13:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
* 13:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1029.eqiad.wmnet with OS bullseye
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23010 and previous config saved to /var/cache/conftool/dbconfig/20220323-134153-marostegui.json
* 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23009 and previous config saved to /var/cache/conftool/dbconfig/20220323-134140-marostegui.json
* 13:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:768090{{!}}Write "unexpectedUnconnectedPage" page prop on Test Wikidata clients]] (duration: 01m 10s)
* 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
* 13:38 moritzm: restarting superset for OpenSSL update
* 13:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
* 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1027.eqiad.wmnet with OS bullseye
* 13:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
* 13:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23008 and previous config saved to /var/cache/conftool/dbconfig/20220323-132635-marostegui.json
* 13:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1010.eqiad.wmnet with OS bullseye
* 13:16 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1082.eqiad.wmnet with OS buster
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23005 and previous config saved to /var/cache/conftool/dbconfig/20220323-131130-marostegui.json
* 13:07 mmandere: depool cp1082 for reimage - [[phab:T290005|T290005]]
* 12:58 moritzm: installing bind security updates
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23004 and previous config saved to /var/cache/conftool/dbconfig/20220323-125625-marostegui.json
* 12:29 moritzm: restarting Turnilo for OpenSSL update
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 after testing', diff saved to https://phabricator.wikimedia.org/P23003 and previous config saved to /var/cache/conftool/dbconfig/20220323-120749-marostegui.json
* 11:34 jbond: upload new puppetboard_3.1.0-1+deb11u1_all.deb
* 11:33 moritzm: installing apache security updates on stretch
* 11:00 mmandere: pool cp1081 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 10:58 moritzm: restarting apache on matomo1002/piwik.wikimedia.org
* 10:52 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1081.eqiad.wmnet with OS buster
* 10:30 moritzm: restarting ntpd
* 10:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
* 10:24 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 some more weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P23002 and previous config saved to /var/cache/conftool/dbconfig/20220323-101816-marostegui.json
* 10:07 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1081.eqiad.wmnet with OS buster
* 09:56 mmandere: depool cp1081 for reimage - [[phab:T290005|T290005]]
* 09:43 mmandere: pool cp1079 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 09:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS buster
* 09:24 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:17 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:15 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
* 09:11 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
* 09:06 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:54 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
* 08:54 moritzm: restarting spamassassin/clamav on otrs1001/ticket.wikimedia.org
* 08:51 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1079.eqiad.wmnet with OS buster
* 08:47 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
* 08:43 moritzm: installing openssl security updates
* 08:36 mmandere: depool cp1079 for reimage - [[phab:T290005|T290005]]
* 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1009.eqiad.wmnet with OS bullseye
* 08:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
* 08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
* 07:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1009.eqiad.wmnet with OS bullseye
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P23001 and previous config saved to /var/cache/conftool/dbconfig/20220323-074408-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P23000 and previous config saved to /var/cache/conftool/dbconfig/20220323-072904-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P22999 and previous config saved to /var/cache/conftool/dbconfig/20220323-071400-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P22998 and previous config saved to /var/cache/conftool/dbconfig/20220323-065856-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P22997 and previous config saved to /var/cache/conftool/dbconfig/20220323-064353-root.json
* 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1112.eqiad.wmnet with OS bullseye
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
* 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1112.eqiad.wmnet with OS bullseye
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for reimage', diff saved to https://phabricator.wikimedia.org/P22996 and previous config saved to /var/cache/conftool/dbconfig/20220323-060533-marostegui.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 with low weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P22995 and previous config saved to /var/cache/conftool/dbconfig/20220323-060351-marostegui.json
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:20 ejegg: updated payments-wiki from {{Gerrit|3048f0aa}} to {{Gerrit|28e24856}}
* 00:11 cjming: end running skin preference update script [[phab:T299104|T299104]]


== 2022-03-22 ==
== 2022-08-11 ==
* 23:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
* 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:23 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:11 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad]
* 22:27 pt1979@cumin1001


== 2022-03-21 ==
== 2022-08-10 ==
* 23:52 eileen: civicrm revision changed from {{Gerrit|52c45874}} to {{Gerrit|30c55f51}}
* 21:25 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1016.eqiad.wmnet
* 22:29 ryankemper: [[phab:T301955|T301955]] Lifted downtime on relforge now that cluster upgrade is complete and cluster is back to green status
* 21:23 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
* 22:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 21:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 16 hosts with reason: [[phab:T309810|T309810]]
* 22:04 reedy@deploy1002: Synchronized php-1.39.0-wmf.2/extensions/OATHAuth/: [[phab:T304350|T304350]] (duration: 00m 49s)
* 21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 16 hosts with reason: [[phab:T309810|T309810]]
* 22:03 reedy@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/OATHAuth/: [[phab:T304350|T304350]] (duration: 00m 49s)
* 21:09 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 21:59 ryankemper: [[phab:T301955|T301955]] Downtimed relforge for 2 days; stuck in yellow status during upgrade b/c replica shards cannot be scheduled to a host of lower elasticsearch version than primary shards. Working on patch for our `rolling-operation` cookbook to disable replication during operation
* 21:09 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 21:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:46 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 21:00 cjming: end of UTC late backport window
* 21:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:45 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:45 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 20:59 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820533{{!}}Remove unused $wgEnableMWSuggest]] (duration: 03m 04s)
* 21:45 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:44 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
* 20:56 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820568{{!}}Enable new topic tool on dewiki (T313699)]] (duration: 03m 01s)
* 21:44 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
* 20:34 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:822093{{!}}testwiki: set $wgCdnMatchParameterOrder to false (T314868)]] (duration: 03m 20s)
* 21:43 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:43 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:41 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:41 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:41 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:41 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:40 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
* 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:36 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:33 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:33 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 20:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 21:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 21:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 21:30 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 21:30 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 21:28 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 21:28 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 21:27 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 21:27 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 21:26 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 21:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 21:25 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 21:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
* 21:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
* 21:10 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:03 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:56 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]]
* 20:52 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:49 dduvall@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]] (duration: 00m 51s)
* 20:49 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]]
* 20:31 urbanecm: UTC late backport window completed
* 20:29 urbanecm@deploy1002: Synchronized docroot/noc/db.php: {{Gerrit|3bcccdc}}: Migrate away from $wmfDbconfigFromEtcd ([[phab:T45956|T45956]]; 2/2) (duration: 00m 50s)
* 20:29 urbanecm@deploy1002: Synchronized wmf-config/etcd.php: {{Gerrit|3bcccdc}}: Migrate away from $wmfDbconfigFromEtcd ([[phab:T45956|T45956]]; 1/2) (duration: 00m 50s)
* 20:19 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|8347de5}}: ExtensionDistributor: Add REL1_38 ([[phab:T304185|T304185]]) (duration: 00m 51s)
* 19:48 brennen: mw1416: sudo -i /usr/local/sbin/restart-php7.2-fpm
* 19:42 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]]
* 19:26 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]] (duration: 64m 33s)
* 18:54 ebernhardson: [[phab:T303548|T303548]] start commonswiki reindexing on eqiad codfw and cloudelastic cirrus clusters
* 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22906 and previous config saved to /var/cache/conftool/dbconfig/20220321-185042-marostegui.json
* 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22905 and previous config saved to /var/cache/conftool/dbconfig/20220321-183537-marostegui.json
* 18:22 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]]
* 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22904 and previous config saved to /var/cache/conftool/dbconfig/20220321-182032-marostegui.json
* 18:19 otto@deploy1002: Finished deploy [analytics/refinery@2175d63]: gobblin prometheus metrics for all jobs - [[phab:T294420|T294420]] (duration: 04m 41s)
* 18:19 brennen: trainsperiment ([[phab:T300203|T300203]]): 1.39.0-wmf.1 on all wikis; starting prep of wmf.2, will abort if needed
* 18:15 otto@deploy1002: Started deploy [analytics/refinery@2175d63]: gobblin prometheus metrics for all jobs - [[phab:T294420|T294420]]
* 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22903 and previous config saved to /var/cache/conftool/dbconfig/20220321-180526-marostegui.json
* 18:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]]
* 18:03 otto@deploy1002: Finished deploy [analytics/refinery@2175d63] (hadoop-test): gobblin prometheus metrics for all jobs - [[phab:T294420|T294420]] (duration: 07m 19s)
* 17:59 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1021 for [[phab:T302233|T302233]]
* 17:59 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1020 for [[phab:T302233|T302233]]
* 17:57 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1016 for [[phab:T302233|T302233]]
* 17:55 otto@deploy1002: Started deploy [analytics/refinery@2175d63] (hadoop-test): gobblin prometheus metrics for all jobs - [[phab:T294420|T294420]]
* 17:53 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 17:51 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1017 for [[phab:T302233|T302233]]
* 17:49 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1013 for [[phab:T302233|T302233]]
* 17:49 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1018 for [[phab:T302233|T302233]]
* 17:46 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1014
* 17:41 ryankemper: [WCQS Deploy] Test query passed on commons-query.wikimedia.org; WCQS deploy complete
* 17:40 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2b67de7] (wcqs): Deploy 0.3.107 to WCQS (duration: 02m 12s)
* 17:38 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.107` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
* 17:37 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2b67de7] (wcqs): Deploy 0.3.107 to WCQS
* 17:35 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1018 after same command without `--table` argument timed out waiting for `zhwiki_p.page`
* 17:32 ryankemper: [Maps] Running puppet agent on rest of `maps*`: `ryankemper@cumin1001:~$ sudo -E cumin -b 4 'maps*' 'run-puppet-agent'`
* 17:31 ryankemper: [Maps] Ran puppet agent on maps master `maps1009` to verify puppet patch works; looks like osm import was disabled as intended `Notice: /Stage[main]/Osm::Imposm3/Systemd::Service[imposm]/Service[imposm]/ensure: ensure changed 'running' to 'stopped'`
* 17:26 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 17:25 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 17:25 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 17:22 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2b67de7]: 0.3.107 (duration: 08m 26s)
* 17:15 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.107` on canary `wdqs1003`; proceeding to rest of fleet
* 17:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2b67de7]: 0.3.107
* 17:13 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.107`. Pre-deploy tests passing on canary `wdqs1003`
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22902 and previous config saved to /var/cache/conftool/dbconfig/20220321-170731-marostegui.json
* 17:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 16:58 brennen: trainsperiment ([[phab:T300203|T300203]]): blockers currently cleared, will hold wmf.1 -> group2 until 18:00 UTC, per deployment calendar
* 16:55 taavi@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/WikimediaEvents/includes/PageSplitter/PageSplitterHooks.php: Backport: [[gerrit:772364{{!}}PageSplitter: check for OutputPage::getTitle() returning null (T304331)]] (duration: 00m 50s)
* 16:53 taavi@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaEvents/includes/PageSplitter/PageSplitterHooks.php: Backport: [[gerrit:772363{{!}}PageSplitter: check for OutputPage::getTitle() returning null (T304331)]] (duration: 00m 51s)
* 16:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 16:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 16:46 razzi@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
* 16:44 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Wikibase/repo/: Backport: [[gerrit:772361{{!}}Add display to wbsearchentities response even if empty (T104344)]] (duration: 00m 53s)
* 16:15 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 16:14 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 16:13 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 16:13 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 16:12 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 16:12 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
* 16:11 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
* 16:11 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 16:11 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 16:11 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 16:11 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 16:10 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
* 16:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 16:10 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 16:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
* 16:09 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
* 16:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 16:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22901 and previous config saved to /var/cache/conftool/dbconfig/20220321-160557-marostegui.json
* 16:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
* 16:04 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 16:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 16:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 16:00 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 16:00 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 15:59 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 15:59 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 15:58 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 15:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:57 otto@deploy1002: Finished deploy [analytics/refinery@cd7bf7a] (hadoop-test): fix prometheus pushgateway url - [[phab:T294420|T294420]] (duration: 07m 18s)
* 15:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 15:57 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 15:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 15:57 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 15:56 reedy@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: [[phab:T304111|T304111]] (duration: 00m 50s)
* 15:56 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 15:56 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
* 15:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
* 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22900 and previous config saved to /var/cache/conftool/dbconfig/20220321-155052-marostegui.json
* 15:50 otto@deploy1002: Started deploy [analytics/refinery@cd7bf7a] (hadoop-test): fix prometheus pushgateway url - [[phab:T294420|T294420]]
* 15:50 otto@deploy1002: Finished deploy [analytics/refinery@33f66db] (hadoop-test): fix prometheus pushgateway url - [[phab:T294420|T294420]] (duration: 00m 03s)
* 15:50 otto@deploy1002: Started deploy [analytics/refinery@33f66db] (hadoop-test): fix prometheus pushgateway url - [[phab:T294420|T294420]]
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22899 and previous config saved to /var/cache/conftool/dbconfig/20220321-154607-marostegui.json
* 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22898 and previous config saved to /var/cache/conftool/dbconfig/20220321-154559-marostegui.json
* 15:44 reedy@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/Flow/maintenance: [[phab:T304318|T304318]] (duration: 00m 49s)
* 15:44 razzi@cumin1001: START - Cookbook sre.wikireplicas.update-views
* 15:43 reedy@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Flow/maintenance: [[phab:T304318|T304318]] (duration: 00m 51s)
* 15:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:772421{{!}} Bumping portals to master (T128546)]] (duration: 00m 53s)
* 15:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:772421{{!}} Bumping portals to master (T128546)]] (duration: 00m 53s)
* 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22897 and previous config saved to /var/cache/conftool/dbconfig/20220321-153547-marostegui.json
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22896 and previous config saved to /var/cache/conftool/dbconfig/20220321-153054-marostegui.json
* 15:27 mmandere: pool cp1075 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:23 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]] (duration: 01m 54s)
* 15:21 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]]
* 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22895 and previous config saved to /var/cache/conftool/dbconfig/20220321-152041-marostegui.json
* 15:19 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1075.eqiad.wmnet with OS buster
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22894 and previous config saved to /var/cache/conftool/dbconfig/20220321-151549-marostegui.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22893 and previous config saved to /var/cache/conftool/dbconfig/20220321-150417-marostegui.json
* 15:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 15:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 15:01 otto@deploy1002: Finished deploy [analytics/refinery@33f66db] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]] (duration: 07m 10s)
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22891 and previous config saved to /var/cache/conftool/dbconfig/20220321-150044-marostegui.json
* 14:58 XioNoX: asw2-b-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 2 member 5 - [[phab:T304316|T304316]]
* 14:54 otto@deploy1002: Started deploy [analytics/refinery@33f66db] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]]
* 14:49 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
* 14:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
* 14:43 oblivian@puppetmaster1001: conftool action : set/enabled=false; selector: name=parameter_q,cluster=cache-text
* 14:35 hashar: Restarting CI Zuul server
* 14:31 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]]
* 14:31 hashar: restarting Apache on gerrit2001 and gerrit1001
* 14:30 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1075.eqiad.wmnet with OS buster
* 14:28 hashar@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Echo: Revert "Call IDatabase::timestamp before inserting rows" - [[phab:T304307|T304307]] (duration: 00m 52s)
* 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22890 and previous config saved to /var/cache/conftool/dbconfig/20220321-141922-marostegui.json
* 14:11 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 14:09 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 14:07 mmandere: depool cp1075  - [[phab:T290005|T290005]]
* 14:07 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 14:05 mmandere: depool cp1074 -  [[phab:T290005|T290005]]
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22889 and previous config saved to /var/cache/conftool/dbconfig/20220321-140417-marostegui.json
* 14:04 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 14:03 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 14:03 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 14:02 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 14:01 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 13:55 otto@deploy1002: Finished deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]] (duration: 00m 06s)
* 13:55 otto@deploy1002: Started deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]]
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22888 and previous config saved to /var/cache/conftool/dbconfig/20220321-134912-marostegui.json
* 13:34 Lucas_WMDE: UTC afternoon backport window done
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22887 and previous config saved to /var/cache/conftool/dbconfig/20220321-133407-marostegui.json
* 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771923{{!}}Revert "ptwiki: Disable Growth's image recommendation" (T304095)]] (duration: 00m 49s)
* 13:29 otto@deploy1002: Finished deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]] (duration: 08m 53s)
* 13:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:769748{{!}}Remove unused wgWMESearchRelevancePages config variable]] (duration: 00m 50s)
* 13:20 otto@deploy1002: Started deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]]
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771998{{!}}Create "pagemover" group at azwiki (T303752)]] (duration: 00m 50s)
* 13:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771691{{!}}Remove changetags right from users on wikidatawiki and testwikidatawiki (T303682)]] (while keeping applychangetags right) (duration: 00m 49s)
* 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 12:48 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 12:42 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.38.0-wmf.26"
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22885 and previous config saved to /var/cache/conftool/dbconfig/20220321-123055-marostegui.json
* 12:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22884 and previous config saved to /var/cache/conftool/dbconfig/20220321-123042-marostegui.json
* 12:22 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]]
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22883 and previous config saved to /var/cache/conftool/dbconfig/20220321-121537-marostegui.json
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22882 and previous config saved to /var/cache/conftool/dbconfig/20220321-120032-marostegui.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22881 and previous config saved to /var/cache/conftool/dbconfig/20220321-114527-marostegui.json
* 11:41 jnuche@deploy1002: Pruned MediaWiki: 1.38.0-wmf.25 (duration: 01m 32s)
* 11:38 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@cbc85d3] (eqiad): Update kartotherian to {{Gerrit|2ef5c2d}} (duration: 01m 40s)
* 11:36 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@cbc85d3] (eqiad): Update kartotherian to {{Gerrit|2ef5c2d}}
* 11:36 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@cbc85d3] (codfw): Update kartotherian to {{Gerrit|2ef5c2d}} (duration: 02m 51s)
* 11:35 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]] (duration: 81m 15s)
* 11:33 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@cbc85d3] (codfw): Update kartotherian to {{Gerrit|2ef5c2d}}
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22880 and previous config saved to /var/cache/conftool/dbconfig/20220321-112217-marostegui.json
* 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 11:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22879 and previous config saved to /var/cache/conftool/dbconfig/20220321-112210-marostegui.json
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22878 and previous config saved to /var/cache/conftool/dbconfig/20220321-110705-marostegui.json
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22877 and previous config saved to /var/cache/conftool/dbconfig/20220321-105159-marostegui.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22876 and previous config saved to /var/cache/conftool/dbconfig/20220321-103654-marostegui.json
* 10:13 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]]
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22875 and previous config saved to /var/cache/conftool/dbconfig/20220321-094614-marostegui.json
* 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:32 hashar: 1.39.0-wmf.1 train is delayed due to a CI / npm build failure which is being resolved [[phab:T300203|T300203]]
* 09:08 dcausse: restarting blazegraph on wdqs2001 (stuck)
* 09:07 moritzm: restarting FPM
* 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 09:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22874 and previous config saved to /var/cache/conftool/dbconfig/20220321-090250-marostegui.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22873 and previous config saved to /var/cache/conftool/dbconfig/20220321-084745-marostegui.json
* 08:43 hashar: Train blocked due to a npm checksum mismatch preventing CI from merging in the mediawiki/core 1.39.0-wmf.1 change which create the branch. [[phab:T304286|T304286]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22872 and previous config saved to /var/cache/conftool/dbconfig/20220321-083240-marostegui.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22871 and previous config saved to /var/cache/conftool/dbconfig/20220321-083050-root.json
* 08:23 dcausse: restarting blazegraph on wdqs2003 (stuck for 16 hours)
* 08:19 moritzm: installing openssl security updates
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22870 and previous config saved to /var/cache/conftool/dbconfig/20220321-081735-marostegui.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22869 and previous config saved to /var/cache/conftool/dbconfig/20220321-081546-root.json
* 08:09 dcausse: restarting blazegraph on wdqs2004 and wdqs2002 (BlazegraphFreeAllocatorsDecreasingRapidly)
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22868 and previous config saved to /var/cache/conftool/dbconfig/20220321-080042-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22867 and previous config saved to /var/cache/conftool/dbconfig/20220321-074538-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22866 and previous config saved to /var/cache/conftool/dbconfig/20220321-073033-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22865 and previous config saved to /var/cache/conftool/dbconfig/20220321-072902-marostegui.json
* 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22864 and previous config saved to /var/cache/conftool/dbconfig/20220321-072854-marostegui.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22863 and previous config saved to /var/cache/conftool/dbconfig/20220321-071349-marostegui.json
* 07:12 urbanecm: UTC morning B&C done
* 07:08 urbanecm: Create `wikishared.cx_significant_edits` and `wikishared.cx_section_translation` at x1 ([[phab:T302371|T302371]]; `mwscript sql.php --wiki=aawiki --wikidb=wikishared --cluster=extension1 /srv/mediawiki-staging/php-1.38.0-wmf.26/extensions/ContentTranslation/sql/<nowiki>{</nowiki>section-translations,significant-edits<nowiki>}</nowiki>.sql)`)
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22862 and previous config saved to /var/cache/conftool/dbconfig/20220321-065844-marostegui.json
* 06:43 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1175.eqiad.wmnet with OS bullseye
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22861 and previous config saved to /var/cache/conftool/dbconfig/20220321-064339-marostegui.json
* 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
* 06:19 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1175.eqiad.wmnet with OS bullseye
* 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
* 05:52 marostegui: dbmaint s5@eqiad [[phab:T300600|T300600]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 reimage [[phab:T300600|T300600]]', diff saved to https://phabricator.wikimedia.org/P22860 and previous config saved to /var/cache/conftool/dbconfig/20220321-055202-marostegui.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22859 and previous config saved to /var/cache/conftool/dbconfig/20220321-054838-marostegui.json
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22858 and previous config saved to /var/cache/conftool/dbconfig/20220321-054358-marostegui.json
* 05:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
 
== 2022-03-20 ==
* 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22857 and previous config saved to /var/cache/conftool/dbconfig/20220320-234358-marostegui.json
* 23:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 23:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22856 and previous config saved to /var/cache/conftool/dbconfig/20220320-234350-marostegui.json
* 23:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22855 and previous config saved to /var/cache/conftool/dbconfig/20220320-232845-marostegui.json
* 23:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22854 and previous config saved to /var/cache/conftool/dbconfig/20220320-231340-marostegui.json
* 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22853 and previous config saved to /var/cache/conftool/dbconfig/20220320-225835-marostegui.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22850 and previous config saved to /var/cache/conftool/dbconfig/20220320-081713-marostegui.json
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22849 and previous config saved to /var/cache/conftool/dbconfig/20220320-081705-marostegui.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22848 and previous config saved to /var/cache/conftool/dbconfig/20220320-080200-marostegui.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22847 and previous config saved to /var/cache/conftool/dbconfig/20220320-074655-marostegui.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22846 and previous config saved to /var/cache/conftool/dbconfig/20220320-073150-marostegui.json
 
== 2022-03-19 ==
* 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22845 and previous config saved to /var/cache/conftool/dbconfig/20220319-171757-marostegui.json
* 17:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 17:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22844 and previous config saved to /var/cache/conftool/dbconfig/20220319-171749-marostegui.json
* 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22843 and previous config saved to /var/cache/conftool/dbconfig/20220319-170244-marostegui.json
* 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22842 and previous config saved to /var/cache/conftool/dbconfig/20220319-164739-marostegui.json
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22841 and previous config saved to /var/cache/conftool/dbconfig/20220319-163234-marostegui.json
* 13:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 13:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 13:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 13:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 13:23 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=piwiki --move-talk --fix # [[phab:T304201|T304201]]
* 13:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 04:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 04:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 04:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 03:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:51 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:28 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:28 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:28 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 02:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 02:27 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 02:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22839 and previous config saved to /var/cache/conftool/dbconfig/20220319-015847-marostegui.json
* 01:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 01:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22838 and previous config saved to /var/cache/conftool/dbconfig/20220319-015839-marostegui.json
* 01:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 01:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 01:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22837 and previous config saved to /var/cache/conftool/dbconfig/20220319-014334-marostegui.json
* 01:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 01:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22836 and previous config saved to /var/cache/conftool/dbconfig/20220319-012829-marostegui.json
* 01:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22835 and previous config saved to /var/cache/conftool/dbconfig/20220319-011324-marostegui.json
* 00:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 00:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
 
== 2022-03-18 ==
* 21:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 21:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 21:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 15:38 jayme: powercycle kubernetes1002
* 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/FlaggedRevs/backend/FlaggedRevs.php: Backport: [[gerrit:771907{{!}}Don't pass the revision to PO access service (T304127)]] (duration: 00m 49s)
* 14:12 XioNoX: configure NAT for civi1002 - [[phab:T304098|T304098]]
* 14:02 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 14:02 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 14:01 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 14:01 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 13:59 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 13:59 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 13:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
* 13:07 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
* 13:02 moritzm: imported python3.5  3.5.3-1+deb9u5+wmf1 to component/python35 [[phab:T303801|T303801]]
* 12:35 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 11:35 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 11:29 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 11:28 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 11:09 vgutierrez: rolling restart of nginx on ncredir instances to catch up on OpenSSL updates
* 11:05 vgutierrez: restarting acme-chief and acme-chief API services to catch up on OpenSSL updates
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 10:54 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 10:52 akosiaris: drain kubernetes200[1-4] [[phab:T303045|T303045]]
* 10:51 akosiaris: depool kubernetes200[1-4] [[phab:T303045|T303045]]
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2004.codfw.wmnet
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2003.codfw.wmnet
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2002.codfw.wmnet
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2001.codfw.wmnet
* 10:01 akosiaris: drain kubernetes100[1-4] [[phab:T303044|T303044]]
* 09:54 akosiaris: depool kubernetes100[1-4] from pybal [[phab:T303044|T303044]]
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1004.eqiad.wmnet
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1003.eqiad.wmnet
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1002.eqiad.wmnet
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet
* 09:42 akosiaris: uncordon kubernetes1018-1022. [[phab:T293728|T293728]]. Nodes are live, ready to receive workloads and traffic.
* 09:37 akosiaris: pool kubernetes1018-1022 in pybal. [[phab:T293728|T293728]]
* 09:37 akosiaris: pool kubernetes1018-1022 in pybal.
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1022.eqiad.wmnet
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1021.eqiad.wmnet
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1020.eqiad.wmnet
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1019.eqiad.wmnet
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1018.eqiad.wmnet
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22827 and previous config saved to /var/cache/conftool/dbconfig/20220318-093543-marostegui.json
* 09:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 09:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1022.eqiad.wmnet
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1021.eqiad.wmnet
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1020.eqiad.wmnet
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1019.eqiad.wmnet
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1018.eqiad.wmnet
* 09:10 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 09:08 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22826 and previous config saved to /var/cache/conftool/dbconfig/20220318-085517-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22825 and previous config saved to /var/cache/conftool/dbconfig/20220318-084012-marostegui.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22824 and previous config saved to /var/cache/conftool/dbconfig/20220318-082507-marostegui.json
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22823 and previous config saved to /var/cache/conftool/dbconfig/20220318-081002-marostegui.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22822 and previous config saved to /var/cache/conftool/dbconfig/20220318-072852-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22821 and previous config saved to /var/cache/conftool/dbconfig/20220318-071758-marostegui.json
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22820 and previous config saved to /var/cache/conftool/dbconfig/20220318-071750-marostegui.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22819 and previous config saved to /var/cache/conftool/dbconfig/20220318-071348-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22818 and previous config saved to /var/cache/conftool/dbconfig/20220318-070245-marostegui.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22817 and previous config saved to /var/cache/conftool/dbconfig/20220318-065844-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22816 and previous config saved to /var/cache/conftool/dbconfig/20220318-064740-marostegui.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22815 and previous config saved to /var/cache/conftool/dbconfig/20220318-064340-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22814 and previous config saved to /var/cache/conftool/dbconfig/20220318-063631-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22813 and previous config saved to /var/cache/conftool/dbconfig/20220318-063524-root.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22812 and previous config saved to /var/cache/conftool/dbconfig/20220318-063235-marostegui.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22811 and previous config saved to /var/cache/conftool/dbconfig/20220318-062836-root.json
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22810 and previous config saved to /var/cache/conftool/dbconfig/20220318-062127-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22809 and previous config saved to /var/cache/conftool/dbconfig/20220318-062020-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22808 and previous config saved to /var/cache/conftool/dbconfig/20220318-061332-root.json
* 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1179.eqiad.wmnet with OS bullseye
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22807 and previous config saved to /var/cache/conftool/dbconfig/20220318-060623-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22806 and previous config saved to /var/cache/conftool/dbconfig/20220318-060516-root.json
* 05:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
* 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22805 and previous config saved to /var/cache/conftool/dbconfig/20220318-055119-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22804 and previous config saved to /var/cache/conftool/dbconfig/20220318-055012-root.json
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1179.eqiad.wmnet with OS bullseye
* 05:39 marostegui: dbmaint on s3@eqiad [[phab:T300600|T300600]]
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 reimage [[phab:T300600|T300600]]', diff saved to https://phabricator.wikimedia.org/P22803 and previous config saved to /var/cache/conftool/dbconfig/20220318-053832-marostegui.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22802 and previous config saved to /var/cache/conftool/dbconfig/20220318-053615-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22801 and previous config saved to /var/cache/conftool/dbconfig/20220318-053508-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22800 and previous config saved to /var/cache/conftool/dbconfig/20220318-053443-marostegui.json
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 01:23 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 01:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 01:14 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
 
== 2022-03-17 ==
* 22:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:36 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.26  refs [[phab:T300202|T300202]]
* 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:28 derick@deploy1002: Synchronized wmf-config/MetaContactPages.php: Config: [[gerrit:771606{{!}}Add new field to capture application URL link on Meta]] (duration: 00m 50s)
* 22:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:17 derick@deploy1002: Finished scap: Backport: [[gerrit:771665{{!}}Add & improve message for the chapter/thorg application contact form]] (duration: 11m 37s)
* 22:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:05 derick@deploy1002: Started scap: Backport: [[gerrit:771665{{!}}Add & improve message for the chapter/thorg application contact form]]
* 22:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:00 brennen@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771712{{!}}Revert "Revert "Revert "Enable Parsoid API everywhere"""]] (duration: 00m 51s)
* 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:48 brennen@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771707{{!}}Revert "Revert "Enable Parsoid API everywhere""]] (duration: 00m 51s)
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:45 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 21:42 rzl@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 21:42 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 21:42 rzl@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
* 21:42 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 21:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 21:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
* 21:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
* 21:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 21:40 rzl@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 21:35 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 21:26 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 21:26 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 21:26 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:26 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 21:25 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 21:25 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:25 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 21:25 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
* 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:23 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
* 21:21 cjming@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/T299104.php: Backport: [[gerrit:771394{{!}}Update invalid skin preference update script (T299104)]] (duration: 00m 51s)
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:11 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.26  refs [[phab:T300202|T300202]] (duration: 00m 50s)
* 21:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.26  refs [[phab:T300202|T300202]]
* 20:57 ladsgroup@deploy1002: Finished scap: Revert "rdbms: Followups to automatic connection recovery patch" (duration: 11m 50s)
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:45 ladsgroup@deploy1002: Started scap: Revert "rdbms: Followups to automatic connection recovery patch"
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22798 and previous config saved to /var/cache/conftool/dbconfig/20220317-204128-marostegui.json
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:29 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:763779{{!}}Revert "Enable Parsoid API everywhere" (T302081)]] (duration: 00m 50s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22797 and previous config saved to /var/cache/conftool/dbconfig/20220317-202623-marostegui.json
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22796 and previous config saved to /var/cache/conftool/dbconfig/20220317-201118-marostegui.json
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:08 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820646{{!}}Start writing to cuc_actor everywhere except s4 and s8 (T233004)]] (duration: 03m 15s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22795 and previous config saved to /var/cache/conftool/dbconfig/20220317-195613-marostegui.json
* 19:51 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2053-2054].codfw.wmnet
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:51 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2053-2054].codfw.wmnet
* 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2019-2020].codfw.wmnet
* 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:35 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2019-2020].codfw.wmnet
* 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2016-2018].codfw.wmnet
* 18:55 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 19:35 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2016-2018].codfw.wmnet
* 18:53 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.26/skins/Vector/includes/Hooks.
* 19:34 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2036.codfw.wmnet
* 19:34 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2036.codfw.wmnet
* 19:28 sukhe: testing ATS 9.1.3-1wm1 on cp4026: [[phab:T309651|T309651]]
* 19:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1087.eqiad.wmnet with OS bullseye
* 19:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1086.eqiad.wmnet with OS bullseye
* 18:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1087.eqiad.wmnet with reason:


== 2022-03-16 ==
== 2022-08-08 ==
* 23:52 tzatziki: Removing  two files for legal compliance
* 23:52 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: clean up testwiki experiments [[phab:T314750|T314750]] (duration: 03m 19s)
* 21:17 cjming: end running skin update preference maintenance script
* 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:52 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:46 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: clean up testwiki experiments [[phab:T314750|T314750]] (duration: 03m 27s)
* 20:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [no-op] {{Gerrit|8efa537}}: GrowthExperiments: Set GEWelcomeSurveyShowMailingListQuestion ([[phab:T303240|T303240]]) (duration: 00m 53s)
* 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:38 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/: {{Gerrit|9ba157b}}: Add insert option for update skin preferences script ([[phab:T299104|T299104]]) (duration: 00m 50s)
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:34 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/WikimediaMaintenance/: {{Gerrit|ebfc516}}: Add script to update vector skin preferences ([[phab:T299104|T299104]]) (duration: 00m 51s)
* 23:32 eileen___: config revision changed from {{Gerrit|f5668044}} to 787cd0e0<eileen___> eileen
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:32 eileen___: civicrm upgraded from {{Gerrit|497bddf7}} to {{Gerrit|1f91ac2d}}
* 20:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 22:16 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 20:13 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic1065.eqiad.wmnet with OS bullseye
* 20:13 urbanecm@deploy1002: Synchronized docroot/noc/db.php: {{Gerrit|f649199}}: Migrate wmfDatacenter(s) to wmgDatacenter(s) ([[phab:T45956|T45956]]; 3/3) (duration: 00m 49s)
* 21:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1065.eqiad.wmnet with reason: host reimage
* 20:12 urbanecm@deploy1002: Synchronized multiversion/: {{Gerrit|f649199}}: Migrate wmfDatacenter(s) to wmgDatacenter(s) ([[phab:T45956|T45956]]; 2/3) (duration: 00m 50s)
* 21:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1065.eqiad.wmnet with reason: host reimage
* 20:11 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|f649199}}: Migrate wmfDatacenter(s) to wmgDatacenter(s) ([[phab:T45956|T45956]]; 1/3) (duration: 00m 50s)
* 21:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1065.eqiad.wmnet with OS bullseye
* 19:22 otto@deploy1002: Finished deploy [analytics/refinery@2d2056a] (hadoop-test): (no justification provided) (duration: 07m 50s)
* 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1062.eqiad.wmnet with OS bullseye
* 19:14 otto@deploy1002: Started deploy [analytics/refinery@2d2056a] (hadoop-test): (no justification provided)
* 20:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1062.eqiad.wmnet with reason: host reimage
* 18:32 sukhe: running: homer "cr*-drmrs*" commit "Gerrit 771359: Set up BGP peering in drmrs for Wikidough."
* 20:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1062.eqiad.wmnet with reason: host reimage
* 18:09 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics_test@257960f] (duration: 00m 08s)
* 20:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1062.eqiad.wmnet with OS bullseye
* 18:09 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics_test@257960f]
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:02 aqu@deploy1002: Finished deploy [airflow-dags/analytics@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics@257960f] (duration: 00m 08s)
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:02 aqu@deploy1002: Started deploy [airflow-dags/analytics@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics@257960f]
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:00 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
* 20:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 18:00 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
* 20:28 cjming: end of UTC late backport window
* 17:36 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: [[gerrit:771001{{!}}mwscript: Support --force-version flag (T303878)]] (duration: 00m 57s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 20:27 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/skins/Vector/resources/skins.vector.styles/layouts/grid.less: Backport: [[gerrit:821243{{!}}Fix grid blowout bug (T314756)]] (duration: 03m 26s)
* 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:13 aqu@deploy1002: Finished deploy [analytics/refinery@d039471] (hadoop-test): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 07m 23s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817785{{!}}Disable sticky header edit A/B test for pilot wikis (T312296)]] (duration: 03m 35s)
* 17:06 aqu@deploy1002: Started deploy [analytics/refinery@d039471] (hadoop-test): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:06 aqu@deploy1002: Finished deploy [analytics/refinery@d039471] (thin): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 00m 07s)
* 17:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1088.eqiad.wmnet with OS bullseye
* 17:06 aqu@deploy1002: Started deploy [analytics/refinery@d039471] (thin): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
* 17:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1088.eqiad.wmnet with reason: host reimage
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 17:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1088.eqiad.wmnet with reason: host reimage
* 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 17:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1088.eqiad.wmnet with OS bullseye
* 16:48 aqu@deploy1002: Finished deploy [analytics/refinery@d039471]: Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 25m 49s)
* 16:54 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1085.eqiad.wmnet with OS bullseye
* 16:45 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 16:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 16:37 Emperor: rolling restart of ms-fe10[09-12] so they know about removal of older proxies [[phab:T303733|T303733]]
* 16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
* 16:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 16:28 Emperor: moving swiftrepl and stats reporter host from ms-fe1005 to ms-fe1009 [[phab:T303733|T303733]]
* 16:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
* 16:38 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 16:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 16:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS bullseye
* 16:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 16:24 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic1085.eqiad.wmnet with OS bullseye
* 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22734 and previous config saved to /var/cache/conftool/dbconfig/20220316-162721-marostegui.json
* 16:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 16:22 aqu@deploy1002: Started deploy [analytics/refinery@d039471]: Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
* 16:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22733 and previous config saved to /var/cache/conftool/dbconfig/20220316-161216-marostegui.json
* 16:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 16:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
* 16:14 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 16:02 aqu: analytics/refinery - scap deply "Migrate session_length/daily from Oozie to Airflow"
* 16:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:59 dancy@deploy1002: Synchronized README: testing mediawiki image build (duration: 02m 11s)
* 16:10 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22732 and previous config saved to /var/cache/conftool/dbconfig/20220316-155711-marostegui.json
* 16:09 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS bullseye
* 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 16:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1084.eqiad.wmnet with OS bullseye
* 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 15:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22731 and previous config saved to /var/cache/conftool/dbconfig/20220316-155300-marostegui.json
* 15:47 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1084.eqiad.wmnet with reason: host reimage
* 15:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 15:46 sukhe: upload reprepro -C main include bullseye-wikimedia python-pynetbox_6.6.0-1+wmf11u1_amd64.changes
* 15:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1084.eqiad.wmnet with reason: host reimage
* 15:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maint
* 15:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
* 15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maint
* 15:51 moritzm: restarting exim/spamasassin on MXes to pick up new OpenSSL
* 15:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1084.eqiad.wmnet with OS bullseye
* 15:49 urbanecm@deploy1002: Synchronized wmf-config/logos.php: cswiki celebration logo (duration: 00m 49s)
* 14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:46 urbanecm@deploy1002: Synchronized static/images/project-logos/: cswiki celebration logos (duration: 00m 50s)
* 14:55 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: [[phab:T314256|T314256]]
* 15:43 dancy@deploy1002: scap failed: RuntimeError dictionary changed size during iteration (duration: 25m 55s)
* 14:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: [[phab:T314256|T314256]]
* 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22730 and previous config saved to /var/cache/conftool/dbconfig/20220316-154206-marostegui.json
* 14:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:38 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 14:11 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:37 ryankemper: [WCQS] Restarted updater across fleet to get out jvm sec upgrades: `ryankemper@cumin1001:~$ sudo -E cumin 'wcqs*' 'systemctl restart wcqs-updater.service'`
* 13:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:35 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 13:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:35 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:17 dancy@deploy1002: Started scap: testing mediawiki image build
* 12:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:15 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images http_proxy=http://webproxy.eqiad.wmnet:8080 https_proxy=http://webproxy.eqiad.wmnet:8080 GIT_BASE=https://gerrit.wikimedia.org/r/ BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restricted/mediaw
* 12:56 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|77fd5abdd7d9462869259e1511bbcf2d7ce62246}}: Growth: Add new rights to wgAvailableRights (duration: 03m 24s)
* 15:12 dancy@deploy1002: Started scap: (no justification provided)
* 12:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1102.eqiad.wmnet
* 15:11 dancy: Testing mediawiki image build on deploy server again
* 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
* 12:06 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments/: {{Gerrit|3eaf155678b7313c55dcca0cd39ab29f73eead37}}: MentorTools: Do not use MentorWeightManager ([[phab:T314362|T314362]]) (duration: 03m 31s)
* 15:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 11:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1102.eqiad.wmnet
* 15:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:21 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2022.codfw.wmnet
* 15:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:21 jelto: kubectl uncordon kubernetes2022.codfw.wmnet
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22729 and previous config saved to /var/cache/conftool/dbconfig/20220316-150433-marostegui.json
* 10:43 Amir1: Removing db2079 from orchestrator ([[phab:T313885|T313885]])
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22728 and previous config saved to /var/cache/conftool/dbconfig/20220316-145946-marostegui.json
* 10:39 Amir1: Removing db2079 from zarcillo ([[phab:T313885|T313885]])
* 14:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2079.codfw.wmnet
* 14:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:55 sukhe: rolling restart of nginx.service on durum* hosts for OpenSSL updates
* 10:30 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
* 14:55 cjming@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/T299104.php: Backport: [[gerrit:770937{{!}}Add script to update vector skin preferences (T299104)]] (duration: 00m 51s)
* 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2079.codfw.wmnet
* 14:53 moritzm: restarting nginx/dhcpd on install/apt servers
* 10:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2079.codfw.wmnet with reason: Decom
* 14:53 sukhe: rolling restart of pdns-recursor.service and dnsdist.service on doh* hosts for OpenSSL updates
* 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2079.codfw.wmnet with reason: Decom
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:41 jbond: deploy libtirpc update
* 14:52 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32310 and previous config saved to /var/cache/conftool/dbconfig/20220808-075723-ladsgroup.json
* 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32309 and previous config saved to /var/cache/conftool/dbconfig/20220808-075702-ladsgroup.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22727 and previous config saved to /var/cache/conftool/dbconfig/20220316-144928-marostegui.json
* 07:53 godog: grow sda/sdb 3 by 100G on thanos-be2001 - [[phab:T314275|T314275]]
* 14:47 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 07:50 godog: grow sda/sdb 3 by 100G on thanos-be1004 - [[phab:T314275|T314275]]
* 14:46 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
* 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32308 and previous config saved to /var/cache/conftool/dbconfig/20220808-074156-ladsgroup.json
* 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 07:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS buster
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32307 and previous config saved to /var/cache/conftool/dbconfig/20220808-072650-ladsgroup.json
* 14:35 XioNoX: add anycast6 peers in drmrs
* 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22726 and previous config saved to /var/cache/conftool/dbconfig/20220316-143423-marostegui.json
* 07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820815{{!}}trwikivoyage: Create rollbacker user group (T314678)]] (duration: 03m 17s)
* 14:25 Emperor: depooling ms-fe100[5-8] prior to decommissioning [[phab:T303733|T303733]]
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22725 and previous config saved to /var/cache/conftool/dbconfig/20220316-141918-marostegui.json
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
* 07:11 elukey: restart rsyslog on ml-serve2007
* 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32306 and previous config saved to /var/cache/conftool/dbconfig/20220808-071144-ladsgroup.json
* 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22724 and previous config saved to /var/cache/conftool/dbconfig/20220316-141708-marostegui.json
* 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:12 taavi@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/CentralAuth/includes/User/CentralAuthUser.php: Backport: [[gerrit:770942{{!}}Replace use of deprecated RecentChange::getEngine (T303861)]] (duration: 00m 51s)
* 14:10 herron: grafana1002:~# systemctl restart grafana-ldap-users-sync.service [[phab:T303064|T303064]]
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22723 and previous config saved to /var/cache/conftool/dbconfig/20220316-140203-marostegui.json
* 13:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS buster
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22722 and previous config saved to /var/cache/conftool/dbconfig/20220316-134658-marostegui.json
* 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22721 and previous config saved to /var/cache/conftool/dbconfig/20220316-133458-marostegui.json
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22720 and previous config saved to /var/cache/conftool/dbconfig/20220316-133153-marostegui.json
* 13:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS buster
* 13:25 krinkle@deploy1002: Synchronized w/static.php: {{Gerrit|159dfd21d}} (duration: 00m 50s)
* 13:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS buster
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22718 and previous config saved to /var/cache/conftool/dbconfig/20220316-131953-marostegui.json
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22717 and previous config saved to /var/cache/conftool/dbconfig/20220316-131429-marostegui.json
* 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22716 and previous config saved to /var/cache/conftool/dbconfig/20220316-131421-marostegui.json
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:07 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771331{{!}}Deploy template features to enwiki (T302857)]] (duration: 00m 50s)
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22715 and previous config saved to /var/cache/conftool/dbconfig/20220316-130448-marostegui.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22714 and previous config saved to /var/cache/conftool/dbconfig/20220316-125916-marostegui.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22713 and previous config saved to /var/cache/conftool/dbconfig/20220316-125803-marostegui.json
* 12:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 12:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22712 and previous config saved to /var/cache/conftool/dbconfig/20220316-125755-marostegui.json
* 12:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22711 and previous config saved to /var/cache/conftool/dbconfig/20220316-124943-marostegui.json
* 12:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22710 and previous config saved to /var/cache/conftool/dbconfig/20220316-124742-marostegui.json
* 12:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 12:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22709 and previous config saved to /var/cache/conftool/dbconfig/20220316-124734-marostegui.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22708 and previous config saved to /var/cache/conftool/dbconfig/20220316-124411-marostegui.json
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22707 and previous config saved to /var/cache/conftool/dbconfig/20220316-124250-marostegui.json
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22705 and previous config saved to /var/cache/conftool/dbconfig/20220316-123229-marostegui.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22704 and previous config saved to /var/cache/conftool/dbconfig/20220316-122906-marostegui.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22703 and previous config saved to /var/cache/conftool/dbconfig/20220316-122745-marostegui.json
* 12:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS buster
* 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22702 and previous config saved to /var/cache/conftool/dbconfig/20220316-121724-marostegui.json
* 12:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS buster
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22701 and previous config saved to /var/cache/conftool/dbconfig/20220316-121240-marostegui.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22700 and previous config saved to /var/cache/conftool/dbconfig/20220316-120219-marostegui.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22699 and previous config saved to /var/cache/conftool/dbconfig/20220316-120100-marostegui.json
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22698 and previous config saved to /var/cache/conftool/dbconfig/20220316-120047-marostegui.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22697 and previous config saved to /var/cache/conftool/dbconfig/20220316-114542-marostegui.json
* 11:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22695 and previous config saved to /var/cache/conftool/dbconfig/20220316-113200-marostegui.json
* 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22694 and previous config saved to /var/cache/conftool/dbconfig/20220316-113152-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22693 and previous config saved to /var/cache/conftool/dbconfig/20220316-113057-marostegui.json
* 11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22692 and previous config saved to /var/cache/conftool/dbconfig/20220316-113037-marostegui.json
* 11:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22691 and previous config saved to /var/cache/conftool/dbconfig/20220316-111647-marostegui.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22690 and previous config saved to /var/cache/conftool/dbconfig/20220316-111532-marostegui.json
* 11:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS buster
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22689 and previous config saved to /var/cache/conftool/dbconfig/20220316-110411-marostegui.json
* 11:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22688 and previous config saved to /var/cache/conftool/dbconfig/20220316-110403-marostegui.json
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22687 and previous config saved to /var/cache/conftool/dbconfig/20220316-110142-marostegui.json
* 10:55 vgutierrez: rolling upgrade to HAProxy 2.4.15 on cache nodes
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22686 and previous config saved to /var/cache/conftool/dbconfig/20220316-104858-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22685 and previous config saved to /var/cache/conftool/dbconfig/20220316-104637-marostegui.json
* 10:42 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22684 and previous config saved to /var/cache/conftool/dbconfig/20220316-103353-marostegui.json
* 10:28 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22683 and previous config saved to /var/cache/conftool/dbconfig/20220316-101848-marostegui.json
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22682 and previous config saved to /var/cache/conftool/dbconfig/20220316-101729-marostegui.json
* 10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 10:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 10:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 10 hosts with reason: Maintenance
* 10:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 10 hosts with reason: Maintenance
* 10:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 10:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 10:15 vgutierrez: rolling restart of ats-tls and ats-backend to catch up on OpenSSL updates
* 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22681 and previous config saved to /var/cache/conftool/dbconfig/20220316-101502-marostegui.json
* 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22680 and previous config saved to /var/cache/conftool/dbconfig/20220316-100527-marostegui.json
* 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22679 and previous config saved to /var/cache/conftool/dbconfig/20220316-100519-marostegui.json
* 10:04 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia
* 10:01 moritzm: installing openssl security updates
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22678 and previous config saved to /var/cache/conftool/dbconfig/20220316-095957-marostegui.json
* 09:56 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS stretch
* 09:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1070.eqiad.wmnet with OS stretch
* 09:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1071.eqiad.wmnet with OS buster
* 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22677 and previous config saved to /var/cache/conftool/dbconfig/20220316-095014-marostegui.json
* 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22676 and previous config saved to /var/cache/conftool/dbconfig/20220316-094452-marostegui.json
* 09:36 dcausse: [[phab:T293862|T293862]]: manually restarted blazegraph on wdqs1010 with "-agentpath:/usr/lib/libjvmquake.so=1000,1,0,warn=30,touch=/tmp/jvmquake"
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22675 and previous config saved to /var/cache/conftool/dbconfig/20220316-093509-marostegui.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22674 and previous config saved to /var/cache/conftool/dbconfig/20220316-092947-marostegui.json
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22673 and previous config saved to /var/cache/conftool/dbconfig/20220316-092742-marostegui.json
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22672 and previous config saved to /var/cache/conftool/dbconfig/20220316-092735-marostegui.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22671 and previous config saved to /var/cache/conftool/dbconfig/20220316-092004-marostegui.json
* 09:16 moritzm: revert mx1001/mx2001 to the Bullseye version of Exim [[phab:T303738|T303738]]
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 [[phab:T303498|T303498]]', diff saved to https://phabricator.wikimedia.org/P22670 and previous config saved to /var/cache/conftool/dbconfig/20220316-091533-marostegui.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22669 and previous config saved to /var/cache/conftool/dbconfig/20220316-091229-marostegui.json
* 09:09 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 08:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 08:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22668 and previous config saved to /var/cache/conftool/dbconfig/20220316-085724-marostegui.json
* 08:55 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 08:52 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22667 and previous config saved to /var/cache/conftool/dbconfig/20220316-084219-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22666 and previous config saved to /var/cache/conftool/dbconfig/20220316-084140-marostegui.json
* 08:41 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22665 and previous config saved to /var/cache/conftool/dbconfig/20220316-084127-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22664 and previous config saved to /var/cache/conftool/dbconfig/20220316-084011-marostegui.json
* 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22663 and previous config saved to /var/cache/conftool/dbconfig/20220316-084003-marostegui.json
* 08:35 hashar: Restarting CI Jenkins
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22662 and previous config saved to /var/cache/conftool/dbconfig/20220316-082622-marostegui.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22661 and previous config saved to /var/cache/conftool/dbconfig/20220316-082458-marostegui.json
* 08:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:11 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:770130{{!}}Change A/V player to videojs in the first batch of production wiki (T248418)]] (duration: 00m 49s)
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22660 and previous config saved to /var/cache/conftool/dbconfig/20220316-081117-marostegui.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22659 and previous config saved to /var/cache/conftool/dbconfig/20220316-080953-marostegui.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22658 and previous config saved to /var/cache/conftool/dbconfig/20220316-075612-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22657 and previous config saved to /var/cache/conftool/dbconfig/20220316-075502-marostegui.json
* 07:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22656 and previous config saved to /var/cache/conftool/dbconfig/20220316-075448-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22655 and previous config saved to /var/cache/conftool/dbconfig/20220316-075248-marostegui.json
* 07:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22654 and previous config saved to /var/cache/conftool/dbconfig/20220316-075007-marostegui.json
* 07:49 Amir1: dbmaint on master of s4@eqiad ([[phab:T298743|T298743]])
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22653 and previous config saved to /var/cache/conftool/dbconfig/20220316-073502-marostegui.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22652 and previous config saved to /var/cache/conftool/dbconfig/20220316-071957-marostegui.json
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22651 and previous config saved to /var/cache/conftool/dbconfig/20220316-071859-marostegui.json
* 07:18 urbanecm: UTC morning B&C window done
* 07:15 urbanecm: Create `testwiki.cx_significant_edits` and `testwiki.cx_section_translation` at s3 ([[phab:T302371|T302371]]; `mwscript sql.php --wiki=testwiki /srv/mediawiki-staging/php-1.38.0-wmf.26/extensions/ContentTranslation/sql/<nowiki>{</nowiki>section-translations,significant-edits<nowiki>}</nowiki>.sql)`)
* 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|455895168ab266813ae499e8fc353c66e6d5b450}}: Disable ContentTranslation for non-extended confirmed users on viwiki ([[phab:T299636|T299636]]) (duration: 00m 51s)
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820261{{!}}Enable SectionTranslation on 10 Wikipedias where ContentTranslation is default (T308829)]] (duration: 03m 15s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22650 and previous config saved to /var/cache/conftool/dbconfig/20220316-070452-marostegui.json
* 07:06 XioNoX: add CSP headers to Netbox - [[phab:T296356|T296356]]
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22649 and previous config saved to /var/cache/conftool/dbconfig/20220316-070354-marostegui.json
* 07:05 elukey: restart rsyslog on ml-serve-ctrl2001
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P22648 and previous config saved to /var/cache/conftool/dbconfig/20220316-070033-marostegui.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22647 and previous config saved to /var/cache/conftool/dbconfig/20220316-065918-marostegui.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22646 and previous config saved to /var/cache/conftool/dbconfig/20220316-064849-marostegui.json
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22644 and previous config saved to /var/cache/conftool/dbconfig/20220316-063344-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22643 and previous config saved to /var/cache/conftool/dbconfig/20220316-060008-marostegui.json
* 06:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22642 and previous config saved to /var/cache/conftool/dbconfig/20220316-055903-marostegui.json
* 05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 05:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22641 and previous config saved to /var/cache/conftool/dbconfig/20220316-055805-marostegui.json
* 05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 05:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 05:36 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 05:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1068.eqiad.wmnet with OS stretch
* 05:14 ryankemper: [WCQS Deploy] Test query passed on commons-query.wikimedia.org ; WCQS deploy complete
* 05:13 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@38de611] (wcqs): Deploy 0.3.106 to WCQS (duration: 01m 53s)
* 05:12 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.106` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
* 05:11 ryankemper@deploy1002: Started deploy [wdqs/wdqs@38de611] (wcqs): Deploy 0.3.106 to WCQS
* 05:11 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 05:11 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 05:11 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 05:09 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@38de611]: 0.3.106 (duration: 06m 36s)
* 05:03 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.106` on canary `wdqs1003`; proceeding to rest of fleet
* 05:02 ryankemper@deploy1002: Started deploy [wdqs/wdqs@38de611]: 0.3.106
* 05:01 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.106`. Pre-deploy tests passing on canary `wdqs1003`
* 02:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22640 and previous config saved to /var/cache/conftool/dbconfig/20220316-025347-marostegui.json
* 02:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22639 and previous config saved to /var/cache/conftool/dbconfig/20220316-023842-marostegui.json
* 02:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22638 and previous config saved to /var/cache/conftool/dbconfig/20220316-022336-marostegui.json
* 02:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22637 and previous config saved to /var/cache/conftool/dbconfig/20220316-020831-marostegui.json
* 01:43 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 01:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 01:37 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 01:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS buster
* 00:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 00:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 00:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS buster
* 00:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye


== 2022-03-15 ==
== 2022-08-07 ==
* 22:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 19:58 taavi: taavi@mwmaint1002 ~ $ echo "https://upload.wikimedia.org/wikipedia/commons/1/15/Keep_tidy_ask.svg" {{!}} mwscript purgeList.php --wiki enwiki # [[phab:T314712|T314712]]
* 22:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32305 and previous config saved to /var/cache/conftool/dbconfig/20220807-135204-ladsgroup.json
* 22:07 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 13:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 22:06 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 22:05 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32304 and previous config saved to /var/cache/conftool/dbconfig/20220807-135143-ladsgroup.json
* 22:04 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32303 and previous config saved to /var/cache/conftool/dbconfig/20220807-133637-ladsgroup.json
* 22:03 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32302 and previous config saved to /var/cache/conftool/dbconfig/20220807-132131-ladsgroup.json
* 22:02 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32301 and previous config saved to /var/cache/conftool/dbconfig/20220807-130625-ladsgroup.json
* 22:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32300 and previous config saved to /var/cache/conftool/dbconfig/20220807-120610-ladsgroup.json
* 22:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 22:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 21:59 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32299 and previous config saved to /var/cache/conftool/dbconfig/20220807-120549-ladsgroup.json
* 21:56 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32298 and previous config saved to /var/cache/conftool/dbconfig/20220807-115043-ladsgroup.json
* 21:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32297 and previous config saved to /var/cache/conftool/dbconfig/20220807-113537-ladsgroup.json
* 21:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32296 and previous config saved to /var/cache/conftool/dbconfig/20220807-112031-ladsgroup.json
* 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22635 and previous config saved to /var/cache/conftool/dbconfig/20220315-214729-marostegui.json
 
* 21:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
== 2022-08-06 ==
* 21:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32295 and previous config saved to /var/cache/conftool/dbconfig/20220806-175916-ladsgroup.json
* 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22634 and previous config saved to /var/cache/conftool/dbconfig/20220315-214721-marostegui.json
* 17:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 21:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22633 and previous config saved to /var/cache/conftool/dbconfig/20220315-214133-ladsgroup.json
* 03:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 03:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS buster
* 03:02 krinkle@deploy1002: Synchronized w/: {{Gerrit|I9067d47fab0324}} (duration: 03m 25s)
* 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22632 and previous config saved to /var/cache/conftool/dbconfig/20220315-213216-marostegui.json
* 03:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:27 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: [[gerrit:770935{{!}}rdbms: provide $owner argument in LoadBalancer::flushPrimarySessions() (T303885)]] (duration: 00m 53s)
* 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22631 and previous config saved to /var/cache/conftool/dbconfig/20220315-212628-ladsgroup.json
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22630 and previous config saved to /var/cache/conftool/dbconfig/20220315-211711-marostegui.json
* 02:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22629 and previous config saved to /var/cache/conftool/dbconfig/20220315-211123-ladsgroup.json
* 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 21:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 21:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22628 and previous config saved to /var/cache/conftool/dbconfig/20220315-210204-marostegui.json
* 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22627 and previous config saved to /var/cache/conftool/dbconfig/20220315-205702-marostegui.json
* 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22626 and previous config saved to /var/cache/conftool/dbconfig/20220315-205618-ladsgroup.json
* 20:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
* 20:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 20:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 20:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22625 and previous config saved to /var/cache/conftool/dbconfig/20220315-204912-marostegui.json
* 20:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
* 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22624 and previous config saved to /var/cache/conftool/dbconfig/20220315-204157-marostegui.json
* 20:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22623 and previous config saved to /var/cache/conftool/dbconfig/20220315-203407-marostegui.json
* 20:27 bd808: Toolhub: running post-deploy database migrations
* 20:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS buster
* 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22622 and previous config saved to /var/cache/conftool/dbconfig/20220315-202652-marostegui.json
* 20:26 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:21 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 20:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22621 and previous config saved to /var/cache/conftool/dbconfig/20220315-201902-marostegui.json
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771011{{!}}GrowthExperiments: Add another entry to GECampaignPatterns (T302738)]] (duration: 02m 22s)
* 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22620 and previous config saved to /var/cache/conftool/dbconfig/20220315-201147-marostegui.json
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22619 and previous config saved to /var/cache/conftool/dbconfig/20220315-200357-marostegui.json
* 19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22618 and previous config saved to /var/cache/conftool/dbconfig/20220315-195934-ladsgroup.json
* 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 19:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 19:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22617 and previous config saved to /var/cache/conftool/dbconfig/20220315-195657-ladsgroup.json
* 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22616 and previous config saved to /var/cache/conftool/dbconfig/20220315-194152-ladsgroup.json
* 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22615 and previous config saved to /var/cache/conftool/dbconfig/20220315-193029-marostegui.json
* 19:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 19:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22614 and previous config saved to /var/cache/conftool/dbconfig/20220315-192647-ladsgroup.json
* 19:24 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 19:22 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 19:19 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 19:18 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22612 and previous config saved to /var/cache/conftool/dbconfig/20220315-191234-marostegui.json
* 19:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 19:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22611 and previous config saved to /var/cache/conftool/dbconfig/20220315-191226-marostegui.json
* 19:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS buster
* 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22610 and previous config saved to /var/cache/conftool/dbconfig/20220315-191140-ladsgroup.json
* 19:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
* 19:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
* 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
* 18:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22609 and previous config saved to /var/cache/conftool/dbconfig/20220315-185721-marostegui.json
* 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
* 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
* 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
* 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22608 and previous config saved to /var/cache/conftool/dbconfig/20220315-185413-ladsgroup.json
* 18:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 18:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22607 and previous config saved to /var/cache/conftool/dbconfig/20220315-185405-ladsgroup.json
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS stretch
* 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1070.eqiad.wmnet with OS stretch
* 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1071.eqiad.wmnet with OS buster
* 18:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22606 and previous config saved to /var/cache/conftool/dbconfig/20220315-184216-marostegui.json
* 18:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
* 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22605 and previous config saved to /var/cache/conftool/dbconfig/20220315-183900-ladsgroup.json
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
* 18:32 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1071.eqiad.wmnet with OS buster
* 18:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS buster
* 18:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1070.eqiad.wmnet with OS buster
* 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22604 and previous config saved to /var/cache/conftool/dbconfig/20220315-182711-marostegui.json
* 18:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
* 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
* 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22603 and previous config saved to /var/cache/conftool/dbconfig/20220315-182355-ladsgroup.json
* 18:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
* 18:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
* 18:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
* 18:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
* 18:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS buster
* 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1071.eqiad.wmnet with OS buster
* 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1070.eqiad.wmnet with OS buster
* 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22602 and previous config saved to /var/cache/conftool/dbconfig/20220315-180850-ladsgroup.json
* 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS buster
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22601 and previous config saved to /var/cache/conftool/dbconfig/20220315-180542-marostegui.json
* 18:04 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.26  refs [[phab:T300202|T300202]]
* 17:57 XioNoX: power down mr1-ulsfo for replacement
* 17:52 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22600 and previous config saved to /var/cache/conftool/dbconfig/20220315-175143-marostegui.json
* 17:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22599 and previous config saved to /var/cache/conftool/dbconfig/20220315-175130-marostegui.json
* 17:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22598 and previous config saved to /var/cache/conftool/dbconfig/20220315-175037-marostegui.json
* 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22597 and previous config saved to /var/cache/conftool/dbconfig/20220315-173625-marostegui.json
* 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22596 and previous config saved to /var/cache/conftool/dbconfig/20220315-173532-marostegui.json
* 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22595 and previous config saved to /var/cache/conftool/dbconfig/20220315-172616-ladsgroup.json
* 17:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 17:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22594 and previous config saved to /var/cache/conftool/dbconfig/20220315-172608-ladsgroup.json
* 17:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:25 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22593 and previous config saved to /var/cache/conftool/dbconfig/20220315-172119-marostegui.json
* 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22592 and previous config saved to /var/cache/conftool/dbconfig/20220315-172027-marostegui.json
* 17:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22591 and previous config saved to /var/cache/conftool/dbconfig/20220315-171103-ladsgroup.json
* 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22590 and previous config saved to /var/cache/conftool/dbconfig/20220315-170614-marostegui.json
* 17:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22589 and previous config saved to /var/cache/conftool/dbconfig/20220315-170201-marostegui.json
* 17:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 17:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 17:01 jhuneidi@deploy1002: Pruned MediaWiki: 1.38.0-wmf.24 (duration: 01m 32s)
* 16:59 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.26  refs [[phab:T300202|T300202]] (duration: 38m 54s)
* 16:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22588 and previous config saved to /var/cache/conftool/dbconfig/20220315-165558-ladsgroup.json
* 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22587 and previous config saved to /var/cache/conftool/dbconfig/20220315-164751-marostegui.json
* 16:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 16:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22586 and previous config saved to /var/cache/conftool/dbconfig/20220315-164743-marostegui.json
* 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22585 and previous config saved to /var/cache/conftool/dbconfig/20220315-164053-ladsgroup.json
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22584 and previous config saved to /var/cache/conftool/dbconfig/20220315-163626-ladsgroup.json
* 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22583 and previous config saved to /var/cache/conftool/dbconfig/20220315-163618-ladsgroup.json
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22582 and previous config saved to /var/cache/conftool/dbconfig/20220315-163238-marostegui.json
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22581 and previous config saved to /var/cache/conftool/dbconfig/20220315-163134-marostegui.json
* 16:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22580 and previous config saved to /var/cache/conftool/dbconfig/20220315-163126-marostegui.json
* 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22579 and previous config saved to /var/cache/conftool/dbconfig/20220315-162113-ladsgroup.json
* 16:20 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.26  refs [[phab:T300202|T300202]]
* 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22578 and previous config saved to /var/cache/conftool/dbconfig/20220315-161732-marostegui.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22577 and previous config saved to /var/cache/conftool/dbconfig/20220315-161621-marostegui.json
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22576 and previous config saved to /var/cache/conftool/dbconfig/20220315-160607-ladsgroup.json
* 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22575 and previous config saved to /var/cache/conftool/dbconfig/20220315-160226-marostegui.json
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22574 and previous config saved to /var/cache/conftool/dbconfig/20220315-160116-marostegui.json
* 15:53 moritzm: updating Exim on mx1001 [[phab:T303738|T303738]]
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22573 and previous config saved to /var/cache/conftool/dbconfig/20220315-155102-ladsgroup.json
* 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22572 and previous config saved to /var/cache/conftool/dbconfig/20220315-154639-ladsgroup.json
* 15:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22571 and previous config saved to /var/cache/conftool/dbconfig/20220315-154631-ladsgroup.json
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22570 and previous config saved to /var/cache/conftool/dbconfig/20220315-154610-marostegui.json
* 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22569 and previous config saved to /var/cache/conftool/dbconfig/20220315-153126-ladsgroup.json
* 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22568 and previous config saved to /var/cache/conftool/dbconfig/20220315-152916-marostegui.json
* 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 15:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 15:18 moritzm: installing Java updates on wcqs*/wdqs* hosts
* 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22567 and previous config saved to /var/cache/conftool/dbconfig/20220315-151621-ladsgroup.json
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22566 and previous config saved to /var/cache/conftool/dbconfig/20220315-151206-marostegui.json
* 15:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 15:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 15:09 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@f01214c]: (no justification provided) (duration: 00m 07s)
* 15:09 ebysans@deploy1002: Started deploy [airflow-dags/analytics@f01214c]: (no justification provided)
* 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22565 and previous config saved to /var/cache/conftool/dbconfig/20220315-150649-root.json
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22564 and previous config saved to /var/cache/conftool/dbconfig/20220315-150116-ladsgroup.json
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22563 and previous config saved to /var/cache/conftool/dbconfig/20220315-145246-ladsgroup.json
* 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22562 and previous config saved to /var/cache/conftool/dbconfig/20220315-145238-ladsgroup.json
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22561 and previous config saved to /var/cache/conftool/dbconfig/20220315-145146-root.json
* 14:50 moritzm: installing postgresql-11 security updates
* 14:49 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@88d5618]: (no justification provided) (duration: 00m 07s)
* 14:49 ntsako@deploy1002: Started deploy [airflow-dags/analytics@88d5618]: (no justification provided)
* 14:43 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 14:42 ottomata: I read the cumin output wrong, kafka-jumbo1001 and 1002 restarted successfully before accidental ctrl-c on cumin command.  Restarting the full jumbo roll-restart to thoroughly do them all - [[phab:T303324|T303324]]
* 14:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1068.eqiad.wmnet with reason: host reimage
* 14:39 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:38 ottomata: all brokers except kafka-jumbo1001 were succesffully roll restarted, doing kafka-jumbo1001 manually - [[phab:T303324|T303324]]
* 14:37 ottomata: accidental cancel of roll restart brokers, re-doing - [[phab:T303324|T303324]]
* 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22560 and previous config saved to /var/cache/conftool/dbconfig/20220315-143733-ladsgroup.json
* 14:37 otto@cumin1001: END (ERROR) - Cookbook sre.kafka.roll-restart-brokers (exit_code=97) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 14:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1068.eqiad.wmnet with reason: host reimage
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22559 and previous config saved to /var/cache/conftool/dbconfig/20220315-143642-root.json
* 14:32 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@2924232]: (no justification provided) (duration: 00m 08s)
* 14:32 ntsako@deploy1002: Started deploy [airflow-dags/analytics@2924232]: (no justification provided)
* 14:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1068.eqiad.wmnet with OS stretch
* 14:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 14:22 inflatador: [[phab:T303256|T303256]] bking@cumin1001 restarting wdqs services `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-blazegraph`
* 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22558 and previous config saved to /var/cache/conftool/dbconfig/20220315-142228-ladsgroup.json
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22557 and previous config saved to /var/cache/conftool/dbconfig/20220315-142138-root.json
* 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 10 hosts with reason: Maintenance
* 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 10 hosts with reason: Maintenance
* 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22556 and previous config saved to /var/cache/conftool/dbconfig/20220315-140723-ladsgroup.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22555 and previous config saved to /var/cache/conftool/dbconfig/20220315-140634-root.json
* 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22554 and previous config saved to /var/cache/conftool/dbconfig/20220315-140520-marostegui.json
* 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22553 and previous config saved to /var/cache/conftool/dbconfig/20220315-140259-ladsgroup.json
* 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22552 and previous config saved to /var/cache/conftool/dbconfig/20220315-140252-ladsgroup.json
* 14:01 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:00 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:59 ottomata: roll restarting kafka jumbo brokers to set max.incremental.fetch.session.cache.slots=2000 - [[phab:T303324|T303324]]
* 13:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
* 13:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22551 and previous config saved to /var/cache/conftool/dbconfig/20220315-135015-marostegui.json
* 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22550 and previous config saved to /var/cache/conftool/dbconfig/20220315-134747-ladsgroup.json
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 awight: EU deployment complete
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22549 and previous config saved to /var/cache/conftool/dbconfig/20220315-133510-marostegui.json
* 13:34 awight@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:770907{{!}}[beta] Disable improved template search (T286991, T302857)]] (take 2) (duration: 00m 50s)
* 13:32 awight@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:770907{{!}}[beta] Disable improved template search (T286991, T302857)]] (duration: 00m 48s)
* 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22548 and previous config saved to /var/cache/conftool/dbconfig/20220315-133241-ladsgroup.json
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:31 awight@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:770906{{!}}[beta] Remove unused config overrides]] (duration: 00m 49s)
* 13:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 13:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22547 and previous config saved to /var/cache/conftool/dbconfig/20220315-132857-marostegui.json
* 13:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22546 and previous config saved to /var/cache/conftool/dbconfig/20220315-132005-marostegui.json
* 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22545 and previous config saved to /var/cache/conftool/dbconfig/20220315-131736-ladsgroup.json
* 13:15 awight@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/TemplateWizard/resources/ext.TemplateWizard.SearchField.js: Backport: [[gerrit:770057{{!}}Fix copy-paste mistake in template search widget (T303524)]] (duration: 00m 49s)
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1095:3315', diff saved to https://phabricator.wikimedia.org/P22544 and previous config saved to /var/cache/conftool/dbconfig/20220315-131436-marostegui.json
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22543 and previous config saved to /var/cache/conftool/dbconfig/20220315-131352-marostegui.json
* 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22542 and previous config saved to /var/cache/conftool/dbconfig/20220315-131311-ladsgroup.json
* 13:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 13:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22541 and previous config saved to /var/cache/conftool/dbconfig/20220315-131303-ladsgroup.json
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22540 and previous config saved to /var/cache/conftool/dbconfig/20220315-130936-marostegui.json
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:07 Amir1: removed 440 more corrupt rows in flaggedtemplates in dewiki ([[phab:T297189|T297189]])
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22539 and previous config saved to /var/cache/conftool/dbconfig/20220315-125847-marostegui.json
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22538 and previous config saved to /var/cache/conftool/dbconfig/20220315-125758-ladsgroup.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22537 and previous config saved to /var/cache/conftool/dbconfig/20220315-125431-marostegui.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22536 and previous config saved to /var/cache/conftool/dbconfig/20220315-125228-marostegui.json
* 12:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 12:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 12:48 Amir1: removed 170 corrupt rows in flaggedtemplates in dewiki ([[phab:T297189|T297189]])
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22535 and previous config saved to /var/cache/conftool/dbconfig/20220315-124342-marostegui.json
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22534 and previous config saved to /var/cache/conftool/dbconfig/20220315-124253-ladsgroup.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22533 and previous config saved to /var/cache/conftool/dbconfig/20220315-123926-marostegui.json
* 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22532 and previous config saved to /var/cache/conftool/dbconfig/20220315-122748-ladsgroup.json
* 12:24 moritzm: updating Exim on mx2001 [[phab:T303738|T303738]]
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22531 and previous config saved to /var/cache/conftool/dbconfig/20220315-122421-marostegui.json
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22530 and previous config saved to /var/cache/conftool/dbconfig/20220315-121317-ladsgroup.json
* 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22529 and previous config saved to /var/cache/conftool/dbconfig/20220315-121309-ladsgroup.json
* 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22528 and previous config saved to /var/cache/conftool/dbconfig/20220315-115804-ladsgroup.json
* 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22527 and previous config saved to /var/cache/conftool/dbconfig/20220315-114259-ladsgroup.json
* 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22526 and previous config saved to /var/cache/conftool/dbconfig/20220315-112754-ladsgroup.json
* 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22525 and previous config saved to /var/cache/conftool/dbconfig/20220315-112308-ladsgroup.json
* 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Maintenance
* 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Maintenance
* 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22524 and previous config saved to /var/cache/conftool/dbconfig/20220315-110423-marostegui.json
* 11:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 11:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22523 and previous config saved to /var/cache/conftool/dbconfig/20220315-110416-marostegui.json
* 10:50 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22522 and previous config saved to /var/cache/conftool/dbconfig/20220315-104910-marostegui.json
* 10:49 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22521 and previous config saved to /var/cache/conftool/dbconfig/20220315-103405-marostegui.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22520 and previous config saved to /var/cache/conftool/dbconfig/20220315-101922-root.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22519 and previous config saved to /var/cache/conftool/dbconfig/20220315-101900-marostegui.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22518 and previous config saved to /var/cache/conftool/dbconfig/20220315-101449-root.json
* 10:13 Amir1: start of foreachwikiindblist all maintenance/refreshImageMetadata.php --force --verbose --mediatype=AUDIO --sleep 2 --oldimage ([[phab:T226311|T226311]])
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22517 and previous config saved to /var/cache/conftool/dbconfig/20220315-100418-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22516 and previous config saved to /var/cache/conftool/dbconfig/20220315-095945-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22515 and previous config saved to /var/cache/conftool/dbconfig/20220315-094914-root.json
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22514 and previous config saved to /var/cache/conftool/dbconfig/20220315-094441-root.json
* 09:38 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22513 and previous config saved to /var/cache/conftool/dbconfig/20220315-093410-root.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22512 and previous config saved to /var/cache/conftool/dbconfig/20220315-092937-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22511 and previous config saved to /var/cache/conftool/dbconfig/20220315-091906-root.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22510 and previous config saved to /var/cache/conftool/dbconfig/20220315-091850-root.json
* 09:14 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22509 and previous config saved to /var/cache/conftool/dbconfig/20220315-091433-root.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22507 and previous config saved to /var/cache/conftool/dbconfig/20220315-090346-root.json
* 09:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22506 and previous config saved to /var/cache/conftool/dbconfig/20220315-085929-root.json
* 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:57 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a85cf25] (eqiad): Switchover to eqiad tegola on eqiad env (duration: 01m 55s)
* 08:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a85cf25] (eqiad): Switchover to eqiad tegola on eqiad env
* 08:53 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a85cf25] (codfw): Switchover to eqiad tegola on eqiad env (duration: 03m 22s)
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22505 and previous config saved to /var/cache/conftool/dbconfig/20220315-085214-marostegui.json
* 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22504 and previous config saved to /var/cache/conftool/dbconfig/20220315-085206-marostegui.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1095:3315', diff saved to https://phabricator.wikimedia.org/P22503 and previous config saved to /var/cache/conftool/dbconfig/20220315-085026-marostegui.json
* 08:50 marostegui: dbmaint on s5@eqiad [[phab:T297189|T297189]]
* 08:49 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a85cf25] (codfw): Switchover to eqiad tegola on eqiad env
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22502 and previous config saved to /var/cache/conftool/dbconfig/20220315-084842-root.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22501 and previous config saved to /var/cache/conftool/dbconfig/20220315-084425-root.json
* 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1161.eqiad.wmnet with OS bullseye
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22500 and previous config saved to /var/cache/conftool/dbconfig/20220315-083925-marostegui.json
* 08:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22499 and previous config saved to /var/cache/conftool/dbconfig/20220315-083917-marostegui.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22498 and previous config saved to /var/cache/conftool/dbconfig/20220315-083701-marostegui.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22497 and previous config saved to /var/cache/conftool/dbconfig/20220315-083338-root.json
* 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: host reimage
* 08:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: host reimage
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22496 and previous config saved to /var/cache/conftool/dbconfig/20220315-082412-marostegui.json
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22495 and previous config saved to /var/cache/conftool/dbconfig/20220315-082401-root.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22493 and previous config saved to /var/cache/conftool/dbconfig/20220315-082157-marostegui.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22492 and previous config saved to /var/cache/conftool/dbconfig/20220315-081835-root.json
* 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1161.eqiad.wmnet with OS bullseye
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22491 and previous config saved to /var/cache/conftool/dbconfig/20220315-080907-marostegui.json
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22490 and previous config saved to /var/cache/conftool/dbconfig/20220315-080857-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22489 and previous config saved to /var/cache/conftool/dbconfig/20220315-080651-marostegui.json
* 08:05 marostegui: dbmaint on s5@eqiad [[phab:T300473|T300473]]
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22488 and previous config saved to /var/cache/conftool/dbconfig/20220315-080329-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P22487 and previous config saved to /var/cache/conftool/dbconfig/20220315-080128-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22486 and previous config saved to /var/cache/conftool/dbconfig/20220315-075402-marostegui.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22485 and previous config saved to /var/cache/conftool/dbconfig/20220315-075353-root.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22484 and previous config saved to /var/cache/conftool/dbconfig/20220315-074825-root.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22483 and previous config saved to /var/cache/conftool/dbconfig/20220315-074650-root.json
* 07:43 elukey: restart kube-api server on ml-serve-ctrl2002 - 504 responses registered, corresponding to high custom resource definition requests
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22482 and previous config saved to /var/cache/conftool/dbconfig/20220315-073849-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22481 and previous config saved to /var/cache/conftool/dbconfig/20220315-073146-root.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22480 and previous config saved to /var/cache/conftool/dbconfig/20220315-072345-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22479 and previous config saved to /var/cache/conftool/dbconfig/20220315-071642-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22478 and previous config saved to /var/cache/conftool/dbconfig/20220315-070841-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22477 and previous config saved to /var/cache/conftool/dbconfig/20220315-070635-marostegui.json
* 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22476 and previous config saved to /var/cache/conftool/dbconfig/20220315-070138-root.json
* 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1166.eqiad.wmnet with OS bullseye
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22475 and previous config saved to /var/cache/conftool/dbconfig/20220315-065337-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22474 and previous config saved to /var/cache/conftool/dbconfig/20220315-064634-root.json
* 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: host reimage
* 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: host reimage
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22473 and previous config saved to /var/cache/conftool/dbconfig/20220315-063130-root.json
* 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1166.eqiad.wmnet with OS bullseye
* 06:26 marostegui: dbmaint on s3@eqiad [[phab:T300600|T300600]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P22472 and previous config saved to /var/cache/conftool/dbconfig/20220315-062543-marostegui.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22471 and previous config saved to /var/cache/conftool/dbconfig/20220315-061626-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22470 and previous config saved to /var/cache/conftool/dbconfig/20220315-061458-marostegui.json
* 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22469 and previous config saved to /var/cache/conftool/dbconfig/20220315-061450-marostegui.json
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22468 and previous config saved to /var/cache/conftool/dbconfig/20220315-055945-marostegui.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22467 and previous config saved to /var/cache/conftool/dbconfig/20220315-054440-marostegui.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22466 and previous config saved to /var/cache/conftool/dbconfig/20220315-052935-marostegui.json
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22465 and previous config saved to /var/cache/conftool/dbconfig/20220315-013013-marostegui.json
* 01:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 01:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22464 and previous config saved to /var/cache/conftool/dbconfig/20220315-013000-marostegui.json
* 01:26 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/maintenance/populateGlobalEditCount.php: fix script bug gerrit 770058 (duration: 00m 50s)
* 01:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22463 and previous config saved to /var/cache/conftool/dbconfig/20220315-011455-marostegui.json
* 00:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22462 and previous config saved to /var/cache/conftool/dbconfig/20220315-005950-marostegui.json
* 00:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22461 and previous config saved to /var/cache/conftool/dbconfig/20220315-004445-marostegui.json
* 00:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)


== 2022-03-14 ==
== 2022-08-05 ==
* 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22460 and previous config saved to /var/cache/conftool/dbconfig/20220314-234430-marostegui.json
* 22:20 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly (duration: 02m 01s)
* 23:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 22:18 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly
* 23:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:08 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1195.eqiad.wmnet with OS bullseye
* 23:28 mwdebug-deploy
* 16:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS bullseye
* 16:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
* 16:49 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
* 16:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 16:37 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:20 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|3fa9683}}: Delete huwiki 500k milestone logo files ([[phab:T301923|T301923]]) (duration: 00m 49s)
* 13:17 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:820441{{!}}Remove unused CA P3P config]] (duration: 03m 09s)
* 13:18 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|3c2c8b0cca4e48f572abd3812594097a33e64379}}: Stop using huwiki 500k milestone logo ([[phab:T301923|T301923]]) (duration: 00m 48s)
* 13:14 jbond: intorudce new puppetmaster backends puppetmaster[12]004
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2065.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22419 and previous config saved to /var/cache/conftool/dbconfig/20220314-131344-marostegui.json
* 13:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2065.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22418 and previous config saved to /var/cache/conftool/dbconfig/20220314-131220-marostegui.json
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:819175{{!}}QuickSurveys: Deploy research incentive survey to Bengali wiki (T314333)]] (duration: 03m 26s)
* 13:10 urbanecm@deploy1002: Synchronized wmf-config/wikitech.php: {{Gerrit|95f376a}}: wikitech: migrate wmf* to wmg* ([[phab:T45956|T45956]]) (duration: 00m 48s)
* 13:07 moritzm: installing jetty9 security updates
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22417 and previous config saved to /var/cache/conftool/dbconfig/20220314-125839-marostegui.json
* 12:48 moritzm: installing Linux 4.19.249 kernels on Buster hosts
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22416 and previous config saved to /var/cache/conftool/dbconfig/20220314-124911-marostegui.json
* 12:03 jbond: send sretest100[12] and idp-test2001 to the new puppetmaster[12]004 servers to
* 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22415 and previous config saved to /var/cache/conftool/dbconfig/20220314-124902-marostegui.json
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22414 and previous config saved to /var/cache/conftool/dbconfig/20220314-123357-marostegui.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22413 and previous config saved to /var/cache/conftool/dbconfig/20220314-121937-marostegui.json
* 12:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22412 and previous config saved to /var/cache/conftool/dbconfig/20220314-121852-marostegui.json
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22411 and previous config saved to /var/cache/conftool/dbconfig/20220314-120347-marostegui.json
* 11:55 moritzm: restarting nginx on archiva1002 to pick up security updates
* 11:53 moritzm: restarting apache2 on matomo1002 to pick up security updates
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22410 and previous config saved to /var/cache/conftool/dbconfig/20220314-114312-marostegui.json
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T298294


== 2022-03-08 ==
== 2022-08-03 ==
* 23:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22160 and previous config saved to /var/cache/conftool/dbconfig/20220308-234450-marostegui.json
* 23:59 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart
* 23:29 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22159 and previous config saved to /var/cache/conftool/dbconfig/20220308-232915-marostegui.json
* 23
* 23:13 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22158 and previous config saved to /var/cache/conftool/dbconfig/20220308-231340-marostegui.json
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:10 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22157 and previous config saved to /var/cache/conftool/dbconfig/20220308-231028-marostegui.json
* 23:09 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 23:09 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 23:09 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22156 and previous config saved to /var/cache/conftool/dbconfig/20220308-230949-marostegui.json
* 23:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:54 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22155 and previous config saved to /var/cache/conftool/dbconfig/20220308-225413-marostegui.json
* 22:38 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22153 and previous config saved to /var/cache/conftool/dbconfig/20220308-223838-marostegui.json
* 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:24 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.25  refs [[phab:T300201|T300201]]
* 22:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22152 and previous config saved to /var/cache/conftool/dbconfig/20220308-222303-marostegui.json
* 22:20 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22151 and previous config saved to /var/cache/conftool/dbconfig/20220308-222055-marostegui.json
* 22:20 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 22:20 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 22:20 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22150 and previous config saved to /var/cache/conftool/dbconfig/20220308-222016-marostegui.json
* 22:04 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22149 and previous config saved to /var/cache/conftool/dbconfig/20220308-220441-marostegui.json
* 21:49 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22148 and previous config saved to /var/cache/conftool
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/includes/VisualEditorParsoidClient.php: {{Gerrit|a804fe18f1e14795ba7836d3ebf6c361bb1538a7}}: Update call to PageConfigFactory::create to use new signature ([[phab:T314523|T314523]]) (duration: 03m 25s)
* 20:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32256 and previous config saved to /var/cache/conftool/dbconfig/20220803-204204-marostegui.json
* 20:39 urbanecm@deploy1002: sync-file aborted: {{Gerrit|a804fe18f1e14795ba7836d3ebf6c361bb1538a7}}: Update call to PageConfigFactory::create to use new signature ([[phab:T314523|T314523]]ú (duration: 00m 00s)
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/: {{Gerrit|b840eef86837aed3e566885110e93b2ca9ab5f42}}: Fix ReplyLinksController#teardown (duration: 03m 27s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:31 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/: {{Gerrit|70a18f5846111a0dfe8ba473daf384cbb8e88804}}:  Add explicit partitioning key to ElasticaWrite ([[phab:T314426|T314426]]) (duration: 03m 13s)
* 19:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:28 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/CirrusSearch/: {{Gerrit|9961e9bc8f5873f8ddc8a11108de0a7bfcb14ae6}}: Add explicit partitioning key to ElasticaWrite ([[phab:T314426|T314426]]) (duration: 03m 23s)
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:28 cwhite@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host logstash2032.codfw.wmnet
* 19:35 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.24  refs [[phab:T300200|T300200]]
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:32 brennen: 1.38.0-wmf.24 train ([[phab:T300200|T300200]]): no current blockers; proceeding to all wikis
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: Backport: [[gerrit:767812{{!}}Unset data-toc in SkinVector (T302461)]] (duration: 00m 49s)
* 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32255 and previous config saved to /var/cache/conftool/dbconfig/20220803-202658-marostegui.json
 
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32254 and previous config saved to /var/cache/conftool/dbconfig/20220803-202146-marostegui.json
* 20:21 marostegui@cumin1001:


== 2022-03-02 ==
== 2022-08-02 ==
* 23:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:25 ryankemper: [[phab:T276198|T276198]] Re-enabled puppet across fleet: `ryankemper@cumin1001:~$ sudo -E cumin 'R:Elasticsearch::instance' 'enable-puppet "deploy fix from [[phab:T276198|T276198]]"'`
* 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:21 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 22:15 mutante: gerrit - syncing data (/srv/gerrit /var/lib/gerrit2/review_site  /home) again after gerrit2002 was reimaged with buster [[phab:T313250|T313250]] [[phab:T313972|T313972]]
* 23:21 ryankemper: [[phab:T276198|T276198]] https://gerrit.wikimedia.org/r/c/operations/puppet/+/767600 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/767603/ fixed all the problems. Re-enabling puppet on elastic*, cloudelastic*, and relforge* shortly
* 22:04 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 06s)
* 23:15 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 22:04 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 23:08 robh
* 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23  refs [[phab:T308076|T308076]]
* 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START


== 2022-03-01 ==
== 2022-08-01 ==
* 22:51 inflatador: [[phab:T276198|T276198]] reenabled puppet on elastic1052.eqiad.wmnet
* 23:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Id1ce285631f5}}, {{Gerrit|I194d419fbfe}} (duration: 03m 09s)
* 22:37 inflatador: [[phab:T276198|T276198]] rebooting elastic1052.eqiad.wmnet to test failure condition
* 23:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:33 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
* 23:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:33 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
* 23:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:32 inflatador: [[phab:T276198|T276198]] disabling puppet on elastic1052.eqiad.wmnet to test failure condition (rebooting shortly)
* 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:53 dancy@deploy1002: Finished scap: Resync to try to clear alerts (duration: 12m 08s)
* 21:08 moritzm: drain ganeti2028 [[phab:T309957|T309957]]
* 21:41 dancy@deploy1002: Started scap: Resync to try to clear alerts
* 21:03 mutante: gerrit2002 - mkdir /var/lib/gerrit2/review_site {{!}} gerrit1001 - rsyncing /var/lib/gerrit2/review_site/ to gerrit2002 [[phab:T313250|T313250]] [[phab:T313972|T313972]]
* 21:36 dancy@deploy1002: Started scap: Resync to try to clear alerts
* 21:01 urbanecm: UTC late backport window done
* 20:36 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.24  refs [[phab:T300200|T300200]]
* 21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|461e0709a8987b110f669b74afc38c706b616e5d}}: itwiki: Change robot policy on NS2 and NS3 ([[phab:T314165|T314165]]) (duration: 03m 18s)
* 20:33 brennen: 1.38.0-wmf.24 train ([[phab:T300200|T300200]]): no current blockers; proceeding to group0; note this may briefly trigger some version alerts
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/includes: Backport: [[gerrit:767089{{!}}Revert "preferences: Use a faster and simpler form descriptor when validating" (T302643)]] (duration: 00m 55s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:05 mutante: alert1001 - re-enabled puppet
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.24 refs [[phab:T300200|T300200]] (duration: 53m 17s)
* 20:57 mutante: phab1001 - rsyncing repo data /srv/repos/ to phab2002 (in addition to phab1004 previously) [[phab:T313360|T313360]]
* 19:45 mutante: alert1001 - disable puppet, systemctl stop ircecho - to stop bot spam, caused somehow by new scap version breaking "mw versions mismwatch" alerting - affects labtestwiki,testwiki,testwikidatawiki
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:38 mutante: mw1449 - scap pull
* 20:55 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mnwwiktionary --fix # [[phab:T314023|T314023]]
* 19:36 mutante: mw1414 - scap pull
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ba8c17759b7e737a6757792ad4136ff3af00030c}}: mnwwiktionary: Create Appendix namespace ([[phab:T314023|T314023]]) (duration: 03m 09s)
* 19:11 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.24  refs [[phab:T300200|T300200]]
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2008.codfw.wmnet
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:01 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:57 brennen: 1.38.0-wmf.24 train ([[phab:T300200|T300200]]): there's currently a single blocker at [[phab:T302643|T302643]]; staging to testwikis and holding there until backport's available
* 20:48 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateArticleCount.php --wiki=viwikibooks --update # [[phab:T314239|T314239]]
* 18:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2008.codfw.wmnet
* 20:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c19c3e36ab}}: DiscussionTools: Make new reply buttons available at mediawiki.org ([[phab:T314076|T314076]]); {{Gerrit|24db016c4}}: viwikibooks: Change wgArticleCountMethod to any ([[phab:T314239|T314239]]) (duration: 03m 10s)
* 18:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
* 20:35 daniel@deploy1002: Synchronized php-1.39.0-wmf.22/includes/Rest/Handler: Fix: [[gerrit:819129{{!}}Parsoid REST handler: allow pagebundle input without original HTML.]] (duration: 03m 15s)
* 18:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
* 20:25 urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-ne.svg ([[phab:T311700|T311700]])
* 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21626 and previous config saved to /var/cache/conftool/dbconfig/20220301-180216-ladsgroup.json
* 20:21 daniel@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ne.svg: Config: [[gerrit:818614{{!}}newiki: Update wordmark (T311700)]] (duration: 03m 17s)
* 17:52 cwhite: completed grafana upgrade in eqiad [[phab:T282863|T282863]]
* 20:17 daniel@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:818614{{!}}newiki: Update wordmark (T311700)]] (duration: 03m 32s)
* 17:50 herron: re-enabling puppet and ircecho on alert1001
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:47 cwhite: upgrade grafana in eqiad [[phab:T282863|T282863]]
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21625 and previous config saved to /var/cache/conftool/dbconfig/20220301-174711-ladsgroup.json
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21624 and previous config saved to /var/cache/conftool/dbconfig/20220301-173206-ladsgroup.json
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:24 dancy@deploy1002: Finished scap: testing container image build (duration: 28m 39s)
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:17 herron: stopped ircecho on alert1001 due to systemd unit alert shower
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21622 and previous config saved to /var/cache/conftool/dbconfig/20220301-171701-ladsgroup.json
* 20:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2054.codfw.wmnet with OS bullseye
* 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21621 and previous config saved to /var/cache/conftool/dbconfig/20220301-171441-ladsgroup.json
* 19:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
* 17:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
* 17:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 19:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2054.codfw.wmnet with OS bullseye
* 16:55 dancy@deploy1002: Started scap: testing container image build
* 18:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2031.codfw.wmnet with OS bullseye
* 16:24 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 03s)
* 18:44 mutante: gitlab - moved data_persistence group to new parent, under /repos/
* 16:23 ebysans@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
* 18:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
* 16:12 moritzm: restarting apache on logstash nodes to pick up expat update
* 18:32 mutante: gitlab - created group 'data_persistence' - added Ladsgroup and upgraded from member to maintainer
* 16:11 elukey@deploy1002: Finished deploy [ores/deploy@29de1cc]: ORES Winter deployment - [[phab:T300195|T300195]] (duration: 36m 13s)
* 18:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
* 16:05 moritzm: restarting nginx on wcqs* nodes to pick up expat update
* 18:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2031.codfw.wmnet with OS bullseye
* 15:35 elukey@deploy1002: Started deploy [ores/deploy@29de1cc]: ORES Winter deployment - [[phab:T300195|T300195]]
* 17:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2025.codfw.wmnet with OS bullseye
* 15:21 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 07s)
* 17:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
* 15:21 ntsako@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
* 17:31 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
* 15:06 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2003.codfw.wmnet
* 17:18 ryankemper: [[phab:T289135|T289135]] [[phab:T314078|T314078]] Manually reimaging remaining codfw stretch hosts (`elastic[2025,2031,2054,2059-2060]`) to bullseye, one host at a time, waiting for green cluster status to return between each run. `ryankemper@cumin1001` tmux session `codfw_reimage`
* 14:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:16 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2025.codfw.wmnet with OS bullseye
* 14:52 elukey: elukey@deploy1002:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the node)
* 17:08 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 14:51 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 17:08 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 14:51 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
* 17:06 mutante: alert1001 - systemctl restart nsca - pinged by fundraising tech because fundraising hosts have the "passive check is awol" issue again ([[phab:T196336|T196336]])
* 14:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2002.codfw.wmnet
* 16:25 moritzm: installing tcpdump updates from bullseye point release
* 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:23 cwhite@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
* 14:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 16:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1018.eqiad.wmnet with OS bullseye
* 14:38 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:10 cwhite@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
* 14:36 vgutierrez: pool cp1087 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
* 14:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS buster
* 15:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
* 14:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1018.eqiad.wmnet with OS bullseye
* 14:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
* 15:39 mvernon@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase [[phab:T309896|T309896]] - mvernon@cumin1001
* 14:32 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
* 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:32 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:28 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2001.codfw.wmnet
* 15:29 mvernon@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase [[phab:T309896|T309896]] - mvernon@cumin1001
* 14:19 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:818127{{!}}Beta: add configuration for redirect badges (T313896)]] (2/2, should be a no-op) (duration: 03m 30s)
* 14:19 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:15 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:818127{{!}}Beta: add configuration for redirect badges (T313896)]] (1/2, should be a no-op) (duration: 03m 15s)
* 14:14 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
* 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 moritzm: restarting nginx on wdqs* nodes to pick up expat update
* 15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
* 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:03 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:54 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 13:57 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:53 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 13:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:42 moritzm: installing openjdk-11 security updates
* 13:53 mmandere: restart purged on cp60[15-16]
* 14:39 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 13:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
* 14:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 13:48 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:38 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 13:48 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
* 14:34 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 13:48 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
* 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:48 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:44 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:43 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:43 klausman@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:40 kormat: Deploying wmfmariadbpy 0.9 [[phab:T302796|T302796]]
* 14:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 13:40 kormat: uploaded wmfmariadbpy 0.9 to apt.wm.o [[phab:T302796|T302796]]
* 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:28 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 13:39 klausman@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 14:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 14:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/skins/Vector/: {{Gerrit|b5007c5f1c389deb344c5bb99e950b4190436cab}}: Revert "styles: Unify on standard external link icon"" (duration: 03m 16s)
* 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
* 14:12 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 13:32 moritzm: restarting nginx on registry* nodes to pick up expat update
* 14:12 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 13:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS buster
* 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 XioNoX: restart cr1-drmrs for software upgrade
* 14:05 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 13:03 moritzm: restarting FPM/Apache on parsoid hosts to pick up expat update
* 14:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2044.codfw.wmnet with OS bullseye
* 12:50 vgutierrez: pool cp3062 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|bcb7b0d4d07b454a169804d7b1011ec3f2530c00}}: Adjust width-height ratio of logo to fix display issue ([[phab:T310961|T310961]]; 2/2) (duration: 03m 17s)
* 12:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS buster
* 14:04 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/srwikisource<nowiki>{</nowiki>.png;-1.5x.png;-2x.png<nowiki>}</nowiki> ([[phab:T310961|T310961]])
* 12:39 moritzm: installing expat security updates
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:34 mmandere: restart purged on cp60[12-14]
* 14:01 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|bcb7b0d4d07b454a169804d7b1011ec3f2530c00}}: srwikisource: Adjust width-height ratio of logo to fix display issue ([[phab:T310961|T310961]]; 1/2) (duration: 03m 41s)
* 12:32 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker (duration: 01m 06s)
* 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:31 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker
* 14:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s)
* 13:58 urbanecm: UTC afternoon backport window is going to overflow by a couple of minutes
* 12:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s)
* 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
* 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration
* 13:44 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
* 12:11 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s)
* 13:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2044.codfw.wmnet with OS bullseye
* 12:09 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration
* 13:22 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 11:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:50 moritzm: installing openjdk-8 security updates for stretch
* 11:36 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 11:43 moritzm: uploaded openjdk-8 8u342-b07-1~deb9u1 for stretch-wikimedia
* 11:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32124 and previous config saved to /var/cache/conftool/dbconfig/20220801-102714-ladsgroup.json
* 11:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32123 and previous config saved to /var/cache/conftool/dbconfig/20220801-101208-ladsgroup.json
* 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 10:09 vgutierrez: test ATS 9.1.2 on cp6016 - [[phab:T309651|T309651]]
* 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 10:05 vgutierrez: test ATS 9.1.2 on cp6008 - [[phab:T309651|T309651]]
* 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 10:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@4da9195]: (no justification provided) (duration: 00m 19s)
* 11:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 10:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@4da9195]: (no justification provided)
* 11:27 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32122 and previous config saved to /var/cache/conftool/dbconfig/20220801-095702-ladsgroup.json
* 11:27 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:56 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@85585b0]: (no justification provided) (duration: 00m 05s)
* 11:21 _joe_: restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available.[[phab:T244843|T244843]]
* 09:56 ebysans@deploy1002: Started deploy [airflow-dags/analytics@85585b0]: (no justification provided)
* 11:18 _joe_: also removed the ipvsadm entry for apaches:80 [[phab:T244843|T244843]]
* 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32121 and previous config saved to /var/cache/conftool/dbconfig/20220801-094156-ladsgroup.json
* 11:17 jayme: rolled back linkrecommendation staging helm release to revision 12 - [[phab:T302744|T302744]]
* 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32120 and previous config saved to /var/cache/conftool/dbconfig/20220801-093845-ladsgroup.json
* 11:17 _joe_: restarting pybal on lvs1020 [[phab:T244843|T244843]]
* 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
* 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:11 _joe_: restarted pybal on lvs2009, [[phab:T244843|T244843]]
* 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
* 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:07 _joe_: restarted pybal on lvs2010, [[phab:T244843|T244843]]
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Maintenance
* 11:02 mmandere: restart purged on cp60[09,10,11]
* 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Maintenance
* 11:00 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:47 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster
* 09:21 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2004.codfw.wmnet
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts
* 09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
* 10:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts
* 09:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2003.codfw.wmnet
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts
* 09:01 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2003.codfw.wmnet
* 10:39 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts
* 09:00 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2002.codfw.wmnet
* 10:31 mmandere: restart purged on cp600[6-8]
* 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:05 vgutierrez: pool cp2039 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:48 elukey: elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host)
* 08:53 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/includes/api: Backport: [[gerrit:818562{{!}}api: Support for links migration in ApiQueryBacklinks (T312865 T314112)]] (duration: 03m 01s)
* 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster
* 08:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:33 _joe_: restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted
* 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:31 _joe_: restart pybal on lvs1020
* 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts
* 08:50 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2002.codfw.wmnet
* 09:25 elukey: restart varnishkafka-webrequest on cp6009 as attempt to clear a weird status of librdkafka (delivery errors to kafka)
* 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
* 09:25 _joe_: manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore
* 08:48 godog: thanos-be2004: copy quarantined and tmp off sdb3 and into sdb4 for analysis and to free space - [[phab:T314275|T314275]]
* 09:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts
* 08:47 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:818998{{!}}Stop writing to the old templatelinks columns in itwikisource (T312865)]] (duration: 03m 12s)
* 09:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts
* 08:43 vgutierrez: rolling upgrade of HAProxy to version 2.4.18
* 09:22 _joe_: restarted pybal on lvs2009, the mw api is now effectively https-only in codfw [[phab:T287820|T287820]]
* 08:43 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:20 _joe_: restarted pybal on lvs2010
* 08:41 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
* 08:39 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
* 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
* 08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
* 09:06 elukey: restart purged on cp6005
* 08:28 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 08:57 elukey: restart purged on cp6004
* 08:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
* 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster
* 08:14 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1002.eqiad.wmnet
* 08:27 urbanecm: UTC morning B&C window done
* 06:19 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 08:25 elukey: restart purged on cp6003
* 06:14 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appservers-ro
* 08:16 moritzm: drain instances off ganeti2008 for eventual decom
* 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appserver-ro
* 08:08 urbanecm@deploy1002: Synchronized wmf-config/ProductionServices.php: {{Gerrit|d149208dfd7e5fbf51f44dd0bf7dae3b2e2f5159}}: Use service-proxy to connect to linkrecommendation ([[phab:T302719|T302719]]) (duration: 00m 49s)
* 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=(appserver{{!}}api)-ro
* 07:59 elukey: restart purged on cp6002
* 05:43 moritzm: installing Linux 5.10.127-2 on Gitlab runners
* 06:58 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): [[phab:T302464|T302464]] test (duration: 00m 17s)
* 01:00 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ic0dbcba9f60f20a}} (duration: 03m 31s)
* 06:57 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): [[phab:T302464|T302464]] test
* 00:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:56 elukey: restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to)
* 00:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:46 _joe_: uploaded scap 4.4.1 to <nowiki>{</nowiki>stretch,buster,bullseye<nowiki>}</nowiki> [[phab:T302464|T302464]]
* 00:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:46 _joe_: uploaded scap 4.4.1 to <nowiki>{</nowiki>stretch,buster,bullseye<nowiki>}</nowiki>
* 00:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json
* 00:45 krinkle@deploy1002: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|I9d363abd7cfef}} (duration: 03m 17s)
* 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21617 and previous config saved to /var/cache/conftool/dbconfig/20220301-024433-ladsgroup.json
* 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21616 and previous config saved to /var/cache/conftool/dbconfig/20220301-022928-ladsgroup.json
* 00:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21615 and previous config saved to /var/cache/conftool/dbconfig/20220301-021424-ladsgroup.json
* 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json
* 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
==Archives ==
* 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 00:17 mutante: 15.wikipedia.org on k8s (staging) deploy1002:~] $ curl -s --resolve "15.wikipedia.org:4111:staging.svc.eqiad.wmnet" 'https://15.wikipedia.org' {{!}} grep grandpa  =>  "&ldquo;Wikipedia is like an all-knowing grandpa.&rdquo;" {{!}} [[phab:T300171|T300171]]
 
==Archives==
See [[Server Admin Log/Archives]].
See [[Server Admin Log/Archives]].
<noinclude>
<noinclude>

Latest revision as of 13:37, 13 August 2022

2022-08-13

  • 13:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32379 and previous config saved to /var/cache/conftool/dbconfig/20220813-133713-ladsgroup.json
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32378 and previous config saved to /var/cache/conftool/dbconfig/20220813-132207-ladsgroup.json
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32377 and previous config saved to /var/cache/conftool/dbconfig/20220813-130701-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32376 and previous config saved to /var/cache/conftool/dbconfig/20220813-125156-ladsgroup.json

2022-08-12

  • 23:41 mutante: wikistats-bullseye:~$ /usr/lib/wikistats/update.php wp prefix blk ; /usr/lib/wikistats/update.php wp prefix kcg T315121
  • 23:38 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_initsitestats.timer T315121
  • 22:14 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 21:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1071.eqiad.wmnet with OS bullseye
  • 21:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb2002-dev.codfw.wmnet with OS bullseye
  • 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
  • 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
  • 21:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1071.eqiad.wmnet with OS bullseye
  • 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
  • 21:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
  • 21:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1053.eqiad.wmnet with OS bullseye
  • 20:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb2002-dev.codfw.wmnet with OS bullseye
  • 20:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
  • 20:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
  • 20:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1053.eqiad.wmnet with OS bullseye
  • 20:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1048.eqiad.wmnet with OS bullseye
  • 19:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
  • 19:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
  • 19:42 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1048.eqiad.wmnet with OS bullseye
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32375 and previous config saved to /var/cache/conftool/dbconfig/20220812-193822-ladsgroup.json
  • 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312863)', diff saved to https://phabricator.wikimedia.org/P32374 and previous config saved to /var/cache/conftool/dbconfig/20220812-193801-ladsgroup.json
  • 19:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1054.eqiad.wmnet with OS bullseye
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32373 and previous config saved to /var/cache/conftool/dbconfig/20220812-192255-ladsgroup.json
  • 19:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
  • 19:09 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32372 and previous config saved to /var/cache/conftool/dbconfig/20220812-190749-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
  • 18:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
  • 18:54 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1054.eqiad.wmnet with OS bullseye
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312863)', diff saved to https://phabricator.wikimedia.org/P32371 and previous config saved to /var/cache/conftool/dbconfig/20220812-185243-ladsgroup.json
  • 18:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1066.eqiad.wmnet with OS bullseye
  • 18:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
  • 18:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
  • 18:08 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1066.eqiad.wmnet with OS bullseye
  • 18:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1064.eqiad.wmnet with OS bullseye
  • 17:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
  • 17:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
  • 17:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1064.eqiad.wmnet with OS bullseye
  • 17:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts netmon2002.wikimedia.org
  • 17:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon2002.wikimedia.org
  • 17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bullseye
  • 17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
  • 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
  • 16:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bullseye
  • 16:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1067.eqiad.wmnet with OS bullseye
  • 16:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2003-dev.wikimedia.org
  • 16:21 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:16 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2003-dev.wikimedia.org
  • 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['netmon2002.wikimedia.org']
  • 16:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
  • 15:58 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
  • 15:43 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1067.eqiad.wmnet with OS bullseye
  • 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
  • 15:31 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['netmon2002.wikimedia.org']
  • 15:31 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
  • 15:07 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts netmon1002.wikimedia.org
  • 15:07 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon1002.wikimedia.org
  • 15:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1061.eqiad.wmnet with OS bullseye
  • 14:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
  • 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=varnish-fe
  • 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
  • 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-tls
  • 14:43 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
  • 14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
  • 14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1061.eqiad.wmnet with OS bullseye
  • 14:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
  • 14:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1063.eqiad.wmnet with OS bullseye
  • 14:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
  • 14:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
  • 13:47 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1063.eqiad.wmnet with OS bullseye
  • 13:41 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 06:01 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic10[8-9][0-9].*
  • 05:54 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic110.*
  • 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T312863)', diff saved to https://phabricator.wikimedia.org/P32369 and previous config saved to /var/cache/conftool/dbconfig/20220812-010312-ladsgroup.json
  • 01:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312863)', diff saved to https://phabricator.wikimedia.org/P32368 and previous config saved to /var/cache/conftool/dbconfig/20220812-010233-ladsgroup.json
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32367 and previous config saved to /var/cache/conftool/dbconfig/20220812-004727-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32366 and previous config saved to /var/cache/conftool/dbconfig/20220812-003221-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312863)', diff saved to https://phabricator.wikimedia.org/P32365 and previous config saved to /var/cache/conftool/dbconfig/20220812-001715-ladsgroup.json

2022-08-11

  • 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:04 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: revert Define default value for "wmgSiteLogoVariants" (T305692 T308620) (duration: 03m 15s)
  • 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:47 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Define default value for "wmgSiteLogoVariants" (T305692 T308620) (duration: 03m 07s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:29 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/modules/ve-mw/preinit/ve.init.mw.DesktopArticleTarget.init.js: Backport: Do not show incompatible skin warning when page is not editable (T314952) (duration: 03m 16s)
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:23 mutante: merging change on prod phabricator host to allow scap deployment, part 1
  • 19:42 damilare: payments-wiki upgraded from cf5e1848 to 0894d75a
  • 19:41 mutante: disabling puppet on C:profile::phabricator::main
  • 19:20 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: upgrade to 3.11.13 T309896 - mvernon@cumin2002
  • 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:58 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Fix labtestwiki database name servers (T310795) (duration: 03m 39s)
  • 17:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:52 sukhe: testing ATS 9.1.3-1wm1 on cp3064: T309651
  • 17:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host netmon2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:46 sukhe: testing ATS 9.1.3-1wm1 on cp3064: T3096515
  • 17:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:38 sukhe: testing ATS 9.1.3-1wm1 on cp1090: T309651
  • 17:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host netmon2002
  • 17:34 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host netmon2002
  • 17:33 sukhe: testing ATS 9.1.3-1wm1 on cp3065: T309651
  • 17:28 sukhe: testing ATS 9.1.3-1wm1 on cp1089: T309651
  • 17:19 bking@cumin1001: conftool action : set/weight=10:pooled=no; selector: service=elasticsearch-omega-ssl,name=elastic1100.eqiad.wmnet
  • 17:18 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=elasticsearch-omega-ssl,name=elastic1100.eqiad.wmnet
  • 17:15 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=search-omega-https,name=elastic1100.eqiad.wmnet
  • 16:35 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: upgrade to 3.11.13 T309896 - mvernon@cumin2002
  • 16:30 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: upgrade to 3.11.13 T309896 - mvernon@cumin2002
  • 16:29 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: T309810
  • 16:29 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: T309810
  • 16:26 inflatador: bking@elastic1054 attempting to ban elastic1100-1102 from cluster due to firewall issues
  • 16:13 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=search-omega-https,name=elastic1100.eqiad.wmnet
  • 16:12 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic1100
  • 15:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P32364 and previous config saved to /var/cache/conftool/dbconfig/20220811-145823-ladsgroup.json
  • 14:55 inflatador: bking@cumin1001 running puppet agent across eqiad elastic hosts
  • 14:48 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P32362 and previous config saved to /var/cache/conftool/dbconfig/20220811-144318-ladsgroup.json
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P32361 and previous config saved to /var/cache/conftool/dbconfig/20220811-142813-ladsgroup.json
  • 14:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1003.wikimedia.org
  • 14:28 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:24 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:19 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1003.wikimedia.org
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1004.wikimedia.org
  • 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: