
Server Admin Log: Difference between revisions

imported>Stashbot
(andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bullseye)
imported>Stashbot
(ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet)
 
(288 intermediate revisions by 3 users not shown)
== 2022-03-24 ==
== 2023-01-29 ==
* 00:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bullseye
* 14:46 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
* 00:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bullseye
* 14:40 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
* 00:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bullseye
* 14:39 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1008.eqiad.wmnet
* 00:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
* 14:33 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1008.eqiad.wmnet
* 00:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
* 00:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
* 00:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
* 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
* 00:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage


== 2022-03-23 ==
== 2023-01-28 ==
* 23:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:36 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
* 23:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:35 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
* 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS bullseye
* 23:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bullseye
* 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bullseye
* 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bullseye
* 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:38 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]]
* 23:34 brennen: trainsperiment ([[phab:T300203|T300203]]): reverting to 1.39.0-wmf.3 on all wikis for [[phab:T304564|T304564]]; will move forward again after a fix.
* 23:25 cwhite: remove openjdk-8-jre from codfw logstash nodes [[phab:T301770|T301770]]
* 23:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bullseye
* 22:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
* 22:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
* 22:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bullseye
* 22:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bullseye
* 22:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
* 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
* 22:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
* 22:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
* 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
* 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bullseye
* 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bullseye
* 21:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bullseye
* 21:42 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 21:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
* 21:31 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
* 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:773331{{!}}Enable split A/B testing on beta cluster (T301584)]] (duration: 00m 50s)
* 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bullseye
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:15 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:772408{{!}}Allow autoconfirmed users to view basic IP information (T303858)]] and [[gerrit:767216{{!}}Enable IPInfo on testwiki (T260598)]] (duration: 00m 50s)
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bullseye
* 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bullseye
* 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bullseye
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:40 catrope@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:771448{{!}}DynamicSidebar: remove unused extension (T304006)]] (duration: 00m 49s)
* 20:34 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771447{{!}}DynamicSidebar: remove from InitialiseSettings]] (duration: 00m 51s)
* 20:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
* 20:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
* 20:32 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bullseye
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:18 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bullseye
* 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bullseye
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771444{{!}}DynamicSidebar: remove from CommonSettings (T304006)]] (duration: 00m 50s)
* 20:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771443{{!}}wikitech: Remove DynamicSidebar (T304006)]] (duration: 00m 52s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:01 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:37 brennen: trainsperiment ([[phab:T300203|T300203]]): 1.39.0-wmf.4 on all wikis; logs seem clean - end of train deployment activities for the week, unless bugs emerge
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:23 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 19:23 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bullseye
* 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bullseye
* 19:10 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:09 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]] (duration: 00m 52s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:59 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 18:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
* 18:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
* 18:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
* 18:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
* 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:47 brennen: trainsperiment ([[phab:T300203|T300203]]): 1.39.0-wmf.4 on testwikis; proceeding to groups 0-2 with 15 minute intervals for watching logs
* 18:46 brennen@deploy1002: Pruned MediaWiki: 1.38.0-wmf.26 (duration: 02m 05s)
* 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:42 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]] (duration: 49m 41s)
* 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bullseye
* 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bullseye
* 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:52 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bullseye
* 17:48 brennen: trainsperiment ([[phab:T300203|T300203]]): starting prep for 1.39.0-wmf.4
* 17:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bullseye
* 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1028.eqiad.wmnet with OS bullseye
* 17:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
* 17:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
* 17:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
* 17:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
* 17:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
* 17:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
* 17:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bullseye
* 16:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bullseye
* 16:58 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 16:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS bullseye
* 16:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 16:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1011.eqiad.wmnet with OS bullseye
* 16:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bullseye
* 16:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
* 16:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
* 16:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1011.eqiad.wmnet with OS bullseye
* 16:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
* 16:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
* 15:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bullseye
* 15:39 urbanecm: foreachwikiindblist wikipedia extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments # [[phab:T304052|T304052]]
* 15:38 urbanecm: Created shnwikivoyage and guwwiki
* 15:31 mmandere: pool cp1080 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1080.eqiad.wmnet with OS buster
* 15:27 urbanecm@deploy1002: Synchronized langlist: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 04s)
* 15:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 07s)
* 15:25 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 05s)
* 15:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bullseye
* 15:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 06s)
* 15:23 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating guwwiki ([[phab:T303727|T303727]])
* 15:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:21 urbanecm@deploy1002: Synchronized dblists: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 10s)
* 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:19 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 05s)
* 15:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:12 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shnwikivoyage ([[phab:T302797|T302797]])
* 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:09 urbanecm@deploy1002: Synchronized dblists: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:08 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:05 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
* 15:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
* 15:01 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
* 15:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1030.eqiad.wmnet with OS bullseye
* 14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
* 14:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 14:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bullseye
* 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
* 14:44 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
* 14:44 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS buster
* 14:41 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.3/extensions/WikimediaMaintenance/addWiki.php: {{Gerrit|9a0aed0}}: addWiki: Create GrowthExperiment tables for all new Wikipedias ([[phab:T304052|T304052]]) (duration: 01m 06s)
* 14:38 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1085.eqiad.wmnet
* 14:37 mmandere: depool cp1080 for reimage - [[phab:T290005|T290005]]
* 14:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1030.eqiad.wmnet with OS bullseye
* 14:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:28 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 14:23 bblack: reboot cp1085 (downtimed)
* 14:20 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:19 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wcqs1002.eqiad.wmnet
* 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1029.eqiad.wmnet with OS bullseye
* 14:11 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 14:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1027.eqiad.wmnet with OS bullseye
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:06 mmandere: pool cp1082 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1082.eqiad.wmnet with OS buster
* 14:00 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 13:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
* 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1010.eqiad.wmnet with OS bullseye
* 13:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:48 Lucas_WMDE: UTC afternoon backport window done
* 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:773209{{!}}Enable Wikibase REST API on beta wikidata (T302959)]] (2/2, production no-op) (duration: 01m 05s)
* 13:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:773209{{!}}Enable Wikibase REST API on beta wikidata (T302959)]] (1/2, production no-op) (duration: 01m 07s)
* 13:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
* 13:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1029.eqiad.wmnet with OS bullseye
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23010 and previous config saved to /var/cache/conftool/dbconfig/20220323-134153-marostegui.json
* 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23009 and previous config saved to /var/cache/conftool/dbconfig/20220323-134140-marostegui.json
* 13:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:768090{{!}}Write "unexpectedUnconnectedPage" page prop on Test Wikidata clients]] (duration: 01m 10s)
* 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
* 13:38 moritzm: restarting superset for OpenSSL update
* 13:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
* 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1027.eqiad.wmnet with OS bullseye
* 13:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
* 13:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23008 and previous config saved to /var/cache/conftool/dbconfig/20220323-132635-marostegui.json
* 13:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1010.eqiad.wmnet with OS bullseye
* 13:16 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1082.eqiad.wmnet with OS buster
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23005 and previous config saved to /var/cache/conftool/dbconfig/20220323-131130-marostegui.json
* 13:07 mmandere: depool cp1082 for reimage - [[phab:T290005|T290005]]
* 12:58 moritzm: installing bind security updates
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23004 and previous config saved to /var/cache/conftool/dbconfig/20220323-125625-marostegui.json
* 12:29 moritzm: restarting Turnilo for OpenSSL update
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 after testing', diff saved to https://phabricator.wikimedia.org/P23003 and previous config saved to /var/cache/conftool/dbconfig/20220323-120749-marostegui.json
* 11:34 jbond: upload new puppetboard_3.1.0-1+deb11u1_all.deb
* 11:33 moritzm: installing apache security updates on stretch
* 11:00 mmandere: pool cp1081 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 10:58 moritzm: restarting apache on matomo1002/piwik.wikimedia.org
* 10:52 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1081.eqiad.wmnet with OS buster
* 10:30 moritzm: restarting ntpd
* 10:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
* 10:24 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 some more weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P23002 and previous config saved to /var/cache/conftool/dbconfig/20220323-101816-marostegui.json
* 10:07 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1081.eqiad.wmnet with OS buster
* 09:56 mmandere: depool cp1081 for reimage - [[phab:T290005|T290005]]
* 09:43 mmandere: pool cp1079 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 09:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS buster
* 09:24 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:17 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:15 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
* 09:11 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
* 09:06 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:54 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
* 08:54 moritzm: restarting spamassassin/clamav on otrs1001/ticket.wikimedia.org
* 08:51 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1079.eqiad.wmnet with OS buster
* 08:47 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
* 08:43 moritzm: installing openssl security updates
* 08:36 mmandere: depool cp1079 for reimage - [[phab:T290005|T290005]]
* 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1009.eqiad.wmnet with OS bullseye
* 08:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
* 08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
* 07:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1009.eqiad.wmnet with OS bullseye
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P23001 and previous config saved to /var/cache/conftool/dbconfig/20220323-074408-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P23000 and previous config saved to /var/cache/conftool/dbconfig/20220323-072904-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P22999 and previous config saved to /var/cache/conftool/dbconfig/20220323-071400-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P22998 and previous config saved to /var/cache/conftool/dbconfig/20220323-065856-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P22997 and previous config saved to /var/cache/conftool/dbconfig/20220323-064353-root.json
* 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1112.eqiad.wmnet with OS bullseye
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
* 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1112.eqiad.wmnet with OS bullseye
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for reimage', diff saved to https://phabricator.wikimedia.org/P22996 and previous config saved to /var/cache/conftool/dbconfig/20220323-060533-marostegui.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 with low weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P22995 and previous config saved to /var/cache/conftool/dbconfig/20220323-060351-marostegui.json
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:20 ejegg: updated payments-wiki from {{Gerrit|3048f0aa}} to {{Gerrit|28e24856}}
* 00:11 cjming: end running skin preference update script [[phab:T299104|T299104]]


== 2022-03-22 ==
== 2023-01-27 ==
* 23:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 23:55 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 23:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
* 23:52 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 23:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
* 23:31 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
* 23:23 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 23:31 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4050.ulsfo.wmnet with OS bullseye
* 23:11 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 23:22 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
* 22:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 22:27 pt1979@cumin1001


== 2022-03-21 ==
== 2023-01-26 ==
* 23:52 eileen: civicrm revision changed from {{Gerrit|52c45874}} to {{Gerrit|30c55f51}}
* 23:59 zabe@deploy1002: Finished scap: Backport for [[gerrit:883724{{!}}Add a project logo on gorwiktionary (T327987)]] (duration: 34m 42s)
* 22:29 ryankemper: [[phab:T301955|T301955]] Lifted downtime on relforge now that cluster upgrade is complete and cluster is back to green status
* 23:54 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
* 22:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 23:52 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4039.ulsfo.wmnet
* 22:04 reedy@deploy1002: Synchronized php-1.39.0-wmf.2/extensions/OATHAuth/: [[phab:T304350|T304350]] (duration: 00m 49s)
* 23:51 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4039.ulsfo.wmnet with OS bullseye
* 22:03 reedy@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/OATHAuth/: [[phab:T304350|T304350]] (duration: 00m 49s)
* 23:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
* 21:59 ryankemper: [[phab:T301955|T301955]] Downtimed relforge for 2 days; stuck in yellow status during upgrade b/c replica shards cannot be scheduled to a host of lower elasticsearch version than primary shards. Working on patch for our `rolling-operation` cookbook to disable replication during operation
* 23:26 zabe@deploy1002: zabe and superpes: Backport for [[gerrit:883724{{!}}Add a project logo on gorwiktionary (T327987)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 23:25 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
* 23:24 zabe@deploy1002: Started scap: Backport for [[gerrit:883724{{!}}Add a project logo on gorwiktionary (T327987)]]
* 23:13 sbassett@deploy1002: Synchronized private/PrivateSettings.php: [[phab:T326691|T326691]] - remove mitigation and monitor (duration: 06m 52s)
* 23:04 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
* 23:04 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS bullseye
* 23:03 zabe@deploy1002: Finished scap: Backport for [[gerrit:881390{{!}


== 2022-03-20 ==
== 2023-01-25 ==
* 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22857 and previous config saved to /var/cache/conftool/dbconfig/20220320-234358-marostegui.json
* 23:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6004.drmrs.wmnet
* 23:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 23:57 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS bullseye
* 23:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 23:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
* 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22856 and previous config saved to /var/cache/conftool/dbconfig/20220320-234350-marostegui.json
* 23:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
* 23:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22855 and previous config saved to /var/cache/conftool/dbconfig/20220320-232845-marostegui.json
* 23:29 zabe@deploy1002: Finished scap: (no justification provided) (duration: 07m 34s)
* 23:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22854 and previous config saved to /var/cache/conftool/dbconfig/20220320-231340-marostegui.json
* 23:21 zabe@deploy1002: Started scap: (no justification provided)
* 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22853 and previous config saved to /var/cache/conftool/dbconfig/20220320-225835-marostegui.json
* 23:20 zabe@deploy1002: Backport cancelled.
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22850 and previous config saved to /var/cache/conftool/dbconfig/20220320-081713-marostegui.json
* 23:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS bullseye
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 23:13 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6012.drmrs.wmnet
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 23:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS bullseye
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22849 and previous config saved to /var/cache/conftool/dbconfig/20220320-081705-marostegui.json
* 22:43 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22848 and previous config saved to /var/cache/conftool/dbconfig/20220320-080200-marostegui.json
* 22:40 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22847 and previous config saved to /var/cache/conftool/dbconfig/20220320-074655-marostegui.json
* 22:21 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS bullseye
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22846 and previous config saved to /var/cache/conftool/dbconfig/20220320-073150-marostegui.json
* 22:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6003.drmrs.wmnet
* 21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 21:34 samtar@deploy1002: Finished scap: Backport for [[gerrit:883617{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]], [[gerrit:883616{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]] (duration: 09m 27s)
* 21:26 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:883617{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]], [[gerrit:883616{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]] synced to the testservers: mwdebug2002.cod
* 21:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 21:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 21:24 samtar@deploy1002: Started scap: Backport for [[gerrit:883617{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]], [[gerrit:883616{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]]
* 21:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 20:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS bullseye
* 20:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts cp2028.codfw.wmnet
* 20:58 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 20:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 20:49 ejegg: updated employers.csv on paymentswiki
* 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 20:33 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
* 20:32 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo-eqiad cluster: Reboot kafka nodes
* 20:30 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
* 20:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS bullseye
* 20:00 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6011.drmrs.wmnet
* 19:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS bullseye
* 19:52 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 19:38 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 19:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 19:33 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 19:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 19:21 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 19:17 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20 refs [[phab:T325583|T325583]] (duration: 07m 04s)
* 19:12 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS bullseye
* 19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20 refs [[phab:T325583|T325583]]
* 19:06 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6002.drmrs.wmnet
* 19:01 brennen: 1.40.0-wmf.20 train ([[phab:T325583|T325583]]): no blockers, rolling to group1.
* 19:00 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
* 19:00 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 18:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS bullseye
* 18:37 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
* 18:35 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 18:34 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
* 18:33 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 18:33 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 18:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 18:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS bullseye
* 18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 18:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 18:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 18:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6010.drmrs.wmnet
* 17:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS bullseye
* 17:32 mutante: removing racktables.wikimedia.org from DNS - that's it for this ancient service [[phab:T327405|T327405]]
* 16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
* 16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
* 16:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2031.codfw.wmnet with OS bullseye
* 16:50 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo-eqiad cluster: Reboot kafka nodes
* 16:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
* 16:43 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
* 16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=ats-be
* 16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=cdn
* 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
* 16:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
* 16:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
* 16:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS bullseye
* 16:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
* 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
* 16:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
* 16:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
* 16:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
* 16:04 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
* 16:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
* 15:56 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
* 15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2031']
* 15:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:50 robh: db1139 ilom wins/netbios disabled and ilom reset [[phab:T327877|T327877]]
* 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
* 15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
* 15:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
* 15:45 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
* 15:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
* 15:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031.codfw.wmnet']
* 15:44 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031.codfw.wmnet']
* 15:43 robh: netbios wins disabled on db1140 ilom and ilom reset [[phab:T327877|T327877]]
* 15:43 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
* 15:38 papaul: ongoing maintenance on fasw-c-eqiad
* 15:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
* 15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
* 15:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
* 15:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
* 15:23 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
* 15:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
* 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
* 15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-be
* 15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=cdn
* 15:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye
* 15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
* 15:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 15:12 urbanecm@deploy1002: Finished scap: triggering i18n refresh for [[phab:T327824|T327824]] (duration: 07m 57s)
* 15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
* 15:04 urbanecm@deploy1002: Started scap: triggering i18n refresh for [[phab:T327824|T327824]]
* 15:04 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:882615{{!}}Enable the Wikibase REST API on Wikidata (T324999)]] (duration: 08m 43s)
* 15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-be
* 15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=cdn
* 15:01 urbanecm: Overrunning B&C window
* 14:57 urbanecm@deploy1002: urbanecm and migr: Backport for [[gerrit:882615{{!}}Enable the Wikibase REST API on Wikidata (T324999)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
* 14:55 urbanecm@deploy1002: Started scap: Backport for [[gerrit:882615{{!}}Enable the Wikibase REST API on Wikidata (T324999)]]
* 14:53 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
* 14:53 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:883224{{!}}REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547{{!}}User impact: amend incorrect parameter for the single day streak text (T327824)]] (duration: 32m 21s)
* 14:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
* 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
* 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install6002.wikimedia.org
* 14:40 urbanecm@deploy1002: jakob and sgimeno and urbanecm: Backport for [[gerrit:883224{{!}}REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547{{!}}User impact: amend incorrect parameter for the single day streak text (T327824)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install6002.wikimedia.org on all recursors
* 14:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install6002.wikimedia.org on all recursors
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
* 14:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
* 14:25 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install6002.wikimedia.org
* 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5002.wikimedia.org
* 14:21 urbanecm@deploy1002: Started scap: Backport for [[gerrit:883224{{!}}REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547{{!}}User impact: amend incorrect parameter for the single day streak text (T327824)]]
* 14:16 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:883222{{!}}Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)]] (duration: 12m 59s)
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5002.wikimedia.org on all recursors
* 14:09 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5002.wikimedia.org on all recursors
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
* 14:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
* 14:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
* 14:05 urbanecm@deploy1002: aleksandar and urbanecm: Backport for [[gerrit:883222{{!}}Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5002.wikimedia.org
* 14:03 urbanecm@deploy1002: Started scap: Backport for [[gerrit:883222{{!}}Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)]]
* 13:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
* 13:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
* 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
* 13:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
* 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install3002.wikimedia.org
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install3002.wikimedia.org on all recursors
* 13:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install3002.wikimedia.org on all recursors
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
* 13:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
* 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install3002.wikimedia.org
* 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install2004.wikimedia.org
* 13:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
* 13:04 jbond: puppet now using vendored version of augeas-core https://gerrit.wikimedia.org/r/c/operations/puppet/+/883233
* 13:04 jbond: enable puppet fleet wide to post deploy gerrit:883233
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install2004.wikimedia.org on all recursors
* 13:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install2004.wikimedia.org on all recursors
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
* 12:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
* 12:54 jbond: disable puppet fleet wide to deploy gerrit:883233
* 12:54 jnuche@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 21s)
* 12:54 jnuche@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
* 12:45 moritzm: restarting Exim on MXes to pick up new libtasn
* 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2003.codfw.wmnet
* 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2002.codfw.wmnet
* 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1003.eqiad.wmnet
* 12:42 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1002.eqiad.wmnet
* 12:41 moritzm: restarting slapd on r/w servers to pick up new libtasn
* 12:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 12:37 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install2004.wikimedia.org
* 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install1004.wikimedia.org
* 12:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install1004.wikimedia.org on all recursors
* 12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install1004.wikimedia.org on all recursors
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
* 12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
* 12:12 moritzm: installing libtasn security updates on buster
* 11:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install1004.wikimedia.org
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
* 11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
* 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
* 11:34 Lucas_WMDE: Updated the Wikidata property suggester with data from 20230102's JSON dump ([[phab:T325942|T325942]])
* 11:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
* 11:27 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:12 hnowlan: restarting lvs on lvs1019 for thumbor healthcheck change
* 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43344 and previous config saved to /var/cache/conftool/dbconfig/20230125-111059-root.json
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43343 and previous config saved to /var/cache/conftool/dbconfig/20230125-110924-root.json
* 11:08 hnowlan: restarting lvs on lvs2009 for thumbor healthcheck change
* 11:00 hnowlan: restarting lvs on lvs1020 for thumbor healthcheck change
* 11:00 hnowlan: restarting lvs on lvs1010 for thumbor healthcheck change
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43342 and previous config saved to /var/cache/conftool/dbconfig/20230125-105554-root.json
* 10:54 hnowlan: restarting lvs on lvs2010 for thumbor healthcheck change
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43341 and previous config saved to /var/cache/conftool/dbconfig/20230125-105443-root.json
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43340 and previous config saved to /var/cache/conftool/dbconfig/20230125-105419-root.json
* 10:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 10:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 10:43 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43338 and previous config saved to /var/cache/conftool/dbconfig/20230125-104049-root.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43337 and previous config saved to /var/cache/conftool/dbconfig/20230125-103938-root.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43336 and previous config saved to /var/cache/conftool/dbconfig/20230125-103914-root.json
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43335 and previous config saved to /var/cache/conftool/dbconfig/20230125-102544-root.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43334 and previous config saved to /var/cache/conftool/dbconfig/20230125-102433-root.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43333 and previous config saved to /var/cache/conftool/dbconfig/20230125-102409-root.json
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43332 and previous config saved to /var/cache/conftool/dbconfig/20230125-101039-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43331 and previous config saved to /var/cache/conftool/dbconfig/20230125-100928-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43330 and previous config saved to /var/cache/conftool/dbconfig/20230125-100904-root.json
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43329 and previous config saved to /var/cache/conftool/dbconfig/20230125-095534-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43328 and previous config saved to /var/cache/conftool/dbconfig/20230125-095423-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43327 and previous config saved to /var/cache/conftool/dbconfig/20230125-095400-root.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43326 and previous config saved to /var/cache/conftool/dbconfig/20230125-094029-root.json
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43325 and previous config saved to /var/cache/conftool/dbconfig/20230125-093918-root.json
* 09:30 Emperor: rolling depool & update of thanos front-ends [[phab:T327871|T327871]]
* 08:40 XioNoX: bump SGIX max prefix limit
* 08:13 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:883221{{!}}Add sandbox link to Serbo-Croatian Wikipedia (T327833)]] (duration: 10m 13s)
* 08:05 ladsgroup@deploy1002: ladsgroup and aleksandar: Backport for [[gerrit:883221{{!}}Add sandbox link to Serbo-Croatian Wikipedia (T327833)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:03 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:883221{{!}}Add sandbox link to Serbo-Croatian Wikipedia (T327833)]]
* 07:49 marostegui: Cloning db1196 from db1206 (lag will appear on s1 wiki replicas) [[phab:T327859|T327859]]
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 to clone db1196 [[phab:T327859|T327859]]', diff saved to https://phabricator.wikimedia.org/P43322 and previous config saved to /var/cache/conftool/dbconfig/20230125-074601-marostegui.json
* 07:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfff15d]: (no justification provided) (duration: 00m 05s)
* 07:34 phedenskog@deploy1002: Started deploy [performance/navtiming@bfff15d]: (no justification provided)
* 07:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
* 07:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 to clone db1198', diff saved to https://phabricator.wikimedia.org/P43320 and previous config saved to /var/cache/conftool/dbconfig/20230125-072033-marostegui.json
* 07:08 AndyRussG: updated payments (config only) revision {{Gerrit|15395d05}}, config {{Gerrit|418160e9}}
* 04:10 eileen: config revision changed from {{Gerrit|dc0a0d3a}} to {{Gerrit|089d0acb}}
* 04:01 eileen: civicrm upgraded from {{Gerrit|9197ca29}} to {{Gerrit|3e6b21b6}}
* 03:27 eileen: civicrm upgraded from {{Gerrit|f6093fb2}} to {{Gerrit|9197ca29}}
* 03:05 eileen: config revision changed from {{Gerrit|3f641fce}} to {{Gerrit|dc0a0d3a}}
* 01:17 legoktm: adjusting Gerrit group "Campaigns Team" so it is not recursively a member of itself
* 00:10 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
* 00:10 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye


== 2022-03-19 ==
* 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22845 and previous config saved to /var/cache/conftool/dbconfig/20220319-171757-marostegui.json
* 17:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 17:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22844 and previous config saved to /var/cache/conftool/dbconfig/20220319-171749-marostegui.json
* 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22843 and previous config saved to /var/cache/conftool/dbconfig/20220319-170244-marostegui.json
* 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22842 and previous config saved to /var/cache/conftool/dbconfig/20220319-164739-marostegui.json
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22841 and previous config saved to /var/cache/conftool/dbconfig/20220319-163234-marostegui.json
* 13:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 13:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 13:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 13:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 13:23 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=piwiki --move-talk --fix # [[phab:T304201|T304201]]
* 13:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 04:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 04:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 04:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 03:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:51 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:28 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:28 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:28 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 03:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 02:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 02:27 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 02:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22839 and previous config saved to /var/cache/conftool/dbconfig/20220319-015847-marostegui.json
* 01:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 01:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22838 and previous config saved to /var/cache/conftool/dbconfig/20220319-015839-marostegui.json
* 01:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 01:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 01:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22837 and previous config saved to /var/cache/conftool/dbconfig/20220319-014334-marostegui.json
* 01:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 01:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22836 and previous config saved to /var/cache/conftool/dbconfig/20220319-012829-marostegui.json
* 01:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22835 and previous config saved to /var/cache/conftool/dbconfig/20220319-011324-marostegui.json
* 00:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 00:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye

== 2023-01-24 ==
* 23:10 zabe@deploy1002: Finished scap: Backport for [[gerrit:883281{{!}}Start reading from rev_comment_id on testcommonswiki (T299954)]] (duration: 08m 02s)
* 23:04 zabe@deploy1002: zabe: Backport for [[gerrit:883281{{!}}Start reading from rev_comment_id on testcommonswiki (T299954)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 23:02 zabe@deploy1002: Started scap: Backport for [[gerrit:883281{{!}}Start reading from rev_comment_id on testcommonswiki (T299954)]]
* 22:47 TheresNoTime: closing UTC late backport window
* 22:47 samtar@deploy1002: Finished scap: Backport for [[gerrit:883212{{!}}Add temporary extra grid-area for content translation extension (T327715)]], [[gerrit:883217{{!}}Add temporary extra grid-area for content translation extension (T327715)]] (duration: 09m 04s)
* 22:39 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:883212{{!}}Add temporary extra grid-area for content translation extension (T327715)]], [[gerrit:883217{{!}}Add temporary extra grid-area for content translation extension (T327715)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:37 samtar@deploy1002: Started scap: Backport for [[gerrit:883212{{!}}Add temporary extra grid-area for content translation extension (T327715)]], [[gerrit:883217{{!}}Add temporary extra grid-area for content translation extension (T327715)]]
* 22:30 samtar@deploy1002: Finished scap: Backport for [[gerrit:883282{{!}}[BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724)]], [[gerrit:883285{{!}}newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)]] (duration: 07m 59s)
* 22:23 samtar@deploy1002: jforrester and samtar and stang: Backport for [[gerrit:883282{{!}}[BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724)]], [[gerrit:883285{{!}}newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 22:22 samtar@deploy1002: Started scap: Backport for [[gerrit:883282{{!}}[BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724)]], [[gerrit:883285{{!}}newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)]]
* 22:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:882681{{!}}newiki: Add new permissions to group reviewer (T327114)]] (duration: 09m 02s)
* 22:19 mutante: DNS - adding new project language "gur" (Gurenɛ) - Gurenɛ is a major language of northern Ghana and the predominant language of the Upper East Region of Ghana. It is also widely spoken in Burkina Faso.. [[phab:T327813|T327813]]
* 22:13 samtar@deploy1002: samtar and stang: Backport for [[gerrit:882681{{!}}newiki: Add new permissions to group reviewer (T327114)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:11 samtar@deploy1002: Started scap: Backport for [[gerrit:882681{{!}}newiki: Add new permissions to group reviewer (T327114)]]
* 22:08 samtar@deploy1002: Finished scap: Backport for [[gerrit:883213{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]], [[gerrit:883216{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]] (duration: 09m 36s)
* 22:06 TheresNoTime: extending UTC late backport window due to late start
* 22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=ats-be
* 22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=cdn
* 22:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS bullseye
* 22:00 samtar@deploy1002: samtar and jdrewniak: Backport for [[gerrit:883213{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]], [[gerrit:883216{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:59 samtar@deploy1002: Started scap: Backport for [[gerrit:883213{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]], [[gerrit:883216{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]]
* 21:56 samtar@deploy1002: Finished scap: Backport for [[gerrit:882727{{!}}Work around sticky-positioned layers disabling subpixel rendering (T327460)]] (duration: 13m 31s)
* 21:45 samtar@deploy1002: nray and samtar: Backport for [[gerrit:882727{{!}}Work around sticky-positioned layers disabling subpixel rendering (T327460)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 21:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1009.eqiad.wmnet with OS bullseye
* 21:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 21:43 samtar@deploy1002: Started scap: Backport for [[gerrit:882727{{!}}Work around sticky-positioned layers disabling subpixel rendering (T327460)]]
* 21:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 21:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
* 21:38 zabe: running migrateRevisionCommentTemp.php on testcommonswiki (s4) with --sleep 10 # [[phab:T275246|T275246]]
* 21:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
* 21:32 samtar@deploy1002: backport aborted: (duration: 06m 28s)
* 21:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
* 21:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
* 21:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS bullseye
* 21:05 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 21:03 TheresNoTime: holding UTC late backport window for outage, [[phab:T327815|T327815]]
* 21:01 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
* 20:50 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 20:50 urandom: rebooting sessionstore1001.eqiad.wmnet -- [[phab:T325132|T325132]]
* 20:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host sessionstore1001.eqiad.wmnet
* 20:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 20:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
* 20:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
* 20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-be
* 20:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
* 20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=cdn
* 20:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet
* 20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5025.eqsin.wmnet with OS bullseye
* 20:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet
* 20:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
* 20:20 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
* 20:20 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=ats-be
* 20:19 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=cdn
* 20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=cdn
* 20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=ats-be
* 20:16 bblack: pool cp5032
* 20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=ats-be
* 20:16 mutante: contint2001 - restarted zuul
* 20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=cdn
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=cdn
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=ats-be
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=cdn
* 20:12 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
* 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS bullseye
* 20:09 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
* 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=cdn
* 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
* 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=ats-be
* 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=cdn
* 20:05 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
* 20:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
* 19:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
* 19:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
* 19:54 sukhe: reprepro -C main include bullseye-wikimedia libvmod-netmapper_1.9-3_amd64.changes: [[phab:T326634|T326634]]
* 19:53 sukhe: reprepro -C main include bullseye-wikimedia libvmod-re2_1.5.3-4_amd64.changes: [[phab:T326634|T326634]]
* 19:53 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
* 19:51 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
* 19:47 sukhe: reprepro -C main include bullseye-wikimedia libvmod-querysort_0.4_amd64.changes: [[phab:T326634|T326634]]
* 19:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
* 19:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
* 19:39 urandom: rebooting restbase cassandra nodes, row d -- [[phab:T325132|T325132]]
* 19:33 bblack: cp5032: restart varnish-frontend
* 19:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2025.codfw.wmnet
* 19:28 sukhe: reprepro -C main include bullseye-wikimedia varnish-modules_0.15.0-3_amd64.changes: [[phab:T326634|T326634]]
* 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
* 19:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
* 19:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2025.codfw.wmnet
* 19:19 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
* 19:19 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5025.eqsin.wmnet with OS bullseye
* 19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.20  refs [[phab:T325583|T325583]]
* 19:06 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1011.eqiad.wmnet with OS bullseye
* 19:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1010.eqiad.wmnet with OS bullseye
* 19:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
* 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
* 18:55 jynus: deploy new dump grants for analytics dbs at db1108 [[phab:T327155|T327155]]
* 18:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
* 18:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS bullseye
* 18:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
* 18:14 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
* 18:12 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
* 18:05 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
* 17:44 bblack: cp5032: upgrading packages (varnish, trafficserver)
* 17:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2020.codfw.wmnet
* 17:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
* 17:36 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
* 17:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
* 17:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
* 17:19 thcipriani: restarting ci jenkins for updates
* 17:13 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
* 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
* 17:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
* 17:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
* 17:04 urandom: rebooting restbase cassandra nodes, row c -- [[phab:T325132|T325132]]
* 16:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 16:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 16:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2042.codfw.wmnet with OS bullseye
* 16:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 16:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 16:23 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
* 16:23 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
* 16:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
* 16:22 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
* 16:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
* 16:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 16:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 16:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
* 15:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:53 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2042.codfw.wmnet with OS bullseye
* 15:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:31 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
* 15:26 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 40s)
* 15:15 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
* 15:12 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring" (duration: 00m 33s)
* 15:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring"
* 14:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
* 14:55 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 14:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
* 14:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
* 14:39 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 14:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
* 14:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:36 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:35 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:34 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 14:33 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:29 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 14:29 effie: switch maps (kartotherian) from eqiad to codfw (attempt #2)
* 14:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:25 TheresNoTime: close UTC afternoon backport window
* 14:24 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:20 XioNoX: repool ulsfo (maintenance over)
* 14:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1010.eqiad.wmnet with OS bullseye
* 14:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:868127{{!}}Increase PC writes from parsoid API to 10% (T320534)]] (duration: 07m 41s)
* 14:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:11 samtar@deploy1002: daniel and samtar: Backport for [[gerrit:868127{{!}}Increase PC writes from parsoid API to 10% (T320534)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:09 samtar@deploy1002: Started scap: Backport for [[gerrit:868127{{!}}Increase PC writes from parsoid API to 10% (T320534)]]
* 13:50 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 13:44 XioNoX: reboot ulsfo switches for software upgrade
* 13:40 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1002.eqiad.wmnet
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1002.eqiad.wmnet
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2002.codfw.wmnet
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 13:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:10 topranks: enabling tunnel services on cr2-eqdfw fpc 0 pic 1
* 13:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:04 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2002.codfw.wmnet
* 12:56 zabe@deploy1002: Finished scap: Backport for [[gerrit:881468{{!}}Remove PoolCounter from extension-list (T327336)]] (duration: 44m 09s)
* 12:51 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 12:51 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
* 12:48 XioNoX: restart ulsfo switches for network maintenance
* 12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
* 12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
* 12:40 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
* 12:38 zabe@deploy1002: zabe: Backport for [[gerrit:881468{{!}}Remove PoolCounter from extension-list (T327336)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 12:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
* 12:12 zabe@deploy1002: Started scap: Backport for [[gerrit:881468{{!}}Remove PoolCounter from extension-list (T327336)]]
* 11:54 volans: uploaded python3-gjson_1.0.0 to apt.wikimedia.org bullseye-wikimedia,unstable-wikimedia
* 11:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43311 and previous config saved to /var/cache/conftool/dbconfig/20230124-114255-root.json
* 11:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:36 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
* 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3002.esams.wmnet
* 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43310 and previous config saved to /var/cache/conftool/dbconfig/20230124-112750-root.json
* 11:26 zabe@deploy1002: Finished scap: Backport for [[gerrit:881467{{!}}Stop loading PoolCounter extension (T327336)]] (duration: 09m 19s)
* 11:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bullseye
* 11:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping3002.esams.wmnet
* 11:19 zabe@deploy1002: zabe: Backport for [[gerrit:881467{{!}}Stop loading PoolCounter extension (T327336)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 11:17 zabe@deploy1002: Started scap: Backport for [[gerrit:881467{{!}}Stop loading PoolCounter extension (T327336)]]
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43308 and previous config saved to /var/cache/conftool/dbconfig/20230124-111245-root.json
* 11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
* 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
* 11:03 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 11:03 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 11:03 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 11:02 effie: depooling maps (kartotherian) from codfw, leaving eqiad as pooled
* 11:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:59 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
* 10:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:58 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 10:58 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43306 and previous config saved to /var/cache/conftool/dbconfig/20230124-105740-root.json
* 10:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bullseye
* 10:52 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 10:49 XioNoX: depool ulsfo for network maintenance - [[phab:T316532|T316532]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl in s1 [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43305 and previous config saved to /var/cache/conftool/dbconfig/20230124-104336-marostegui.json
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43304 and previous config saved to /var/cache/conftool/dbconfig/20230124-104235-root.json
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1176 from s1 [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43303 and previous config saved to /var/cache/conftool/dbconfig/20230124-104219-root.json
* 10:33 vgutierrez: repool cp4046
* 10:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:31 vgutierrez: restarting varnish on cp4046
* 10:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:29 vgutierrez: depool cp4046
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43302 and previous config saved to /var/cache/conftool/dbconfig/20230124-102730-root.json
* 10:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:22 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 10:19 moritzm: rolling Apache/FPM restarts on mw canaries to pick up libtasn security update
* 10:19 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2165 [[phab:T327754|T327754]]', diff saved to https://phabricator.wikimedia.org/P43301 and previous config saved to /var/cache/conftool/dbconfig/20230124-101825-root.json
* 10:17 effie: depooling maps from eqiad && pooling maps on codfw
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T327754|T327754]]', diff saved to https://phabricator.wikimedia.org/P43300 and previous config saved to /var/cache/conftool/dbconfig/20230124-101727-root.json
* 10:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:14 marostegui: Starting s8 codfw failover from db2165 to db2161 - [[phab:T327754|T327754]]
* 10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2041.codfw.wmnet with OS bullseye
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43299 and previous config saved to /var/cache/conftool/dbconfig/20230124-101025-root.json
* 09:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
* 09:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
* 09:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43298 and previous config saved to /var/cache/conftool/dbconfig/20230124-095520-root.json
* 09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s8 [[phab:T327754|T327754]]
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T327754|T327754]]', diff saved to https://phabricator.wikimedia.org/P43297 and previous config saved to /var/cache/conftool/dbconfig/20230124-095235-marostegui.json
* 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s8 [[phab:T327754|T327754]]
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43296 and previous config saved to /var/cache/conftool/dbconfig/20230124-094725-root.json
* 09:41 moritzm: installing libtasn1-6 security updates on buster
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43295 and previous config saved to /var/cache/conftool/dbconfig/20230124-094016-root.json
* 09:39 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 09:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2041.codfw.wmnet with OS bullseye
* 09:39 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43294 and previous config saved to /var/cache/conftool/dbconfig/20230124-093220-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43293 and previous config saved to /var/cache/conftool/dbconfig/20230124-092511-root.json
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43292 and previous config saved to /var/cache/conftool/dbconfig/20230124-091715-root.json
* 09:14 kart_: Done: UTC morning backport window
* 09:13 kartik@deploy1002: Finished scap: Backport for [[gerrit:878853{{!}}Remove Kartographer versioned mapdata flags (T326288)]] (duration: 09m 44s)
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43291 and previous config saved to /var/cache/conftool/dbconfig/20230124-091006-root.json
* 09:05 kartik@deploy1002: awight and kartik: Backport for [[gerrit:878853{{!}}Remove Kartographer versioned mapdata flags (T326288)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:03 kartik@deploy1002: Started scap: Backport for [[gerrit:878853{{!}}Remove Kartographer versioned mapdata flags (T326288)]]
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43290 and previous config saved to /var/cache/conftool/dbconfig/20230124-090210-root.json
* 09:01 kartik@deploy1002: Finished scap: Backport for [[gerrit:875463{{!}}Deprecate the EnableMapFrame feature flag (T326288)]] (duration: 10m 42s)
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43289 and previous config saved to /var/cache/conftool/dbconfig/20230124-085501-root.json
* 08:52 kartik@deploy1002: awight and kartik: Backport for [[gerrit:875463{{!}}Deprecate the EnableMapFrame feature flag (T326288)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:50 kartik@deploy1002: Started scap: Backport for [[gerrit:875463{{!}}Deprecate the EnableMapFrame feature flag (T326288)]]
* 08:48 kartik@deploy1002: Finished scap: Backport for [[gerrit:882240{{!}}Enable write new for CheckUserLog comment fields on testwikis (T233004)]] (duration: 15m 20s)
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43288 and previous config saved to /var/cache/conftool/dbconfig/20230124-084705-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db2115 in x1 codfw', diff saved to https://phabricator.wikimedia.org/P43287 and previous config saved to /var/cache/conftool/dbconfig/20230124-084552-marostegui.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096 [[phab:T327745|T327745]]', diff saved to https://phabricator.wikimedia.org/P43286 and previous config saved to /var/cache/conftool/dbconfig/20230124-084508-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2115 to x1 codfw [[phab:T327745|T327745]]', diff saved to https://phabricator.wikimedia.org/P43285 and previous config saved to /var/cache/conftool/dbconfig/20230124-084206-marostegui.json
* 08:39 marostegui: Starting x1 codfw failover from db2096 to db2115 - [[phab:T327745|T327745]]
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2115 with weight 0 [[phab:T327745|T327745]]', diff saved to https://phabricator.wikimedia.org/P43284 and previous config saved to /var/cache/conftool/dbconfig/20230124-083643-marostegui.json
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T327745|T327745]]
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T327745|T327745]]
* 08:35 kartik@deploy1002: dreamyjazz and kartik: Backport for [[gerrit:882240{{!}}Enable write new for CheckUserLog comment fields on testwikis (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 08:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@8c87ca6]: (no justification provided) (duration: 00m 06s)
* 08:34 phedenskog@deploy1002: Started deploy [performance/navtiming@8c87ca6]: (no justification provided)
* 08:33 kartik@deploy1002: Started scap: Backport for [[gerrit:882240{{!}}Enable write new for CheckUserLog comment fields on testwikis (T233004)]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43283 and previous config saved to /var/cache/conftool/dbconfig/20230124-083200-root.json
* 08:28 kartik@deploy1002: Finished scap: Backport for [[gerrit:883098{{!}}Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)]] (duration: 09m 09s)
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2110 from API [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43282 and previous config saved to /var/cache/conftool/dbconfig/20230124-082440-marostegui.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43281 and previous config saved to /var/cache/conftool/dbconfig/20230124-082138-marostegui.json
* 08:21 kartik@deploy1002: kartik and matmarex: Backport for [[gerrit:883098{{!}}Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43280 and previous config saved to /var/cache/conftool/dbconfig/20230124-082025-root.json
* 08:19 kartik@deploy1002: Started scap: Backport for [[gerrit:883098{{!}}Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)]]
* 08:18 marostegui: Starting s4 codfw failover from db2140 to db2110 - [[phab:T327739|T327739]]
* 08:16 kartik@deploy1002: Finished scap: Backport for [[gerrit:882266{{!}}Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)]] (duration: 10m 25s)
* 08:07 kartik@deploy1002: kartik: Backport for [[gerrit:882266{{!}}Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 08:05 kartik@deploy1002: Started scap: Backport for [[gerrit:882266{{!}}Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)]]
* 07:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T327739|T327739]]
* 07:58 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T327739|T327739]]
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43279 and previous config saved to /var/cache/conftool/dbconfig/20230124-075824-root.json
* 07:50 moritzm: installing Linux 5.10.162 on Bullseye hosts
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl [[phab:T327616|T327616]]', diff saved to https://phabricator.wikimedia.org/P43278 and previous config saved to /var/cache/conftool/dbconfig/20230124-074323-marostegui.json
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43277 and previous config saved to /var/cache/conftool/dbconfig/20230124-064905-ladsgroup.json
* 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43276 and previous config saved to /var/cache/conftool/dbconfig/20230124-064554-ladsgroup.json
* 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43275 and previous config saved to /var/cache/conftool/dbconfig/20230124-063358-ladsgroup.json
* 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43274 and previous config saved to /var/cache/conftool/dbconfig/20230124-063048-ladsgroup.json
* 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43273 and previous config saved to /var/cache/conftool/dbconfig/20230124-061852-ladsgroup.json
* 06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43272 and previous config saved to /var/cache/conftool/dbconfig/20230124-061541-ladsgroup.json
* 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43271 and previous config saved to /var/cache/conftool/dbconfig/20230124-060345-ladsgroup.json
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43270 and previous config saved to /var/cache/conftool/dbconfig/20230124-060129-ladsgroup.json
* 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43269 and previous config saved to /var/cache/conftool/dbconfig/20230124-060035-ladsgroup.json
* 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43268 and previous config saved to /var/cache/conftool/dbconfig/20230124-055816-ladsgroup.json
* 05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:57 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.18 (duration: 02m 07s)
* 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.20 refs [[phab:T325583|T325583]] (duration: 53m 01s)
* 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.20 refs [[phab:T325583|T325583]]
* 03:30 AndyRussG: payments-wiki upgraded from {{Gerrit|3d882ac7}} to {{Gerrit|15395d05}}
* 02:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2024.codfw.wmnet
* 02:27 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2024.codfw.wmnet
* 02:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
* 02:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
* 02:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2019.codfw.wmnet
* 02:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
* 02:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
* 01:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
* 01:51 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
* 01:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
* 01:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
* 01:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
* 01:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1032.eqiad.wmnet
* 01:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
* 01:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1031.eqiad.wmnet
* 01:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1031.eqiad.wmnet
* 01:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
* 00:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
* 00:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
* 00:47 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
* 00:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
* 00:38 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
* 00:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
* 00:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
* 00:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
* 00:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
* 00:14 zabe@deploy1002: Finished scap: Backport for [[gerrit:881466{{!}}Use core's PoolCounterClient (T327336)]] (duration: 12m 47s)
* 00:03 zabe@deploy1002: zabe: Backport for [[gerrit:881466{{!}}Use core's PoolCounterClient (T327336)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 00:01 zabe@deploy1002: Started scap: Backport for [[gerrit:881466{{!}}Use core's PoolCounterClient (T327336)]]


== 2022-03-18 ==
== 2023-01-23 ==
* 21:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 23:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
* 21:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 23:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
* 21:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 23:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
* 15:38 jayme: powercycle kubernetes1002
* 23:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
* 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
* 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
* 14:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/FlaggedRevs/backend/FlaggedRevs.php: Backport: [[gerrit:771907{{!}}Don't pass the revision to PO access service (T304127)]] (duration: 00m 49s)
* 22:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
* 14:12 XioNoX: configure NAT for civi1002 - [[phab:T304098|T304098]]
* 22:57 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 14:02 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 14:02 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 14:01 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 22:56 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@544f5f3]: 0.3.119 (duration: 07m 30s)
* 14:01 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 22:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
* 13:59 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 22:49 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.119` on canary `wdqs1003`; proceeding to rest of fleet
* 13:59 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 22:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@544f5f3]: 0.3.119
* 13:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
* 22:46 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.119`. Pre-deploy tests passing on canary `wdqs1003`
* 13:07 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
* 22:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
* 13:02 moritzm: imported python3.5 3.5.3-1+deb9u5+wmf1 to component/python35 [[phab:T303801|T303801]]
* 22:37 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
* 12:35 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 22:31 maryum: Deployed patch for [[phab:T285159|T285159]]
* 11:35 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 21:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
* 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 21:40 zabe@deploy1002: Finished scap: Backport for [[gerrit:882746{{!}}throttle: Remove expired rule]] (duration: 07m 45s)
* 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 21:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
* 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 21:34 zabe@deploy1002: zabe: Backport for [[gerrit:882746{{!}}throttle: Remove expired rule]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 11:29 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 21:32 zabe@deploy1002: Started scap: Backport for [[gerrit:882746{{!}}throttle: Remove expired rule]]
* 11:28 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 21:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
* 11:09 vgutierrez: rolling restart of nginx on ncredir instances to catch up on OpenSSL updates
* 21:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
* 11:05 vgutierrez: restarting acme-chief and acme-chief API services to catch up on OpenSSL updates
* 21:12 kindrobot: close UTC late backport window
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 21:12 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:882715{{!}}Enable Page Tools for logged-in users on enwiki (T327686)]] (duration: 09m 00s)
* 10:54 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 21:04 kindrobot@deploy1002: jdrewniak and kindrobot: Backport for [[gerrit:882715{{!}}Enable Page Tools for logged-in users on enwiki (T327686)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 21:03 kindrobot@deploy1002: Started scap: Backport for [[gerrit:882715{{!}}Enable Page Tools for logged-in users on enwiki (T327686)]]
* 10:52 akosiaris: drain kubernetes200[1-4] [[phab:T303045|T303045]]
* 21:01 kindrobot: start UTC late backport window
* 10:51 akosiaris: depool kubernetes200[1-4] [[phab:T303045|T303045]]
* 20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2004.codfw.wmnet
* 20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2003.codfw.wmnet
* 20:45 taavi: restart [[phab:T315510|T315510]] on group1 after mwmaint restart, currently running on wikidatawiki
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2002.codfw.wmnet
* 19:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2001.codfw.wmnet
* 19:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
* 10:01 akosiaris: drain kubernetes100[1-4] [[phab:T303044|T303044]]
* 19:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
* 09:54 akosiaris: depool kubernetes100[1-4] from pybal [[phab:T303044|T303044]]
* 19:30 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1004.eqiad.wmnet
* 19:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1003.eqiad.wmnet
* 19:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1002.eqiad.wmnet
* 19:17 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet
* 19:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
* 09:42 akosiaris: uncordon kubernetes1018-1022. [[phab:T293728|T293728]]. Nodes are live, ready to receive workloads and traffic.
* 19:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
* 09:37 akosiaris: pool kubernetes1018-1022 in pybal. [[phab:T293728|T293728]]
* 19:16 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
* 09:37 akosiaris: pool kubernetes1018-1022 in pybal.
* 18:48 mutante: miscweb1002 - unload CAS apache module and config; apt-get remove libapache2-mod-auth-cas
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1022.eqiad.wmnet
* 18:19 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load - apt-get remove libapache2-mod-auth-cas - [[phab:T327405|T327405]]
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1021.eqiad.wmnet
* 18:08 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1020.eqiad.wmnet
* 18:05 mutante: miscweb1002 - disabling puppet because latest merge would break apache if it runs, debugging in progress on inactive miscweb2002
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1019.eqiad.wmnet
* 18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1018.eqiad.wmnet
* 18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22827 and previous config saved to /var/cache/conftool/dbconfig/20220318-093543-marostegui.json
* 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43265 and previous config saved to /var/cache/conftool/dbconfig/20230123-175241-ladsgroup.json
* 09:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43264 and previous config saved to /var/cache/conftool/dbconfig/20230123-173736-ladsgroup.json
* 09:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43263 and previous config saved to /var/cache/conftool/dbconfig/20230123-172231-ladsgroup.json
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1022.eqiad.wmnet
* 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43262 and previous config saved to /var/cache/conftool/dbconfig/20230123-170726-ladsgroup.json
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1021.eqiad.wmnet
* 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1020.eqiad.wmnet
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1019.eqiad.wmnet
* 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1018.eqiad.wmnet
* 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 09:10 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 16:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 09:08 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 16:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22826 and previous config saved to /var/cache/conftool/dbconfig/20220318-085517-marostegui.json
* 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22825 and previous config saved to /var/cache/conftool/dbconfig/20220318-084012-marostegui.json
* 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22824 and previous config saved to /var/cache/conftool/dbconfig/20220318-082507-marostegui.json
* 16:48 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:882682{{!}} Bumping portals to master (T128546)]] (duration: 06m 48s)
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22823 and previous config saved to /var/cache/conftool/dbconfig/20220318-081002-marostegui.json
* 16:42 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:882682{{!}} Bumping portals to master (T128546)]] (duration: 06m 48s)
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22822 and previous config saved to /var/cache/conftool/dbconfig/20220318-072852-root.json
* 16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22821 and previous config saved to /var/cache/conftool/dbconfig/20220318-071758-marostegui.json
* 16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22820 and previous config saved to /var/cache/conftool/dbconfig/20220318-071750-marostegui.json
* 16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22819 and previous config saved to /var/cache/conftool/dbconfig/20220318-071348-root.json
* 16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22818 and previous config saved to /var/cache/conftool/dbconfig/20220318-070245-marostegui.json
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43261 and previous config saved to /var/cache/conftool/dbconfig/20230123-163207-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22817 and previous config saved to /var/cache/conftool/dbconfig/20220318-065844-root.json
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43260 and previous config saved to /var/cache/conftool/dbconfig/20230123-163138-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22816 and previous config saved to /var/cache/conftool/dbconfig/20220318-064740-marostegui.json
* 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43259 and previous config saved to /var/cache/conftool/dbconfig/20230123-161702-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22815 and previous config saved to /var/cache/conftool/dbconfig/20220318-064340-root.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43258 and previous config saved to /var/cache/conftool/dbconfig/20230123-161633-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22814 and previous config saved to /var/cache/conftool/dbconfig/20220318-063631-root.json
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43257 and previous config saved to /var/cache/conftool/dbconfig/20230123-160157-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22813 and previous config saved to /var/cache/conftool/dbconfig/20220318-063524-root.json
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43256 and previous config saved to /var/cache/conftool/dbconfig/20230123-160126-root.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22812 and previous config saved to /var/cache/conftool/dbconfig/20220318-063235-marostegui.json
* 15:53 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.11-1wm1_amd64.changes: [[phab:T326634|T326634]]
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22811 and previous config saved to /var/cache/conftool/dbconfig/20220318-062836-root.json
* 15:50 urbanecm: Deploy security patch for [[phab:T327613|T327613]]
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22810 and previous config saved to /var/cache/conftool/dbconfig/20220318-062127-root.json
* 15:48 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22809 and previous config saved to /var/cache/conftool/dbconfig/20220318-062020-root.json
* 15:48 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22808 and previous config saved to /var/cache/conftool/dbconfig/20220318-061332-root.json
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43255 and previous config saved to /var/cache/conftool/dbconfig/20230123-154652-root.json
* 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1179.eqiad.wmnet with OS bullseye
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43254 and previous config saved to /var/cache/conftool/dbconfig/20230123-154621-root.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22807 and previous config saved to /var/cache/conftool/dbconfig/20220318-060623-root.json
* 15:44 papaul: ongoing maintenance on fasw-codfw
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22806 and previous config saved to /var/cache/conftool/dbconfig/20220318-060516-root.json
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43253 and previous config saved to /var/cache/conftool/dbconfig/20230123-153147-root.json
* 05:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43252 and previous config saved to /var/cache/conftool/dbconfig/20230123-153116-root.json
* 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
* 15:17 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.1.4-1wm1_amd64.changes: [[phab:T325563|T325563]]
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22805 and previous config saved to /var/cache/conftool/dbconfig/20220318-055119-root.json
* 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43251 and previous config saved to /var/cache/conftool/dbconfig/20230123-151642-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22804 and previous config saved to /var/cache/conftool/dbconfig/20220318-055012-root.json
* 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43250 and previous config saved to /var/cache/conftool/dbconfig/20230123-151611-root.json
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1179.eqiad.wmnet with OS bullseye
* 15:09 taavi@deploy1002: Finished scap: Backport for [[gerrit:882661{{!}}Revert "Enable Linter write namespace tag and template using core config"]] (duration: 07m 28s)
* 05:39 marostegui: dbmaint on s3@eqiad [[phab:T300600|T300600]]
* 15:03 taavi@deploy1002: taavi and trainbranchbot: Backport for [[gerrit:882661{{!}}Revert "Enable Linter write namespace tag and template using core config"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 reimage [[phab:T300600|T300600]]', diff saved to https://phabricator.wikimedia.org/P22803 and previous config saved to /var/cache/conftool/dbconfig/20220318-053832-marostegui.json
* 15:02 taavi@deploy1002: Started scap: Backport for [[gerrit:882661{{!}}Revert "Enable Linter write namespace tag and template using core config"]]
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22802 and previous config saved to /var/cache/conftool/dbconfig/20220318-053615-root.json
* 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P43248 and previous config saved to /var/cache/conftool/dbconfig/20230123-150110-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22801 and previous config saved to /var/cache/conftool/dbconfig/20220318-053508-root.json
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P43247 and previous config saved to /var/cache/conftool/dbconfig/20230123-150018-marostegui.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22800 and previous config saved to /var/cache/conftool/dbconfig/20220318-053443-marostegui.json
* 15:00 taavi@deploy1002: Finished scap: Backport for [[gerrit:880989{{!}}Enable Linter write namespace tag and template using core config (T299612)]] (duration: 07m 56s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 14:59 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 14:59 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 01:23 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 14:53 taavi@deploy1002: taavi and sbailey: Backport for [[gerrit:880989{{!}}Enable Linter write namespace tag and template using core config (T299612)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 01:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 14:52 taavi@deploy1002: Started scap: Backport for [[gerrit:880989{{!}}Enable Linter write namespace tag and template using core config (T299612)]]
* 01:14 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 14:46 taavi@deploy1002: Finished scap: Backport for [[gerrit:882179{{!}}SpecialUserrights: Allow updating the expiry of user groups (T327605)]] (duration: 08m 48s)
* 14:42 sukhe: rolling out pybal 1.15.10: [[phab:T321191|T321191]]
* 14:39 taavi@deploy1002: taavi and func: Backport for [[gerrit:882179{{!}}SpecialUserrights: Allow updating the expiry of user groups (T327605)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:37 taavi@deploy1002: Started scap: Backport for [[gerrit:882179{{!}}SpecialUserrights: Allow updating the expiry of user groups (T327605)]]
* 14:37 taavi@deploy1002: Finished scap: Backport for [[gerrit:876196{{!}}zhwiki: Install PageAssessments (T326387)]] (duration: 11m 24s)
* 14:27 taavi@deploy1002: stang and taavi: Backport for [[gerrit:876196{{!}}zhwiki: Install PageAssessments (T326387)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 14:25 taavi@deploy1002: Started scap: Backport for [[gerrit:876196{{!}}zhwiki: Install PageAssessments (T326387)]]
* 14:25 taavi@deploy1002: Finished scap: Backport for [[gerrit:882422{{!}}bnwikiquote: Update logo (T323131)]], [[gerrit:882425{{!}}shnwikibooks: Add project logo (T327380)]] (duration: 09m 22s)
* 14:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 14:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:18 taavi: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki pageassessments # [[phab:T326387|T326387]]
* 14:17 taavi@deploy1002: taavi and stang: Backport for [[gerrit:882422{{!}}bnwikiquote: Update logo (T323131)]], [[gerrit:882425{{!}}shnwikibooks: Add project logo (T327380)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:16 taavi@deploy1002: Started scap: Backport for [[gerrit:882422{{!}}bnwikiquote: Update logo (T323131)]], [[gerrit:882425{{!}}shnwikibooks: Add project logo (T327380)]]
* 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P43246 and previous config saved to /var/cache/conftool/dbconfig/20230123-124532-ladsgroup.json
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43245 and previous config saved to /var/cache/conftool/dbconfig/20230123-123025-ladsgroup.json
* 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43242 and previous config saved to /var/cache/conftool/dbconfig/20230123-121519-ladsgroup.json
* 12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 12:06 marostegui: dbmaint Reboot db2135 (m5 codfw master)
* 12:06 marostegui: dbmaint Reboot db2134 (m3 codfw master)
* 12:05 Emperor: removing /usr/local/bin/prometheus-puppet-agent-stats from prometheus crontab on snapshot1014
* 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P43241 and previous config saved to /var/cache/conftool/dbconfig/20230123-120012-ladsgroup.json
* 11:58 marostegui: dbmaint Reboot db2133 (m2 codfw master)
* 11:57 marostegui: dbmaint Reboot db2132 (m1 codfw master)
* 11:57 marostegui: Reboot db2132 (m1 codfw master)
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43239 and previous config saved to /var/cache/conftool/dbconfig/20230123-113506-ladsgroup.json
* 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2114 [[phab:T327644|T327644]]', diff saved to https://phabricator.wikimedia.org/P43236 and previous config saved to /var/cache/conftool/dbconfig/20230123-112134-ladsgroup.json
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43235 and previous config saved to /var/cache/conftool/dbconfig/20230123-112001-ladsgroup.json
* 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2129 to s6 primary [[phab:T327644|T327644]]', diff saved to https://phabricator.wikimedia.org/P43234 and previous config saved to /var/cache/conftool/dbconfig/20230123-111813-ladsgroup.json
* 11:17 Amir1: Starting s6 codfw failover from db2114 to db2129 - [[phab:T327644|T327644]]
* 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P43233 and previous config saved to /var/cache/conftool/dbconfig/20230123-111147-ladsgroup.json
* 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43232 and previous config saved to /var/cache/conftool/dbconfig/20230123-110456-ladsgroup.json
* 10:55 XioNoX: update management routers ACLs to add new bast hosts
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2129 with weight 0 [[phab:T327644|T327644]]', diff saved to https://phabricator.wikimedia.org/P43231 and previous config saved to /var/cache/conftool/dbconfig/20230123-105520-ladsgroup.json
* 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 [[phab:T327644|T327644]]
* 10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 [[phab:T327644|T327644]]
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43230 and previous config saved to /var/cache/conftool/dbconfig/20230123-104951-ladsgroup.json
* 10:48 vgutierrez: rolling upgrade to HAProxy 2.4.20 on ulsfo
* 10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 06s)
* 10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
* 10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 20s)
* 10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
* 10:39 btullis@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
* 10:39 btullis@deploy1002: Installing scap version "4.33.1" for 1 hosts
* 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-tool1010.eqiad.wmnet with OS bullseye
* 10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
* 10:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
* 10:07 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:877244{{!}}Remove Flow as default in techconductwiki]] (duration: 07m 51s)
* 10:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-tool1010.eqiad.wmnet with OS bullseye
* 10:01 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:877244{{!}}Remove Flow as default in techconductwiki]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:59 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:877244{{!}}Remove Flow as default in techconductwiki]]
* 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 08:49 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:49 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
* 08:48 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
* 08:46 volans@cumin1001: START - Cookbook sre.dns.netbox
* 08:45 zabe@deploy1002: Finished scap: Backport for [[gerrit:882217{{!}}Remove oversight group from privileged groups (T112147)]], [[gerrit:882577{{!}}Start reading from cuc_comment_id on wikidatawiki (T233004)]] (duration: 07m 48s)
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group [[phab:T326669|T326669]]', diff saved to https://phabricator.wikimedia.org/P43229 and previous config saved to /var/cache/conftool/dbconfig/20230123-084326-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group [[phab:T326669|T326669]]', diff saved to https://phabricator.wikimedia.org/P43228 and previous config saved to /var/cache/conftool/dbconfig/20230123-084239-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43227 and previous config saved to /var/cache/conftool/dbconfig/20230123-084055-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43226 and previous config saved to /var/cache/conftool/dbconfig/20230123-084045-root.json
* 08:39 zabe@deploy1002: zabe: Backport for [[gerrit:882217{{!}}Remove oversight group from privileged groups (T112147)]], [[gerrit:882577{{!}}Start reading from cuc_comment_id on wikidatawiki (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:37 zabe@deploy1002: Started scap: Backport for [[gerrit:882217{{!}}Remove oversight group from privileged groups (T112147)]], [[gerrit:882577{{!}}Start reading from cuc_comment_id on wikidatawiki (T233004)]]
* 08:37 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 08s)
* 08:36 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
* 08:30 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:882174{{!}}Tweaks for new heading HTML structure (T327328 T327469)]] (duration: 17m 12s)
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43225 and previous config saved to /var/cache/conftool/dbconfig/20230123-082550-root.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43224 and previous config saved to /var/cache/conftool/dbconfig/20230123-082540-root.json
* 08:22 ladsgroup@deploy1002: ladsgroup and matmarex: Backport for [[gerrit:882174{{!}}Tweaks for new heading HTML structure (T327328 T327469)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:12 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:882174{{!}}Tweaks for new heading HTML structure (T327328 T327469)]]
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43223 and previous config saved to /var/cache/conftool/dbconfig/20230123-081045-root.json
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43222 and previous config saved to /var/cache/conftool/dbconfig/20230123-081035-root.json
* 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43221 and previous config saved to /var/cache/conftool/dbconfig/20230123-080824-ladsgroup.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43220 and previous config saved to /var/cache/conftool/dbconfig/20230123-075540-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43219 and previous config saved to /var/cache/conftool/dbconfig/20230123-075530-root.json
* 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43218 and previous config saved to /var/cache/conftool/dbconfig/20230123-075319-ladsgroup.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43217 and previous config saved to /var/cache/conftool/dbconfig/20230123-074035-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43216 and previous config saved to /var/cache/conftool/dbconfig/20230123-074025-root.json
* 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43215 and previous config saved to /var/cache/conftool/dbconfig/20230123-073814-ladsgroup.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43214 and previous config saved to /var/cache/conftool/dbconfig/20230123-072530-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43213 and previous config saved to /var/cache/conftool/dbconfig/20230123-072520-root.json
* 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43212 and previous config saved to /var/cache/conftool/dbconfig/20230123-072309-ladsgroup.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 db1206 [[phab:T326669|T326669]]', diff saved to https://phabricator.wikimedia.org/P43211 and previous config saved to /var/cache/conftool/dbconfig/20230123-071323-marostegui.json
* 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 06:23 kart_: Updated cxserver to 2023-01-20-051603-production ([[phab:T323840|T323840]], [[phab:T326236|T326236]])
* 06:19 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 06:18 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:17 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:16 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 06:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 06:12 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 05:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2113 [[phab:T327611|T327611]]', diff saved to https://phabricator.wikimedia.org/P43210 and previous config saved to /var/cache/conftool/dbconfig/20230123-045939-ladsgroup.json
* 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2123 to s5 primary [[phab:T327611|T327611]]', diff saved to https://phabricator.wikimedia.org/P43209 and previous config saved to /var/cache/conftool/dbconfig/20230123-045740-ladsgroup.json
* 04:57 Amir1: Starting s5 codfw failover from db2113 to db2123 - [[phab:T327611|T327611]]
* 04:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2123 with weight 0 [[phab:T327611|T327611]]', diff saved to https://phabricator.wikimedia.org/P43208 and previous config saved to /var/cache/conftool/dbconfig/20230123-043324-ladsgroup.json
* 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T327611|T327611]]
* 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T327611|T327611]]
* 04:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2107 [[phab:T327609|T327609]]', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json
* 03:52 Amir1: Starting s2 codfw failover from db2107 to db2104 - [[phab:T327609|T327609]]


== 2022-03-17 ==
== 2023-01-20 ==
* 22:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:22 jynus: deploying new grants for backups on m1 [[phab:T327155|T327155]]
* 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 22:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 22:36 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.26 refs [[phab:T300202|T300202]]
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 22:28 derick@deploy1002: Synchronized wmf-config/MetaContactPages.php: Config: [[gerrit:771606{{!}}Add new field to capture application URL link on Meta]] (duration: 00m 50s)
* 14:28 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 22:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 22:17 derick@deploy1002: Finished scap: Backport: [[gerrit:771665{{!}}Add & improve message for the chapter/thorg application contact form]] (duration: 11m 37s)
* 14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 22:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 22:05 derick@deploy1002: Started scap: Backport: [[gerrit:771665{{!}}Add & improve message for the chapter/thorg application contact form]]
* 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
* 22:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:08 moritzm: installing node-minimatch security updates
* 22:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:01 moritzm: installing libxstream-java security updates
* 22:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:00 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1~wmf1_amd64.changes: [[phab:T325557|T325557]]
* 22:00 brennen@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771712{{!}}Revert "Revert "Revert "Enable Parsoid API everywhere"""]] (duration: 00m 51s)
* 12:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
* 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2040.codfw.wmnet with OS bullseye
* 21:48 brennen@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771707{{!}}Revert "Revert "Enable Parsoid API everywhere""]] (duration: 00m 51s)
* 12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
* 21:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:17 moritzm: installing ping1003 [[phab:T273509|T273509]]
* 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS bullseye
* 21:45 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 12:03 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
* 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 12:02 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
* 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
* 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 10:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
* 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 10:32 elukey: restart kubelet on ml-staging200* nodes (some fs-inotify-related issues with the istio-proxy of newly created containers)
* 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 10:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'.
* 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 10:13 moritzm: installing emacs security updates on bullseye
* 21:42 rzl@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 10:13 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
* 21:42 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 10:12 moritzm: imported jenkins 2.375-2 to thirdparty/ci [[phab:T326531|T326531]]
* 21:42 rzl@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
* 10:00 jnuche@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
* 21:42 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 10:00 jnuche@deploy1002: Installing scap version "4.33.1" for 1 hosts
* 21:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 08:59 moritzm: installing ping2003 [[phab:T273509|T273509]]
* 21:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
* 08:10 elukey: restart kubelet on kubernetes2007 - node reported issues with it, marked as "notready" by the control plane
* 21:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
* 07:58 elukey: `apt-get clean` on doh4001 to free space (root partition almost filled)
* 21:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 01:55 ejegg: payments-wiki upgraded from {{Gerrit|3cf03933}} to {{Gerrit|3d882ac7}}
* 21:40 rzl@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 01:12 ejegg: payments-wiki upgraded from {{Gerrit|fcb9ab60}} to {{Gerrit|3cf03933}}
* 21:35 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 21:26 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 21:26 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 21:26 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:26 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 21:25 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 21:25 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:25 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 21:25 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
* 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:23 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
* 21:21 cjming@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/T299104.php: Backport: [[gerrit:771394{{!}}Update invalid skin preference update script (T299104)]] (duration: 00m 51s)
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:11 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.26 refs [[phab:T300202|T300202]] (duration: 00m 50s)
* 21:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.26 refs [[phab:T300202|T300202]]
* 20:57 ladsgroup@deploy1002: Finished scap: Revert "rdbms: Followups to automatic connection recovery patch" (duration: 11m 50s)
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:45 ladsgroup@deploy1002: Started scap: Revert "rdbms: Followups to automatic connection recovery patch"
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22798 and previous config saved to /var/cache/conftool/dbconfig/20220317-204128-marostegui.json
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:29 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:763779{{!}}Revert "Enable Parsoid API everywhere" (T302081)]] (duration: 00m 50s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22797 and previous config saved to /var/cache/conftool/dbconfig/20220317-202623-marostegui.json
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22796 and previous config saved to /var/cache/conftool/dbconfig/20220317-201118-marostegui.json
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22795 and previous config saved to /var/cache/conftool/dbconfig/20220317-195613-marostegui.json
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:55 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 18:53 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.26/skins/Vector/includes/Hooks.php: Backport: [[gerrit:771395{{!}}Fix updateUserLinksDropdownItems not being called (T304002)]] (duration: 00m 50s)
* 18:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:27 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 18:18 akosiaris: cordon kubernetes10<nowiki>{</nowiki>18..22<nowiki>}</nowiki> [[phab:T293728|T293728]]
* 18:12 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 18:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:47 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:41 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:41 arturo: uploaded prometheus-openstack-exporter 0.0.8-4~wmf1 to bullseye-wikimedia ([[phab:T302178|T302178]])
* 17:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1022.eqiad.wmnet with OS bullseye
* 17:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS bullseye
* 17:35 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1019.eqiad.wmnet with OS bullseye
* 17:34 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
* 17:34 dcaro@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
* 17:33 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1020.eqiad.wmnet with OS bullseye
* 17:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1018.eqiad.wmnet with OS bullseye
* 17:28 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:28 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:28 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
* 17:28 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
* 17:27 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
* 17:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
* 17:25 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:24 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:24 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:23 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
* 17:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
* 17:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:21 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:21 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
* 17:21 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:21 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:20 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:20 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
* 17:20 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
* 17:20 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
* 17:18 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
* 17:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:18 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:15 dancy@deploy1002: Synchronized README: testing mediawiki image build (duration: 02m 11s)
* 17:11 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:10 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:09 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
* 17:09 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1020.eqiad.wmnet with OS bullseye
* 17:09 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
* 17:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1022.eqiad.wmnet with OS bullseye
* 17:08 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
* 17:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1021.eqiad.wmnet with OS bullseye
* 17:07 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
* 17:06 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1019.eqiad.wmnet with OS bullseye
* 17:06 bblack: geodns - Cyprus routed to new drmrs edge DC (first live users!) - will phase in over the standard 10 minute DNS TTL
* 17:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
* 17:04 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1018.eqiad.wmnet with OS bullseye
* 17:03 volans: restart atftp on install1003
* 17:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:00 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:00 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:50 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:48 XioNoX: disable BGP to Lumen in codfw for fiber move
* 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22794 and previous config saved to /var/cache/conftool/dbconfig/20220317-164228-marostegui.json
* 16:42 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 16:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:36 moritzm: restarting LDAP replicas for openssl update
* 16:35 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
* 16:35 dcaro@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
* 16:35 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
* 16:35 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
* 16:34 ryankemper: [WDQS] Pooled `wdqs2001` (caught up on lag)
* 16:31 andrewbogott: sudo service networking restart on puppetmaster1003
* 16:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22793 and previous config saved to /var/cache/conftool/dbconfig/20220317-162723-marostegui.json
* 16:15 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22792 and previous config saved to /var/cache/conftool/dbconfig/20220317-161218-marostegui.json
* 16:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:10 XioNoX: pfw3-codfw move traffic to cr2 uplink
* 16:05 oblivian@puppetmaster1001: conftool action : edit; selector: name=random_q
* 16:04 ryankemper: [WDQS] Depooled `wdqs2001` (~4.85 hours of lag to catch up)
* 16:03 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo systemctl restart wdqs-blazegraph.service`
* 16:03 ryankemper: [WDQS] Pooled `wdqs2003` (caught up on lag)
* 16:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:00 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:00 moritzm: restarting apache on logstash*
* 15:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|60980ce85c080fadaf0b2cb561be53f861ca94e0}}: ptwiki: Disable Growth image recommendation ([[phab:T302828|T302828]]) (duration: 00m 53s)
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22790 and previous config saved to /var/cache/conftool/dbconfig/20220317-155713-marostegui.json
* 15:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:46 XioNoX: cr1-codfw move xe-5/2/0 to xe-1/0/1:1 - [[phab:T289241|T289241]]
* 15:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:34 moritzm: restarting FPM on mw canaries
* 15:31 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
* 15:31 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
* 15:30 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
* 15:07 XioNoX: disable BGP to Telia in codfw for fiber move - [[phab:T289241|T289241]]
* 15:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
* 15:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
* 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22789 and previous config saved to /var/cache/conftool/dbconfig/20220317-145716-marostegui.json
* 14:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 14:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22788 and previous config saved to /var/cache/conftool/dbconfig/20220317-145708-marostegui.json
* 14:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22785 and previous config saved to /var/cache/conftool/dbconfig/20220317-144203-marostegui.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22784 and previous config saved to /var/cache/conftool/dbconfig/20220317-142658-marostegui.json
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22783 and previous config saved to /var/cache/conftool/dbconfig/20220317-141152-marostegui.json
* 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1067.eqiad.wmnet with reason: [[phab:T303151|T303151]]
* 14:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1067.eqiad.wmnet with reason: [[phab:T303151|T303151]]
* 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1063.eqiad.wmnet with reason: [[phab:T303151|T303151]]
* 14:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1063.eqiad.wmnet with reason: [[phab:T303151|T303151]]
* 13:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:43 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:17 Lucas_WMDE: UTC afternoon backport window done
* 13:16 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771595{{!}}commonswiki: Add pictures.snsb.info to wgCopyUploadsDomains allowlist (T303929)]] (duration: 00m 50s)
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22782 and previous config saved to /var/cache/conftool/dbconfig/20220317-131227-marostegui.json
* 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22781 and previous config saved to /var/cache/conftool/dbconfig/20220317-131220-marostegui.json
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:768089{{!}}Write "unexpectedUnconnectedPage" page prop on Beta]] – no expected behavior change in production (3/3) (duration: 00m 49s)
* 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:768089{{!}}Write "unexpectedUnconnectedPage" page prop on Beta]] – no expected behavior change in production (2/3) (duration: 00m 49s)
* 13:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:768089{{!}}Write "unexpectedUnconnectedPage" page prop on Beta]] – no expected behavior change in production (1/3) (duration: 00m 53s)
* 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22780 and previous config saved to /var/cache/conftool/dbconfig/20220317-125715-marostegui.json
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22779 and previous config saved to /var/cache/conftool/dbconfig/20220317-124209-marostegui.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22778 and previous config saved to /var/cache/conftool/dbconfig/20220317-122704-marostegui.json
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22777 and previous config saved to /var/cache/conftool/dbconfig/20220317-120700-root.json
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22776 and previous config saved to /var/cache/conftool/dbconfig/20220317-115156-root.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22775 and previous config saved to /var/cache/conftool/dbconfig/20220317-115012-root.json
* 11:42 volans: upgraded spicerack on cumin hosts to v2.3.3
* 11:41 volans: uploaded spicerack_2.3.3 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22774 and previous config saved to /var/cache/conftool/dbconfig/20220317-113652-root.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22773 and previous config saved to /var/cache/conftool/dbconfig/20220317-113508-root.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22772 and previous config saved to /var/cache/conftool/dbconfig/20220317-112921-marostegui.json
* 11:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 11:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22771 and previous config saved to /var/cache/conftool/dbconfig/20220317-112913-marostegui.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22770 and previous config saved to /var/cache/conftool/dbconfig/20220317-112148-root.json
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22769 and previous config saved to /var/cache/conftool/dbconfig/20220317-112004-root.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22768 and previous config saved to /var/cache/conftool/dbconfig/20220317-111408-marostegui.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22767 and previous config saved to /var/cache/conftool/dbconfig/20220317-110645-root.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22766 and previous config saved to /var/cache/conftool/dbconfig/20220317-110536-marostegui.json
* 11:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22765 and previous config saved to /var/cache/conftool/dbconfig/20220317-105903-marostegui.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22764 and previous config saved to /var/cache/conftool/dbconfig/20220317-105349-marostegui.json
* 10:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-fe[1005-1008].eqiad.wmnet
* 10:47 marostegui: dbmaint on s3@eqiad [[phab:T298556|T298556]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22763 and previous config saved to /var/cache/conftool/dbconfig/20220317-104358-marostegui.json
* 10:40 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22762 and previous config saved to /var/cache/conftool/dbconfig/20220317-103844-marostegui.json
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22761 and previous config saved to /var/cache/conftool/dbconfig/20220317-103726-marostegui.json
* 10:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 10:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22760 and previous config saved to /var/cache/conftool/dbconfig/20220317-103719-marostegui.json
* 10:31 mvernon@cumin1001: START - Cookbook sre.dns.netbox
* 10:26 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-fe[1005-1008].eqiad.wmnet
* 10:24 marostegui: dbmaint on s3@codfw [[phab:T298556|T298556]]
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22759 and previous config saved to /var/cache/conftool/dbconfig/20220317-102214-marostegui.json
* 10:10 marostegui: dbmaint on s7@eqiad [[phab:T298556|T298556]]
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22758 and previous config saved to /var/cache/conftool/dbconfig/20220317-100709-marostegui.json
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22757 and previous config saved to /var/cache/conftool/dbconfig/20220317-095204-marostegui.json
* 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22756 and previous config saved to /var/cache/conftool/dbconfig/20220317-095044-marostegui.json
* 09:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 09:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22755 and previous config saved to /var/cache/conftool/dbconfig/20220317-094025-marostegui.json
* 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22754 and previous config saved to /var/cache/conftool/dbconfig/20220317-094017-marostegui.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22752 and previous config saved to /var/cache/conftool/dbconfig/20220317-092512-marostegui.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22751 and previous config saved to /var/cache/conftool/dbconfig/20220317-091911-marostegui.json
* 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22750 and previous config saved to /var/cache/conftool/dbconfig/20220317-091007-marostegui.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22749 and previous config saved to /var/cache/conftool/dbconfig/20220317-085502-marostegui.json
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clarakosi out of all services on: 1881 hosts
* 08:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Clarakosi out of all services on: 1881 hosts
* 08:24 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|0da40c22844746120de9b33e772598d38aa74326}}: throttle: Remove expired rules (duration: 00m 50s)
* 08:23 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|980ea35d454563e538d08b9d6462064455b4d28c}}: Throttle: Increase limit for English Wikipedia ([[phab:T304016|T304016]]) (duration: 00m 51s)
* 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ppchelko out of all services on: 1881 hosts
* 08:12 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ppchelko out of all services on: 1881 hosts
* 08:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Accraze out of all services on: 1881 hosts
* 08:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Accraze out of all services on: 1881 hosts
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22748 and previous config saved to /var/cache/conftool/dbconfig/20220317-080705-root.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22747 and previous config saved to /var/cache/conftool/dbconfig/20220317-075350-marostegui.json
* 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22746 and previous config saved to /var/cache/conftool/dbconfig/20220317-075201-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22745 and previous config saved to /var/cache/conftool/dbconfig/20220317-073658-root.json
* 07:31 marostegui: dbmaint on s5@eqiad [[phab:T297189|T297189]]
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22744 and previous config saved to /var/cache/conftool/dbconfig/20220317-072154-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22743 and previous config saved to /var/cache/conftool/dbconfig/20220317-071200-root.json
* 07:11 ryankemper: [WDQS] Depooled `wdqs2003` (8 hours of lag to catch up on)
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22742 and previous config saved to /var/cache/conftool/dbconfig/20220317-070650-root.json
* 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
* 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
* 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 06:57 ryankemper: [WDQS] Also of note is the spiking thread counts on the affected hosts: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=22
* 06:57 ryankemper: [WDQS] Note that per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=7 `wdqs2003` has been offline for ~6 hours, `wdqs2001` for 1.5 hours and `wdqs2004` just recently.
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22741 and previous config saved to /var/cache/conftool/dbconfig/20220317-065656-root.json
* 06:54 ryankemper: [WDQS] `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph.service`
* 06:53 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo systemctl restart wdqs-blazegraph.service`
* 06:50 elukey: restart blazegraph on wdqs2004
* 06:46 elukey: kill remaining hanging processes for ppche*lko and accra*ze on an-test-client1001 to allow user offboarding (puppet broken)
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22740 and previous config saved to /var/cache/conftool/dbconfig/20220317-064152-root.json
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22739 and previous config saved to /var/cache/conftool/dbconfig/20220317-062648-root.json
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22738 and previous config saved to /var/cache/conftool/dbconfig/20220317-061144-root.json
* 04:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22737 and previous config saved to /var/cache/conftool/dbconfig/20220317-040634-marostegui.json
* 04:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 04:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 02:57 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 02:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 02:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 01:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye


== 2022-03-16 ==
== 2023-01-19 ==
* 23:52 tzatziki: Removing two files for legal compliance
* 21:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2039.codfw.wmnet with OS bullseye
* 21:17 cjming: end running skin update preference maintenance script
* 21:42 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881677{{!}}Enable Page tools on viwiki and itwiki (T327348)]] (duration: 10m 38s)
* 20:52 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:33 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for [[gerrit:881677{{!}}Enable Page tools on viwiki and itwiki (T327348)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [no-op] {{Gerrit|8efa537}}: GrowthExperiments: Set GEWelcomeSurveyShowMailingListQuestion ([[phab:T303240|T303240]]) (duration: 00m 53s)
* 20:38 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/: {{Gerrit|9ba157b}}: Add insert option for update skin preferences script ([[phab:T299104|T299104]]) (duration: 00m 50s)
* 20:34 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/WikimediaMaintenance/: {{Gerrit|ebfc516}}: Add script to update vector skin preferences ([[phab:T299104|T299104]]) (duration: 00m 51s)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025


== 2022-03-15 ==
== 2023-01-18 ==
* 22:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 23:47 zabe: run populateCulComment.php on all group0 and group1 wikis # [[phab:T327290|T327290]]
* 22:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 23:42 cstone: civicrm upgraded from {{Gerrit|164270b0}} to {{Gerrit|f6093fb2}}
* 22:07 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 22:35 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - [[phab:T323646|T323646]]
* 22:06 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G 
* 22:05 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 22:04 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 22:03 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 22:02 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 22:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 22:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 22:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 21:59 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 21:56 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 21:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge


== 2022-03-14 ==
== 2023-01-17 ==
* 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22460 and previous config saved to /var/cache/conftool/dbconfig/20220314-234430-marostegui.json
* 23:51 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "User:Amire80/frg" "Movement Multilingual Termbase" "Zabe" "per request [[:phab:T327149{{!}}T327149]]" # [[phab:T327149|T327149]]
* 23:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 23:33 zabe@deploy1002: Finished scap: Backport for [[gerrit:880905{{!}}Start reading from cuc_comment_id on testwiki (T233004)]], [[gerrit:880904{{!}}Start reading from cuc_actor everywhere (T233004)]] (duration: 09m 58s)
* 23:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 23:25 zabe@deploy1002: zabe and zabe: Backport
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:28


== 2022-03-11 ==
== 2023-01-16 ==
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2014.codfw.wmnet with OS bullseye
* 17:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
* 15:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
* 17:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
* 15:42 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
* 17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:39 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 17:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 15:38 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:37 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 16:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 15:36 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 16:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1044.eqiad.wmnet with OS bullseye
* 15:36 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
* 15:35 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 16:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
* 15:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 16:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1044.eqiad.wmnet with OS bullseye
* 15:33 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 16:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1042.eqiad.wmnet with OS bullseye
* 15:27 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2014.codfw.wmnet with OS bullseye
* 16:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
* 15:07 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host kubernetes2013.codfw.wmnet with OS bullseye
* 15:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
* 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22374 and previous config saved to /var/cache/conftool/dbconfig/20220311-150702-root.json
* 15:47 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1042.eqiad.wmnet with OS bullseye
* 15:02 XioNoX: cr1/2-eqiad AVOID-PATHS as-path TI "6762 .*"
* 13:35 XioNoX: disable one of 3 cr1-cr2 eqiad links - [[phab:T304712|T304712]]
* 15:02 XioNoX: cr2-esams AVOID-PATHS as-path TI "6762 .*" <- rolled back
* 13:34 XioNoX: repool eqiad-eqord link - [[phab:T304712|T304712]]
* 14:57 XioNoX: cr2-esams AVOID-PATHS as-path TI "6762 .*"
* 12:56 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
* 14:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
* 12:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22373 and previous config saved to /var/cache/conftool/dbconfig/20220311-145159-root.json
* 12:50 XioNoX: drain eqiad-eqord link - [[phab:T304712|T304712]]
* 14:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
* 12:47 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22372 and previous config saved to /var/cache/conftool/dbconfig/20220311-143652-root.json
* 12:43 Amir1: power cycled db1198
* 14:35 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2013.codfw.wmnet with OS bullseye
* 12:36 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22371 and previous config saved to /var/cache/conftool/dbconfig/20220311-142147-root.json
* 12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[5-9].eqiad.wmnet
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22370 and previous config saved to /var/cache/conftool/dbconfig/20220311-140641-root.json
* 12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102[012].eqiad.wmnet
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P22369 and previous config saved to /var/cache/conftool/dbconfig/20220311-140549-marostegui.json
* 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102.eqiad.wmnet
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22368 and previous config saved to /var/cache/conftool/dbconfig/20220311-135137-root.json
* 12:05 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
* 13:49 marostegui: dbmaint on s8@eqiad [[phab:T300775|T300775]]
* 12:02 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
* 13:49 marostegui: dbmaint on s1@eqiad [[phab:T298294|T298294]]
* 11:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:43 jelto: update pcc facts
* 11:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22367 and previous config saved to /var/cache/conftool/dbconfig/20220311-133633-root.json
* 11:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P22366 and previous config saved to /var/cache/conftool/dbconfig/20220311-133407-marostegui.json
* 11:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cumin2001.codfw.wmnet
* 11:32 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
* 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cumin2001.codfw.wmnet
* 10:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
* 11:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2012.codfw.wmnet with OS bullseye
* 10:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 11:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
* 10:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
* 10:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:59 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
* 10:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2110.codfw.wmnet with OS bullseye
* 10:48 moritzm: installing libtasn1-6 security updates on Bullseye
* 10:46 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2012.codfw.wmnet with OS bullseye
* 10:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
* 10:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2011.codfw.wmnet with OS bullseye
* 08:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
* 10:39 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
* 08:46 elukey: powercycle an-worker1125 - soft lockup traces registered in the tty, host frozen
* 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
* 08:14 oblivian@deploy1002: Synchronized README: test null deployment for [[phab:T327041|T327041]] (duration: 07m 12s)
* 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: host reimage
* 08:09 Emperor: stopped swift_rclone_sync on ms-be1069
* 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: host reimage
* 07:48 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse20(0[6-9]{{!}}10).codfw.wmnet
* 10:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
* 07:44 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw23([12][0-9]{{!}}3[0-4]).codfw.wmnet
* 10:25 vgutierrez: disable certspotter - [[phab:T303593|T303593]]
* 07:41 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw22(59{{!}}6[0-9]{{!}}70).codfw.wmnet
* 10:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
* 07:26 _joe_: restarting pybal on lvs2009
* 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 07:10 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=(mw.*{{!}}appservers{{!}}api)-ro,name=codfw
* 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 07:10 _joe_: depooling mediawiki in codfw
* 10:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2110.codfw.wmnet with OS bullseye
* 06:47 XioNoX: add 2001:67c:930::/48 to network:external in data.yaml
* 10:16 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
* 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
* 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
* 10:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2011.codfw.wmnet with OS bullseye
* 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1198 maint', diff saved to https://phabricator.wikimedia.org/P43157 and previous config saved to /var/cache/conftool/dbconfig/20230116-062211-ladsgroup.json
* 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 02:25 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=parsoid-php
* 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 02:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,service=nginx
* 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 02:01 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,service=nginx
* 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 01:51 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet
* 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 01:35 Amir1: rolling restart of php-fpm across the fleet
* 10:03 dcausse: manually installed jvmquake to wdqs1010 (test machine) from https://people.wikimedia.org/~jmm/jvmquake/
* 01:30 thcipriani: 01:29:56 php-fpm-restart: 100% (in-flight: 0; ok: 184; fail: 112; left: 0)
* 09:54 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 01:29 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879798{{!}}LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)]] (duration: 24m 47s)
* 09:47 vgutierrez: stopping certspotter on alert1001
* 01:15 thcipriani@deploy1002: thcipriani and func: Backport for [[gerrit:879798{{!}}LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 01:05 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879798{{!}}LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)]]
* 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 09:36 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 09:35 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 09:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 09:15 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 09:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 09:00 jayme: kubernetes2011:~# systemctl restart rsyslog.service - [[phab:T289766|T289766]]
* 08:52 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 08:51 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1017.eqiad.wmnet
* 08:43 dcausse: restarting blazegraph on wdqs1012 (jvm stuck for 5 hours)
* 08:42 jynus: upgrade and restart db2139
* 08:41 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1017.eqiad.wmnet
* 08:30 jynus: upgrade and restart db1145
* 08:23 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1017.eqiad.wmnet
* 08:21 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1017.eqiad.wmnet
* 08:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22364 and previous config saved to /var/cache/conftool/dbconfig/20220311-063921-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22363 and previous config saved to /var/cache/conftool/dbconfig/20220311-062417-root.json
* 06:13 marostegui: Reboot dbproxy1014 [[phab:T303174|T303174]]
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22362 and previous config saved to /var/cache/conftool/dbconfig/20220311-060913-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22361 and previous config saved to /var/cache/conftool/dbconfig/20220311-055409-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P22360 and previous config saved to /var/cache/conftool/dbconfig/20220311-054514-marostegui.json
* 02:54 eileen: revision changed from {{Gerrit|9fb68b24}} to {{Gerrit|252269c8}}
* 01:56 eileen: civicrm revision changed from {{Gerrit|8501c38c}} to {{Gerrit|9fb68b24}}
* 01:31 eileen: civicrm changed from {{Gerrit|4cb2bdbc}} to {{Gerrit|8501c38c}}
* 00:33 TimStarling: on mwmaint1002 running populateGlobalEditCount.php
* 00:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 00:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply


== 2022-03-10 ==
== 2023-01-14 ==
* 23:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 09:46 godog: issue 'request system reboot member 2' - [[phab:T327001|T327001]]
* 23:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 09:20 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
* 23:08 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:19 Emperor: depool thanos-fe2002 [[phab:T327001|T327001]]
* 23:07 rzl@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:19 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=ms-fe2010.codfw.wmnet
* 22:42 tstarling@deploy1002: Finished scap: global_edit_count gerrit 769561 (duration: 15m 12s)
* 09:19 Emperor: depool ms-fe2010 [[phab:T327001|T327001]]
* 22:27 tstarling@deploy1002: Started scap: global_edit_count gerrit 769561
* 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/User/CentralAuthUser.php: global_edit_count gerrit 769561 (duration: 00m 47s)
* 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/Hooks/Handlers/UserEditCountUpdateHookHandler.php: global_edit_count gerrit 769561 (duration: 00m 47s)
* 22:23 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthServices.php: global_edit_count gerrit 769561 (duration: 00m 47s)
* 22:22 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/ServiceWiring.php: global_edit_count gerrit 769561 (duration: 00m 48s)
* 22:21 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthEditCounter.php: global_edit_count gerrit 769561 (duration: 00m 48s)
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:08 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:05 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:04 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:04 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:02 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:02 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:41 rzl: UTC late B&C training window done
* 21:39 rzl@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:769779{{!}}CommonSettings: Update comment about Image Suggestions API (T294362)]] (duration: 00m 48s)
* 21:34 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/DiscussionTools/modules/controller.js: Backport: [[gerrit:769559{{!}}Fix highlighting of comments when reloading (T303261)]] (duration: 00m 47s)
* 21:33 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/modules/ve-mw: Backport: [[gerrit:769558{{!}}Preserve classes on media wrapper links (T292657 T303469)]] (duration: 00m 49s)
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:18 cstone: update Donation Interface revision changed from {{Gerrit|ca37a93e}} to {{Gerrit|5db12b21}}
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:13 rzl@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:766307{{!}}Remove centralauth-oversight from the config (T302675)]] (duration: 00m 49s)
* 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22356 and previous config saved to /var/cache/conftool/dbconfig/20220310-205114-marostegui.json
* 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22355 and previous config saved to /var/cache/conftool/dbconfig/20220310-203608-marostegui.json
* 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22354 and previous config saved to /var/cache/conftool/dbconfig/20220310-202103-marostegui.json
* 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22353 and previous config saved to /var/cache/conftool/dbconfig/20220310-200558-marostegui.json
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:47 volans: installed spicerack v2.3.2 on the cumin hosts
* 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:46 volans@cumin2002: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
* 19:46 volans@cumin2002: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
* 19:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.25 refs [[phab:T300201|T300201]]
* 19:44 volans: uploaded spicerack_2.3.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 19:33 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 19:32 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 19:32 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 19:31 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 19:29 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 19:29 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:07 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 19:06 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 19:06 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22352 and previous config saved to /var/cache/conftool/dbconfig/20220310-190544-marostegui.json
* 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22351 and previous config saved to /var/cache/conftool/dbconfig/20220310-190530-marostegui.json
* 19:04 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 19:04 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 19:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 19:02 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 19:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 19:00 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 18:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 18:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 18:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 18:57 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 18:56 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22350 and previous config saved to /var/cache/conftool/dbconfig/20220310-185025-marostegui.json
* 18:46 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 18:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 18:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 18:41 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:40 moritzm: restarting thumbor to pick up tiff security updates
* 18:40 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 18:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 18:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 18:36 moritzm: installing tiff security updates
* 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22349 and previous config saved to /var/cache/conftool/dbconfig/20220310-183520-marostegui.json
* 18:33 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 18:30 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 18:29 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 18:28 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 18:27 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 18:26 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22348 and previous config saved to /var/cache/conftool/dbconfig/20220310-182015-marostegui.json
* 18:20 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 18:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 18:19 razzi: cumin 'C:elasticsearch' 'systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service'
* 18:15 razzi: systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service on elastic2042 for [[phab:T300295|T300295]]
* 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:11 moritzm: installing cyrus-sasl2 security updates
* 18:08 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 18:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 17:51 herron: repool thanos-fe1001
* 17:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:43 herron: depooling thanos-fe1001 for envoy upgrade
* 17:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:41 dancy@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:761965{{!}}wmf-config: Use __DIR__ instead of "$IP/../wmf-config" (T45956)]] (duration: 00m 50s)
* 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22347 and previous config saved to /var/cache/conftool/dbconfig/20220310-172001-marostegui.json
* 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22346 and previous config saved to /var/cache/conftool/dbconfig/20220310-171953-marostegui.json
* 17:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22345 and previous config saved to /var/cache/conftool/dbconfig/20220310-170448-marostegui.json
* 16:57 damilare: civicrm change revision from {{Gerrit|9b5aafbc}} to {{Gerrit|4cb2bdbc}}
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22344 and previous config saved to /var/cache/conftool/dbconfig/20220310-165014-ladsgroup.json
* 16:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
* 16:50 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
* 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22343 and previous config saved to /var/cache/conftool/dbconfig/20220310-164943-marostegui.json
* 16:49 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:05:00 on D<nowiki>{</nowiki>cumin1001.mgmt<nowiki>}</nowiki> with reason: Testing alertmanager downtime
* 16:49 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on D<nowiki>{</nowiki>cumin1001.mgmt<nowiki>}</nowiki> with reason: Testing alertmanager downtime
* 16:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Testing alertmanager downtime
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22342 and previous config saved to /var/cache/conftool/dbconfig/20220310-163509-ladsgroup.json
* 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22341 and previous config saved to /var/cache/conftool/dbconfig/20220310-163438-marostegui.json
* 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
* 16:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
* 16:30 sukhe: depool doh1002 for testing eBPF
* 16:21 volans: uploaded spicerack_2.3.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22340 and previous config saved to /var/cache/conftool/dbconfig/20220310-162004-ladsgroup.json
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22339 and previous config saved to /var/cache/conftool/dbconfig/20220310-160457-ladsgroup.json
* 15:57 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 15:56 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1121.eqiad.wmnet with OS bullseye
* 15:47 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
* 15:37 moritzm: rolling restart of thumbor to pick up expat security updates
* 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22338 and previous config saved to /var/cache/conftool/dbconfig/20220310-153428-marostegui.json
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22337 and previous config saved to /var/cache/conftool/dbconfig/20220310-153424-marostegui.json
* 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22336 and previous config saved to /var/cache/conftool/dbconfig/20220310-153416-marostegui.json
* 15:33 sukhe: upload certspotter 0.10-1wm1 to apt.wm.o - [[phab:T204993|T204993]]
* 15:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1121.eqiad.wmnet with OS bullseye
* 15:21 moritzm: installing expat security updates on stretch
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22335 and previous config saved to /var/cache/conftool/dbconfig/20220310-151923-marostegui.json
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22334 and previous config saved to /var/cache/conftool/dbconfig/20220310-151910-marostegui.json
* 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22333 and previous config saved to /var/cache/conftool/dbconfig/20220310-150839-ladsgroup.json
* 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22332 and previous config saved to /var/cache/conftool/dbconfig/20220310-150803-ladsgroup.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22331 and previous config saved to /var/cache/conftool/dbconfig/20220310-150417-marostegui.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22330 and previous config saved to /var/cache/conftool/dbconfig/20220310-150405-marostegui.json
* 14:55 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:54 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22329 and previous config saved to /var/cache/conftool/dbconfig/20220310-145258-ladsgroup.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22328 and previous config saved to /var/cache/conftool/dbconfig/20220310-144911-marostegui.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22327 and previous config saved to /var/cache/conftool/dbconfig/20220310-144900-marostegui.json
* 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22326 and previous config saved to /var/cache/conftool/dbconfig/20220310-144222-marostegui.json
* 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22325 and previous config saved to /var/cache/conftool/dbconfig/20220310-144214-marostegui.json
* 14:41 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
* 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
* 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22324 and previous config saved to /var/cache/conftool/dbconfig/20220310-143753-ladsgroup.json
* 14:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22323 and previous config saved to /var/cache/conftool/dbconfig/20220310-142709-marostegui.json
* 14:26 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:25 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22322 and previous config saved to /var/cache/conftool/dbconfig/20220310-142248-ladsgroup.json
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22321 and previous config saved to /var/cache/conftool/dbconfig/20220310-141204-marostegui.json
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:08 akosiaris: repool ores in eqiad in discovery records
* 14:06 urbanecm: UTC afternoon B&C done
* 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22320 and previous config saved to /var/cache/conftool/dbconfig/20220310-135659-marostegui.json
* 13:55 akosiaris: depool ores in eqiad from discovery records to initiate reboot of rdb1011
* 13:55 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
* 13:51 akosiaris: repool ores in codfw in discovery records
* 13:50 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22319 and previous config saved to /var/cache/conftool/dbconfig/20220310-135047-marostegui.json
* 13:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 13:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22318 and previous config saved to /var/cache/conftool/dbconfig/20220310-135039-marostegui.json
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22317 and previous config saved to /var/cache/conftool/dbconfig/20220310-134807-marostegui.json
* 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 13:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22316 and previous config saved to /var/cache/conftool/dbconfig/20220310-134759-marostegui.json
* 13:43 akosiaris: reboot rdb2007 for upgrades
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22315 and previous config saved to /var/cache/conftool/dbconfig/20220310-133534-marostegui.json
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22314 and previous config saved to /var/cache/conftool/dbconfig/20220310-133254-marostegui.json
* 13:27 akosiaris: depool ores in codfw from discovery records to initiate reboot of rdb2007
* 13:26 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
* 13:22 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22313 and previous config saved to /var/cache/conftool/dbconfig/20220310-132234-ladsgroup.json
* 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:20 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22311 and previous config saved to /var/cache/conftool/dbconfig/20220310-132029-marostegui.json
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22310 and previous config saved to /var/cache/conftool/dbconfig/20220310-131748-marostegui.json
* 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22309 and previous config saved to /var/cache/conftool/dbconfig/20220310-131214-ladsgroup.json
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22308 and previous config saved to /var/cache/conftool/dbconfig/20220310-130523-marostegui.json
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22307 and previous config saved to /var/cache/conftool/dbconfig/20220310-130243-marostegui.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22306 and previous config saved to /var/cache/conftool/dbconfig/20220310-125909-marostegui.json
* 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22305 and previous config saved to /var/cache/conftool/dbconfig/20220310-125901-marostegui.json
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22304 and previous config saved to /var/cache/conftool/dbconfig/20220310-125709-ladsgroup.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22303 and previous config saved to /var/cache/conftool/dbconfig/20220310-124355-marostegui.json
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22302 and previous config saved to /var/cache/conftool/dbconfig/20220310-124204-ladsgroup.json
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22301 and previous config saved to /var/cache/conftool/dbconfig/20220310-122850-marostegui.json
* 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22300 and previous config saved to /var/cache/conftool/dbconfig/20220310-122659-ladsgroup.json
* 12:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1141.eqiad.wmnet with OS bullseye
* 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
* 12:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22299 and previous config saved to /var/cache/conftool/dbconfig/20220310-121344-marostegui.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22298 and previous config saved to /var/cache/conftool/dbconfig/20220310-120228-marostegui.json
* 12:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 12:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22297 and previous config saved to /var/cache/conftool/dbconfig/20220310-120221-marostegui.json
* 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
* 11:58 marostegui: Failover m1 master
* 11:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
* 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
* 11:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22296 and previous config saved to /var/cache/conftool/dbconfig/20220310-114715-marostegui.json
* 11:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1141.eqiad.wmnet with OS bullseye
* 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P22294 and previous config saved to /var/cache/conftool/dbconfig/20220310-113638-ladsgroup.json
* 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22293 and previous config saved to /var/cache/conftool/dbconfig/20220310-113210-marostegui.json
* 11:29 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b681376]: (no justification provided) (duration: 00m 07s)
* 11:29 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b681376]: (no justification provided)
* 11:26 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 11:26 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 11:25 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1093.eqiad.wmnet
* 11:24 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 11:24 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:24 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 11:18 volans: rolled out python3-wmflib v1.1.2 to the entire fleet (buster+ only)
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22292 and previous config saved to /var/cache/conftool/dbconfig/20220310-111705-marostegui.json
* 11:16 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1093.eqiad.wmnet
* 11:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22291 and previous config saved to /var/cache/conftool/dbconfig/20220310-111330-marostegui.json
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22290 and previous config saved to /var/cache/conftool/dbconfig/20220310-111320-marostegui.json
* 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22289 and previous config saved to /var/cache/conftool/dbconfig/20220310-111313-marostegui.json
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:10 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:10 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 14 hosts with reason: Maintenance
* 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 14 hosts with reason: Maintenance
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22287 and previous config saved to /var/cache/conftool/dbconfig/20220310-110253-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22286 and previous config saved to /var/cache/conftool/dbconfig/20220310-105807-marostegui.json
* 10:48 jbond: re-enable puppet fleet wide
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22285 and previous config saved to /var/cache/conftool/dbconfig/20220310-104748-marostegui.json
* 10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
* 10:44 akosiaris: reboot rdb2009 for upgrades
* 10:44 jbond: disable puppet fleet wide
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22284 and previous config saved to /var/cache/conftool/dbconfig/20220310-104302-marostegui.json
* 10:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2010.codfw.wmnet with OS bullseye
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22283 and previous config saved to /var/cache/conftool/dbconfig/20220310-103243-marostegui.json
* 10:30 moritzm: failover ganeti master for drmrs/B13 to ganeti6004
* 10:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
* 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22282 and previous config saved to /var/cache/conftool/dbconfig/20220310-102757-marostegui.json
* 10:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
* 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22281 and previous config saved to /var/cache/conftool/dbconfig/20220310-101738-marostegui.json
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22280 and previous config saved to /var/cache/conftool/dbconfig/20220310-101133-marostegui.json
* 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22279 and previous config saved to /var/cache/conftool/dbconfig/20220310-101125-marostegui.json
* 10:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2010.codfw.wmnet with OS bullseye
* 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22278 and previous config saved to /var/cache/conftool/dbconfig/20220310-095620-marostegui.json
* 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2009.codfw.wmnet with OS bullseye
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22277 and previous config saved to /var/cache/conftool/dbconfig/20220310-094115-marostegui.json
* 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
* 09:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22276 and previous config saved to /var/cache/conftool/dbconfig/20220310-092742-marostegui.json
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22275 and previous config saved to /var/cache/conftool/dbconfig/20220310-092735-marostegui.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22274 and previous config saved to /var/cache/conftool/dbconfig/20220310-092610-marostegui.json
* 09:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22273 and previous config saved to /var/cache/conftool/dbconfig/20220310-091807-marostegui.json
* 09:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22272 and previous config saved to /var/cache/conftool/dbconfig/20220310-091759-marostegui.json
* 09:16 moritzm: failover ganeti master for drmrs/B12 to ganeti6003
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22271 and previous config saved to /var/cache/conftool/dbconfig/20220310-091230-marostegui.json
* 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22270 and previous config saved to /var/cache/conftool/dbconfig/20220310-090254-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22269 and previous config saved to /var/cache/conftool/dbconfig/20220310-085724-marostegui.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22268 and previous config saved to /var/cache/conftool/dbconfig/20220310-084749-marostegui.json
* 08:43 apergos: UTC morning backport and config window completed
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22267 and previous config saved to /var/cache/conftool/dbconfig/20220310-084219-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22266 and previous config saved to /var/cache/conftool/dbconfig/20220310-084139-marostegui.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22265 and previous config saved to /var/cache/conftool/dbconfig/20220310-083732-root.json
* 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22264 and previous config saved to /var/cache/conftool/dbconfig/20220310-083244-marostegui.json
* 08:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22263 and previous config saved to /var/cache/conftool/dbconfig/20220310-082737-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22262 and previous config saved to /var/cache/conftool/dbconfig/20220310-082642-marostegui.json
* 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22261 and previous config saved to /var/cache/conftool/dbconfig/20220310-082634-marostegui.json
* 08:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:24 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 2: [[gerrit:769656|SectionTranslation: Also add languages to target (T298237)]] (duration: 00m 49s)
* 08:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22260 and previous config saved to /var/cache/conftool/dbconfig/20220310-082234-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22259 and previous config saved to /var/cache/conftool/dbconfig/20220310-082227-root.json
* 08:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22258 and previous config saved to /var/cache/conftool/dbconfig/20220310-082223-marostegui.json
* 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:19 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 1: [[gerrit:769386|Enable SectionTranslation on Javanese, Tagalog, Mongolian, Telugu WPs (T298237)]] (duration: 00m 50s)
* 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1, s8) for reboot', diff saved to https://phabricator.wikimedia.org/P22256 and previous config saved to /var/cache/conftool/dbconfig/20220310-081244-marostegui.json
* 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22255 and previous config saved to /var/cache/conftool/dbconfig/20220310-081129-marostegui.json
* 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22254 and previous config saved to /var/cache/conftool/dbconfig/20220310-080718-marostegui.json
* 08:03 marostegui: Reboot dbproxy1017, 1016 [[phab:T303174|T303174]]
* 08:00 marostegui: Reboot dbproxy1012, 1015, 1016 [[phab:T303174|T303174]]
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22253 and previous config saved to /var/cache/conftool/dbconfig/20220310-075623-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22252 and previous config saved to /var/cache/conftool/dbconfig/20220310-075213-marostegui.json
* 07:43 marostegui: Reboot dbproxy2001, 2002, 2003, 2004 [[phab:T303174|T303174]]
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22251 and previous config saved to /var/cache/conftool/dbconfig/20220310-074118-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22250 and previous config saved to /var/cache/conftool/dbconfig/20220310-073708-marostegui.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22249 and previous config saved to /var/cache/conftool/dbconfig/20220310-073523-marostegui.json
* 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 07:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22248 and previous config saved to /var/cache/conftool/dbconfig/20220310-073022-marostegui.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22247 and previous config saved to /var/cache/conftool/dbconfig/20220310-072124-marostegui.json
* 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
* 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22246 and previous config saved to /var/cache/conftool/dbconfig/20220310-072019-marostegui.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22245 and previous config saved to /var/cache/conftool/dbconfig/20220310-071516-marostegui.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22244 and previous config saved to /var/cache/conftool/dbconfig/20220310-070514-marostegui.json
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1132.eqiad.wmnet with OS bullseye
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22243 and previous config saved to /var/cache/conftool/dbconfig/20220310-070011-marostegui.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22242 and previous config saved to /var/cache/conftool/dbconfig/20220310-065009-marostegui.json
* 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22241 and previous config saved to /var/cache/conftool/dbconfig/20220310-064506-marostegui.json
* 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22240 and previous config saved to /var/cache/conftool/dbconfig/20220310-063858-marostegui.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22239 and previous config saved to /var/cache/conftool/dbconfig/20220310-063850-marostegui.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22238 and previous config saved to /var/cache/conftool/dbconfig/20220310-063503-marostegui.json
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1132.eqiad.wmnet with OS bullseye
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22237 and previous config saved to /var/cache/conftool/dbconfig/20220310-063017-marostegui.json
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22236 and previous config saved to /var/cache/conftool/dbconfig/20220310-062345-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22235 and previous config saved to /var/cache/conftool/dbconfig/20220310-060840-marostegui.json
* 06:07 marostegui: dbmaint on s3@eqiad [[phab:T272512|T272512]]
* 06:05 marostegui: dbmaint on s7@eqiad [[phab:T272512|T272512]]
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22234 and previous config saved to /var/cache/conftool/dbconfig/20220310-055335-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22233 and previous config saved to /var/cache/conftool/dbconfig/20220310-054701-marostegui.json
* 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 05:46 marostegui: dbmaint on s5@eqiad [[phab:T272512|T272512]]
* 05:46 marostegui: dbmaint on s4@eqiad [[phab:T272512|T272512]]
* 05:46 marostegui: dbmaint on pc3@eqiad [[phab:T272512|T272512]]
* 05:45 marostegui: dbmaint on pc2@eqiad [[phab:T272512|T272512]]
* 05:45 marostegui: dbmaint on pc1@eqiad [[phab:T272512|T272512]]
* 05:45 marostegui: dbmaint on s2@eqiad [[phab:T272512|T272512]]
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22232 and previous config saved to /var/cache/conftool/dbconfig/20220310-053950-marostegui.json
* 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 00:26 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@7975c27]: (no justification provided) (duration: 00m 08s)
* 00:26 ebysans@deploy1002: Started deploy [airflow-dags/analytics@7975c27]: (no justification provided)
* 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2022-03-09 ==
== 2023-01-13 ==
* 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:39 mutante: people2002 - systemctl reset-failed after removing auto_restart_rsync timers
* 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:26 mutante: mirror1001 - systemctl start update-ubuntu-mirror (sometimes sync fails)
* 23:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1011']
* 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:58 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
* 23:09 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25  refs [[phab:T300201|T300201]] (duration: 00m 49s)
* 20:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1011']
* 23:08 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 20:49 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
* 23:08 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 20:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['druid1011']
* 23:08 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.25  refs [[phab:T300201|T300201]]
* 20:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
* 23:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1047.eqiad.wmnet
* 20:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1010']
* 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1047.eqiad.wmnet
* 20:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
* 22:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1047.eqiad.wmnet
* 20:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1010']
* 22:54 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1047.eqiad.wmnet
* 20:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
* 22:35 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
* 22:35 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1009']
* 22:31 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22229 and previous config saved to /var/cache/conftool/dbconfig/20220309-223130-marostegui.json
* 20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
* 22:15 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22228 and previous config saved to /var/cache/conftool/dbconfig/20220309-221555-marostegui.json
* 20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
* 22:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22226 and previous config saved to /var/cache/conftool/dbconfig/20220309-220020-marostegui.json
* 20:04 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict2001.codfw.wmnet
* 21:57 reedy@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/Gadgets: [[phab:T303455|T303455]] (duration: 00m 50s)
* 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bullseye
* 21:54 volans: uploaded python3-wmflib_1.1.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 21:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict2001.codfw.wmnet on all recursors
* 21:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:54 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache aphlict2001.codfw.wmnet on all recursors
* 21:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22225 and previous config saved to /var/cache/conftool/dbconfig/20220309-214445-marostegui.json
* 19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 19:54 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
* 21:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 19:52 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 21:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:49 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:49 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host aphlict2001.codfw.wmnet
* 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1001.eqiad.wmnet with OS bullseye
* 21:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 19:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 20:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 19:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 20:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 19:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
* 20:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - [[phab:T301955|T301955]]
* 19:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
* 20:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
* 20:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bullseye
* 20:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:25 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Green Giant" "Cromium" # [[phab:T298707|T298707]]
* 19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 17:34 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879793{{!}}TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)]] (duration: 13m 25s)
* 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 17:22 thcipriani@deploy1002: thcipriani and abi: Backport for [[gerrit:879793{{!}}TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 19:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:20 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879793{{!}}TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)]]
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1004.eqiad.wmnet with OS bullseye
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:24 jynus: restarted again update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
* 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastion - jmm@cumin2002"
* 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1001.eqiad.wmnet with OS bullseye
* 19:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastion - jmm@cumin2002"
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1003.eqiad.wmnet with OS bullseye
* 19:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:21 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.24  refs [[phab:T300201|T300201]] (duration: 00m 50s)
* 15:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
* 19:20 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24  refs [[phab:T300201|T300201]]
* 15:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1004.eqiad.wmnet with OS bullseye
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:49 volans: uploaded cumin_4.2.0 to apt.wikimedia.org bullseye-wikimedia
* 19:07 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25  refs [[phab:T300201|T300201]] (duration: 00m 49s)
* 14:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1003.eqiad.wmnet with OS bullseye
* 19:06 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.25  refs [[phab:T300201|T300201]]
* 12:48 moritzm: installing bast6002 [[phab:T324974|T324974]]
* 18:23 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22222 and previous config saved to /var/cache/conftool/dbconfig/20220309-182355-marostegui.json
* 12:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
* 18:23 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
* 18:23 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastions - jmm@cumin2002"
* 18:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22221 and previous config saved to /var/cache/conftool/dbconfig/20220309-182316-marostegui.json
* 11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastions - jmm@cumin2002"
* 18:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22220 and previous config saved to /var/cache/conftool/dbconfig/20220309-180741-marostegui.json
* 10:53 moritzm: installing bast5003 [[phab:T324974|T324974]]
* 17:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22219 and previous config saved to /var/cache/conftool/dbconfig/20220309-175205-marostegui.json
* 10:49 jynus: restarting update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
* 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:41 moritzm: installing bast4004 [[phab:T324974|T324974]]
* 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:06 moritzm: installing bast3006 [[phab:T324974|T324974]]
* 17:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 02:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22217 and previous config saved to /var/cache/conftool/dbconfig/20220309-173630-marostegui.json
* 02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:29 reedy@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/WebAuthn/: [[phab:T303404|T303404]] (duration: 00m 53s)
* 01:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1002']
* 17:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 01:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
* 17:28 reedy@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/WebAuthn/: [[phab:T303404|T303404]] (duration: 00m 51s)
* 01:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1001']
* 17:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2008.codfw.wmnet with OS bullseye
* 01:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
* 17:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
* 01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1002']
* 17:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2: