You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Nova Resource:Admin/SAL: Difference between revisions
imported>Stashbot (andrewbogott: repool labdb1011 (T237509))
imported>Stashbot (andrewbogott: resetting password for the 'troveguest' rabbitmq user. I think I may have broken this during a recent rebuild of the rabbitmq cluster)
(361 intermediate revisions by the same user not shown)
=== 2022-05-19 ===
* 15:21 andrewbogott: resetting password for the 'troveguest' rabbitmq user. I think I may have broken this during a recent rebuild of the rabbitmq cluster
=== 2022-05-18 ===
* 15:42 andrewbogott: updated the 'debian-11.0-bullseye' glance image with a fresh build
=== 2022-05-14 ===
* 11:33 taavi: deleted projects 'ores' and 'ores-staging' [[phab:T308102|T308102]]
=== 2022-05-13 ===
* 06:20 wm-bot2: Safe reboot of 'cloudvirt1045.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 06:20 wm-bot2: Unset cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 06:16 wm-bot2: Drained 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
* 06:16 wm-bot2: Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 06:15 wm-bot2: Draining 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
* 06:15 wm-bot2: Safe rebooting 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
* 06:11 wm-bot2: Safe reboot of 'cloudvirt1044.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 06:11 wm-bot2: Unset cloudvirt 'cloudvirt1044.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 06:10 wm-bot2: Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 06:10 wm-bot2: Draining 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
* 06:09 wm-bot2: Safe rebooting 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
* 06:07 wm-bot2: Drained 'cloudvirt1044.eqiad.wmnet'. - cookbook ran by andrew@buster
* 06:06 wm-bot2: Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 06:06 wm-bot2: Draining 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
* 06:05 wm-bot2: Safe rebooting 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:51 wm-bot2: Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:50 wm-bot2: Draining 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:50 wm-bot2: Safe rebooting 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:49 wm-bot2: Set cloudvirt 'cloudvirt1044.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:49 wm-bot2: Safe reboot of 'cloudvirt1043.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 05:49 wm-bot2: Unset cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:49 wm-bot2: Draining 'cloudvirt1044.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:49 wm-bot2: Safe rebooting 'cloudvirt1044.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:47 wm-bot2: Safe reboot of 'cloudvirt1042.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 05:47 wm-bot2: Unset cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:45 wm-bot2: Drained 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:45 wm-bot2: Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:45 wm-bot2: Draining 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:44 wm-bot2: Safe rebooting 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:44 wm-bot2: Drained 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:42 wm-bot2: Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:42 wm-bot2: Draining 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:42 wm-bot2: Safe rebooting 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:41 wm-bot2: Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:40 wm-bot2: Draining 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:40 wm-bot2: Safe rebooting 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:38 wm-bot2: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:37 wm-bot2: Draining 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:37 wm-bot2: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:30 wm-bot2: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:29 wm-bot2: Draining 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:29 wm-bot2: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:19 wm-bot2: Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:18 wm-bot2: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:18 wm-bot2: Draining 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:18 wm-bot2: Safe rebooting 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:18 wm-bot2: Draining 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:18 wm-bot2: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:12 wm-bot2: Safe reboot of 'cloudvirt1040.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 05:12 wm-bot2: Unset cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:08 wm-bot2: Drained 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:02 wm-bot2: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:02 wm-bot2: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 05:02 wm-bot2: Draining 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:02 wm-bot2: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:02 wm-bot2: Draining 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
* 05:01 wm-bot2: Safe rebooting 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:52 wm-bot2: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:51 wm-bot2: Draining 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:51 wm-bot2: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:48 wm-bot2: Safe reboot of 'cloudvirt1041.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 04:48 wm-bot2: Unset cloudvirt 'cloudvirt1041.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:44 wm-bot2: Drained 'cloudvirt1041.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:31 wm-bot2: Set cloudvirt 'cloudvirt1041.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:30 wm-bot2: Draining 'cloudvirt1041.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:30 wm-bot2: Safe rebooting 'cloudvirt1041.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:30 wm-bot2: Safe reboot of 'cloudvirt1039.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 04:30 wm-bot2: Unset cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:27 wm-bot2: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:26 wm-bot2: Draining 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:26 wm-bot2: Safe rebooting 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:26 wm-bot2: Drained 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:26 wm-bot2: Set cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:25 wm-bot2: Draining 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:25 wm-bot2: Safe rebooting 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:24 wm-bot2: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:23 wm-bot2: Draining 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:23 wm-bot2: Safe rebooting 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:23 wm-bot2: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:22 wm-bot2: Draining 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:22 wm-bot2: Safe rebooting 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:21 wm-bot2: Safe reboot of 'cloudvirt1038.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 04:21 wm-bot2: Unset cloudvirt 'cloudvirt1038.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:18 wm-bot2: Drained 'cloudvirt1038.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:16 wm-bot2: Set cloudvirt 'cloudvirt1038.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:16 wm-bot2: Set cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:16 wm-bot2: Draining 'cloudvirt1038.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:16 wm-bot2: Safe rebooting 'cloudvirt1038.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:15 wm-bot2: Draining 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:15 wm-bot2: Safe rebooting 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:37 wm-bot2: Set cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 03:36 wm-bot2: Draining 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:36 wm-bot2: Safe rebooting 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:34 wm-bot2: Safe reboot of 'cloudvirt1037.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 03:34 wm-bot2: Unset cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 03:27 wm-bot2: Drained 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:27 wm-bot2: Set cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 03:26 wm-bot2: Draining 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:26 wm-bot2: Safe rebooting 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:26 wm-bot2: Set cloudvirt 'cloudvirt1038.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 03:25 wm-bot2: Draining 'cloudvirt1038.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:25 wm-bot2: Safe rebooting 'cloudvirt1038.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:22 wm-bot2: Unset cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 02:55 wm-bot2: Set cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 02:55 wm-bot2: Set cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 02:54 wm-bot2: Draining 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
* 02:54 wm-bot2: Safe rebooting 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
* 02:54 wm-bot2: Draining 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
* 02:54 wm-bot2: Safe rebooting 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
* 02:05 wm-bot2: Set cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 02:05 wm-bot2: Draining 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
* 02:05 wm-bot2: Safe rebooting 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
* 02:04 wm-bot2: Set cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 02:04 wm-bot2: Draining 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
* 02:03 wm-bot2: Safe rebooting 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
* 01:23 wm-bot2: Safe reboot of 'cloudvirt1035.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 01:23 wm-bot2: Unset cloudvirt 'cloudvirt1035.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 01:19 wm-bot2: Drained 'cloudvirt1035.eqiad.wmnet'. - cookbook ran by andrew@buster
* 01:01 wm-bot2: Set cloudvirt 'cloudvirt1035.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 01:01 wm-bot2: Set cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 01:00 wm-bot2: Draining 'cloudvirt1035.eqiad.wmnet'. - cookbook ran by andrew@buster
* 01:00 wm-bot2: Safe rebooting 'cloudvirt1035.eqiad.wmnet'. - cookbook ran by andrew@buster
* 01:00 wm-bot2: Draining 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
* 01:00 wm-bot2: Safe rebooting 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
* 00:25 wm-bot2: Safe reboot of 'cloudvirt1033.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 00:25 wm-bot2: Unset cloudvirt 'cloudvirt1033.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 00:21 wm-bot2: Drained 'cloudvirt1033.eqiad.wmnet'. - cookbook ran by andrew@buster
* 00:20 wm-bot2: Set cloudvirt 'cloudvirt1035.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 00:19 wm-bot2: Draining 'cloudvirt1035.eqiad.wmnet'. - cookbook ran by andrew@buster
* 00:19 wm-bot2: Safe rebooting 'cloudvirt1035.eqiad.wmnet'. - cookbook ran by andrew@buster
* 00:11 wm-bot2: Safe reboot of 'cloudvirt1034.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 00:11 wm-bot2: Unset cloudvirt 'cloudvirt1034.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 00:07 wm-bot2: Drained 'cloudvirt1034.eqiad.wmnet'. - cookbook ran by andrew@buster
=== 2022-05-12 ===
* 23:55 wm-bot2: Set cloudvirt 'cloudvirt1034.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 23:55 wm-bot2: Set cloudvirt 'cloudvirt1033.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 23:54 wm-bot2: Draining 'cloudvirt1034.eqiad.wmnet'. - cookbook ran by andrew@buster
* 23:54 wm-bot2: Safe rebooting 'cloudvirt1034.eqiad.wmnet'. - cookbook ran by andrew@buster
* 23:54 wm-bot2: Draining 'cloudvirt1033.eqiad.wmnet'. - cookbook ran by andrew@buster
* 23:54 wm-bot2: Safe rebooting 'cloudvirt1033.eqiad.wmnet'. - cookbook ran by andrew@buster
* 22:23 wm-bot2: Safe reboot of 'cloudvirt1031.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 22:23 wm-bot2: Unset cloudvirt 'cloudvirt1031.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 22:20 wm-bot2: Drained 'cloudvirt1031.eqiad.wmnet'. - cookbook ran by andrew@buster
* 22:17 wm-bot2: Safe reboot of 'cloudvirt1032.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 22:17 wm-bot2: Unset cloudvirt 'cloudvirt1032.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 22:13 wm-bot2: Drained 'cloudvirt1032.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:57 wm-bot2: Set cloudvirt 'cloudvirt1032.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:56 wm-bot2: Draining 'cloudvirt1032.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:56 wm-bot2: Safe rebooting 'cloudvirt1032.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:55 wm-bot2: Draining 'cloudvirt1031.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:55 wm-bot2: Safe rebooting 'cloudvirt1031.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:54 wm-bot2: Safe reboot of 'cloudvirt1030.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 21:54 wm-bot2: Unset cloudvirt 'cloudvirt1030.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:53 wm-bot2: Set cloudvirt 'cloudvirt1031.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:52 wm-bot2: Draining 'cloudvirt1031.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:52 wm-bot2: Safe rebooting 'cloudvirt1031.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:51 wm-bot2: Drained 'cloudvirt1030.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:44 wm-bot2: Safe reboot of 'cloudvirt1029.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 21:44 wm-bot2: Unset cloudvirt 'cloudvirt1029.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:42 wm-bot2: Drained 'cloudvirt1029.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:36 wm-bot2: Safe reboot of 'cloudvirt-wdqs1001.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 21:36 wm-bot2: Unset cloudvirt 'cloudvirt-wdqs1001.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:33 wm-bot2: Drained 'cloudvirt-wdqs1001.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:33 wm-bot2: Set cloudvirt 'cloudvirt-wdqs1001.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:32 wm-bot2: Draining 'cloudvirt-wdqs1001.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:32 wm-bot2: Safe rebooting 'cloudvirt-wdqs1001.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:32 wm-bot2: Set cloudvirt 'cloudvirt1029.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:31 wm-bot2: Safe reboot of 'cloudvirt-wdqs1002.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 21:31 wm-bot2: Unset cloudvirt 'cloudvirt-wdqs1002.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:31 wm-bot2: Draining 'cloudvirt1029.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:31 wm-bot2: Safe rebooting 'cloudvirt1029.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:30 wm-bot2: Set cloudvirt 'cloudvirt1030.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:29 wm-bot2: Draining 'cloudvirt1030.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:29 wm-bot2: Safe rebooting 'cloudvirt1030.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:29 wm-bot2: Drained 'cloudvirt-wdqs1002.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:28 wm-bot2: Set cloudvirt 'cloudvirt-wdqs1002.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:28 wm-bot2: Draining 'cloudvirt-wdqs1002.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:28 wm-bot2: Safe rebooting 'cloudvirt-wdqs1002.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:22 wm-bot2: Safe reboot of 'cloudvirt1026.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 21:22 wm-bot2: Unset cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:21 wm-bot2: Safe reboot of 'cloudvirt-wdqs1003.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 21:21 wm-bot2: Unset cloudvirt 'cloudvirt-wdqs1003.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:18 wm-bot2: Drained 'cloudvirt-wdqs1003.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:18 wm-bot2: Drained 'cloudvirt1026.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:18 wm-bot2: Set cloudvirt 'cloudvirt-wdqs1003.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:17 wm-bot2: Draining 'cloudvirt-wdqs1003.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:17 wm-bot2: Safe rebooting 'cloudvirt-wdqs1003.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:17 wm-bot2: Set cloudvirt 'cloudvirt1029.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:16 wm-bot2: Draining 'cloudvirt1029.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:16 wm-bot2: Safe rebooting 'cloudvirt1029.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:14 wm-bot2: Safe reboot of 'cloudvirt1025.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 21:14 wm-bot2: Unset cloudvirt 'cloudvirt1025.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:11 wm-bot2: Safe reboot of 'cloudvirt1046.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 21:11 wm-bot2: Unset cloudvirt 'cloudvirt1046.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:10 wm-bot2: Drained 'cloudvirt1025.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:08 wm-bot2: Drained 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:08 wm-bot2: Set cloudvirt 'cloudvirt1046.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:07 wm-bot2: Draining 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:07 wm-bot2: Safe rebooting 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:05 wm-bot2: Set cloudvirt 'cloudvirt1046.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:04 wm-bot2: Draining 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:04 wm-bot2: Safe rebooting 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:00 wm-bot2: Set cloudvirt 'cloudvirt1046.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:59 wm-bot2: Draining 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:59 wm-bot2: Safe rebooting 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:59 wm-bot2: Safe reboot of 'cloudvirt1047.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 20:59 wm-bot2: Unset cloudvirt 'cloudvirt1047.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:57 wm-bot2: Set cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:57 wm-bot2: Draining 'cloudvirt1026.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:57 wm-bot2: Safe rebooting 'cloudvirt1026.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:55 wm-bot2: Set cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:55 wm-bot2: Drained 'cloudvirt1047.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:54 wm-bot2: Draining 'cloudvirt1026.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:54 wm-bot2: Safe rebooting 'cloudvirt1026.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:54 wm-bot2: Safe reboot of 'cloudvirt1024.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 20:54 wm-bot2: Unset cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:53 wm-bot2: Set cloudvirt 'cloudvirt1047.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:52 wm-bot2: Draining 'cloudvirt1047.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:52 wm-bot2: Safe rebooting 'cloudvirt1047.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:50 wm-bot2: Drained 'cloudvirt1024.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:49 wm-bot2: Set cloudvirt 'cloudvirt1025.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:49 wm-bot2: Draining 'cloudvirt1025.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:49 wm-bot2: Safe rebooting 'cloudvirt1025.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:48 wm-bot2: Safe reboot of 'cloudvirt1023.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 20:48 wm-bot2: Unset cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:44 wm-bot2: Drained 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:44 wm-bot2: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:43 wm-bot2: Draining 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:43 wm-bot2: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:34 wm-bot2: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:34 wm-bot2: Set cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:34 wm-bot2: Draining 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:34 wm-bot2: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:34 wm-bot2: Draining 'cloudvirt1024.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:34 wm-bot2: Safe rebooting 'cloudvirt1024.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:31 wm-bot2: Safe reboot of 'cloudvirt1027.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 20:31 wm-bot2: Unset cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:28 wm-bot2: Drained 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:28 wm-bot2: Set cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:27 wm-bot2: Draining 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:27 wm-bot2: Safe rebooting 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:23 wm-bot2: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:22 wm-bot2: Draining 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:22 wm-bot2: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:11 wm-bot2: Set cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:10 wm-bot2: Draining 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:10 wm-bot2: Safe rebooting 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:07 wm-bot2: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:07 wm-bot2: Draining 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:06 wm-bot2: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:06 wm-bot2: Safe reboot of 'cloudvirt1022.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 20:05 wm-bot2: Unset cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:02 wm-bot2: Drained 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:01 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:00 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:00 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:58 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 19:57 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:57 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:36 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 19:35 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:35 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 15:06 andrewbogott: stopping nfs-server on labstore1004 in preparation for reboot
* 04:12 andrewbogott: rebooting primary bastion (bastion-eqiad1-03.bastion.eqiad1.wikimedia.cloud) in hopes of resolving a problem with ssh proxying
=== 2022-05-11 ===
* 18:48 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 18:48 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 18:48 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 18:39 wm-bot2: Set cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 18:38 wm-bot2: Draining 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
* 18:38 wm-bot2: Safe rebooting 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
* 18:04 wm-bot2: Set cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 18:03 wm-bot2: Draining 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
* 18:03 wm-bot2: Safe rebooting 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
* 08:56 wm-bot2: Finished rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by dcaro@vulcanus
* 08:52 wm-bot2: Rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by dcaro@vulcanus
* 07:53 dcaro: test
* 04:28 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 04:27 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 04:27 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:44 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 03:43 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:43 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:42 wm-bot2: Safe reboot of 'cloudvirt1021.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
* 03:42 wm-bot2: Unset cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 03:39 wm-bot2: Drained 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:23 wm-bot2: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 03:22 wm-bot2: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:22 wm-bot2: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:09 wm-bot2: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 03:08 wm-bot2: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:08 wm-bot2: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:04 andrewbogott: reset and recreated the rabbitmq cluster in eqiad1 to get around some broken queues.
* 03:02 wm-bot2: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 03:01 wm-bot2: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 03:01 wm-bot2: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 02:28 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 02:25 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 02:25 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
=== 2022-05-10 ===
* 21:43 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:40 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:40 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:35 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 21:32 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 21:32 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:05 wm-bot: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 20:02 wm-bot: Draining 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:01 wm-bot: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:00 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 20:00 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:57 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:57 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:55 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:55 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:47 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 19:46 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:46 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:45 wm-bot: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:45 wm-bot: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:44 wm-bot: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:44 wm-bot: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:40 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
* 19:39 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:39 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
* 19:37 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 19:36 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster | |||
* 19:36 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster | |||
* 19:33 wm-bot: Safe reboot of 'cloudvirt1017.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster | |||
* 19:33 wm-bot: Unset cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 19:29 wm-bot: Drained 'cloudvirt1017.eqiad.wmnet'. - cookbook ran by andrew@buster | |||
* 19:06 wm-bot: Set cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 19:05 wm-bot: Draining 'cloudvirt1017.eqiad.wmnet'. - cookbook ran by andrew@buster | |||
* 19:05 wm-bot: Safe rebooting 'cloudvirt1017.eqiad.wmnet'. - cookbook ran by andrew@buster | |||
* 15:41 andrewbogott: rebooting cloud*-dev for [[phab:T307668|T307668]] | |||
* 13:59 taavi: manually attached [[User:Dreamy Jazz]] to wikitech for a password reset (https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Manually_associate_an_LDAP_account_with_wikitech) | |||
=== 2022-05-07 === | |||
* 01:33 wm-bot: Drained 'cloudvirt1016.eqiad.wmnet'. - cookbook ran by andrew@buster | |||
* 01:32 wm-bot: Set cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 01:30 wm-bot: Draining 'cloudvirt1016.eqiad.wmnet'. - cookbook ran by andrew@buster | |||
* 01:21 wm-bot: Set cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 01:18 wm-bot: Draining 'cloudvirt1016.eqiad.wmnet'. - cookbook ran by andrew@buster | |||
=== 2022-05-03 === | |||
* 20:38 andrewbogott: upgrading clouddb2001-dev in place | |||
* 18:18 taavi: updated 'puppet-enc' endpoints on the keystone catalog to use https and port 443 | |||
=== 2022-05-02 === | |||
* 16:56 dcaro: rebooting cloudmetrics1001 | |||
=== 2022-04-29 === | |||
* 14:22 andrewbogott: changing login.toolforge.org, bastion.toolforge.org, and dev.toolforge.org dns entries to refer to the new Buster bastions [[phab:T277653|T277653]] https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Timeline | |||
=== 2022-04-27 === | |||
* 14:51 wm-bot: Finished rebooting the nodes ['cloudcephosd1001', 'cloudcephosd1002', 'cloudcephosd1003', 'cloudcephosd1004', 'cloudcephosd1005', 'cloudcephosd1006', 'cloudcephosd1007', 'cloudcephosd1008', 'cloudcephosd1009', 'cloudcephosd1010', 'cloudcephosd1011', 'cloudcephosd1012', 'cloudcephosd1013', 'cloudcephosd1014', 'cloudcephosd1015', 'cloudcephosd1016', 'cloudcephosd1017', 'cloudcephosd1018', 'cloudcephosd1019', 'cloudcephosd1020', 'cloud | |||
* 14:50 wm-bot: Finished rebooting node cloudcephosd1024.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:46 wm-bot: Rebooting node cloudcephosd1024.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:46 wm-bot: Finished rebooting node cloudcephosd1023.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:41 wm-bot: Rebooting node cloudcephosd1023.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:41 wm-bot: Finished rebooting node cloudcephosd1022.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:35 wm-bot: Rebooting node cloudcephosd1022.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:35 wm-bot: Finished rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:31 wm-bot: Rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:31 wm-bot: Finished rebooting node cloudcephosd1020.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:27 wm-bot: Rebooting node cloudcephosd1020.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:27 wm-bot: Finished rebooting node cloudcephosd1019.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:23 wm-bot: Rebooting node cloudcephosd1019.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:23 wm-bot: Finished rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:13 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:13 wm-bot: Finished rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:09 wm-bot: Rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:09 wm-bot: Finished rebooting node cloudcephosd1016.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:05 wm-bot: Rebooting node cloudcephosd1016.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:05 wm-bot: Finished rebooting node cloudcephosd1015.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:01 wm-bot: Rebooting node cloudcephosd1015.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:01 wm-bot: Finished rebooting node cloudcephosd1014.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:57 wm-bot: Rebooting node cloudcephosd1014.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:57 wm-bot: Finished rebooting node cloudcephosd1013.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:44 wm-bot: Rebooting node cloudcephosd1013.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:43 wm-bot: Finished rebooting node cloudcephosd1012.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:39 wm-bot: Rebooting node cloudcephosd1012.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:39 wm-bot: Finished rebooting node cloudcephosd1011.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:35 wm-bot: Rebooting node cloudcephosd1011.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:35 wm-bot: Finished rebooting node cloudcephosd1010.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:31 wm-bot: Rebooting node cloudcephosd1010.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:31 wm-bot: Finished rebooting node cloudcephosd1009.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:26 wm-bot: Rebooting node cloudcephosd1009.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:26 wm-bot: Finished rebooting node cloudcephosd1008.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:14 wm-bot: Rebooting node cloudcephosd1008.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:14 wm-bot: Finished rebooting node cloudcephosd1007.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:10 wm-bot: Rebooting node cloudcephosd1007.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:10 wm-bot: Finished rebooting node cloudcephosd1006.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:05 wm-bot: Rebooting node cloudcephosd1006.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:05 wm-bot: Finished rebooting node cloudcephosd1005.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:01 wm-bot: Rebooting node cloudcephosd1005.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:01 wm-bot: Finished rebooting node cloudcephosd1004.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:57 wm-bot: Rebooting node cloudcephosd1004.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:57 wm-bot: Finished rebooting node cloudcephosd1003.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:53 wm-bot: Rebooting node cloudcephosd1003.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:53 wm-bot: Finished rebooting node cloudcephosd1002.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:50 wm-bot: Rebooting node cloudcephosd1002.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:50 wm-bot: Finished rebooting node cloudcephosd1001.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:46 wm-bot: Rebooting node cloudcephosd1001.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:46 wm-bot: Rebooting the nodes cloudcephosd1001,cloudcephosd1002,cloudcephosd1003,cloudcephosd1004,cloudcephosd1005,cloudcephosd1006,cloudcephosd1007,cloudcephosd1008,cloudcephosd1009,cloudcephosd1010,cloudcephosd1011,cloudcephosd1012,cloudcephosd1013,cloudcephosd1014,cloudcephosd1015,cloudcephosd1016,cloudcephosd1017,cloudcephosd1018,cloudcephosd1019,cloudcephosd1020,cloudcephosd1021,cloudcephosd1022,cloudcephosd1023,cloudcephosd1024 - cookbo | |||
* 12:15 wm-bot: Finished rebooting the nodes ['cloudcephmon1001', 'cloudcephmon1002', 'cloudcephmon1003'] - cookbook ran by dcaro@vulcanus | |||
* 12:15 wm-bot: Finished rebooting node cloudcephmon1003.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:12 wm-bot: Rebooting node cloudcephmon1003.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:12 wm-bot: Finished rebooting node cloudcephmon1002.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:09 wm-bot: Rebooting node cloudcephmon1002.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:09 wm-bot: Finished rebooting node cloudcephmon1001.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:07 wm-bot: Rebooting node cloudcephmon1001.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:07 wm-bot: Rebooting the nodes cloudcephmon1001,cloudcephmon1002,cloudcephmon1003 - cookbook ran by dcaro@vulcanus | |||
* 12:05 wm-bot: Finished rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] - cookbook ran by dcaro@vulcanus | |||
* 12:05 wm-bot: Finished rebooting node cloudcephosd2003-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:02 wm-bot: Rebooting node cloudcephosd2003-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 12:02 wm-bot: Finished rebooting node cloudcephosd2002-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 11:59 wm-bot: Rebooting node cloudcephosd2002-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 11:59 wm-bot: Finished rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 11:56 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 11:56 wm-bot: Rebooting the nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev - cookbook ran by dcaro@vulcanus | |||
* 11:55 wm-bot: Finished rebooting the nodes ['cloudcephmon2004-dev', 'cloudcephmon2005-dev', 'cloudcephmon2006-dev'] - cookbook ran by dcaro@vulcanus | |||
* 11:55 wm-bot: Finished rebooting node cloudcephmon2006-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 11:52 wm-bot: Rebooting node cloudcephmon2006-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 11:52 wm-bot: Finished rebooting node cloudcephmon2005-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 11:47 wm-bot: Rebooting node cloudcephmon2005-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 11:47 wm-bot: Finished rebooting node cloudcephmon2004-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 11:43 wm-bot: Rebooting node cloudcephmon2004-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 11:43 wm-bot: Rebooting the nodes cloudcephmon2004-dev,cloudcephmon2005-dev,cloudcephmon2006-dev - cookbook ran by dcaro@vulcanus | |||
=== 2022-04-26 === | |||
* 10:36 taavi: [codfw1dev] updated designate pool to 2004/2005-dev according to the instructions on https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/DNS/Designate#Initial_designate/pdns_node_setup | |||
=== 2022-04-22 === | |||
* 10:33 taavi: [codfw1dev] restart designate-sink on both new cloudservices hosts to fix rabbitmq connectivity | |||
=== 2022-04-21 === | |||
* 05:38 andrewbogott: replaced cloudservices200[2,3] with cloudservices200[4,5] | |||
=== 2022-04-19 === | |||
* 15:29 andrewbogott: stopping all VMs on cloudvirt1019, reimaging host | |||
=== 2022-04-18 === | |||
* 15:23 andrewbogott: reimaging cloudvirt1020, leaving VMs in place | |||
* 13:40 andrewbogott: shutting down many codfw1dev servers (including network infra!) for [[phab:T305469|T305469]] | |||
=== 2022-04-14 === | |||
* 20:14 andrewbogott: restarting nova-api and nova-conductor services in a superstitious attempt to reduce open DB connections | |||
=== 2022-04-13 === | |||
* 22:01 andrewbogott: restarting galera on cloudcontrols (one by one) to clear open connections | |||
=== 2022-04-11 === | |||
* 15:59 taavi: created cloudinfra.wmcloud.org zone | |||
=== 2022-04-09 === | |||
* 19:55 andrewbogott: reimaging cloudbackup1001-dev to bullseye | |||
* 19:37 taavi: add 'puppet-enc' service & endpoint to keystone [[phab:T274666|T274666]] | |||
* 19:25 andrewbogott: reimaging cloudbackup1002-dev to bullseye | |||
=== 2022-04-07 === | |||
* 12:51 wm-bot: Set cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. ([[phab:T305631|T305631]]) - cookbook ran by arturo@nostromo | |||
=== 2022-04-06 === | |||
* 09:12 arturo: [codfw1dev] installing python3-eventlet 0.30.2-5~bpo11+1 on all required servers (cloudvirt, cloudnet, cloudcontrol) ([[phab:T305157|T305157]]) | |||
* 08:45 arturo: [codfw1dev] trying with python3-eventlet 0.30.2-5 installed by hand on cloudvirt2003-dev ([[phab:T305157|T305157]]) | |||
* 08:42 arturo: [codfw1dev] trying with python3-eventlet 0.30.2-5 installed by hand on cloudcontrol servers ([[phab:T305157|T305157]]) | |||
* 08:24 arturo: [codfw1dev] trying with python3-dnspython 2.2.0-2 installed by hand on cloudvirt2003-dev ([[phab:T305157|T305157]]) | |||
* 08:20 arturo: [codfw1dev] trying with python3-dnspython 2.2.0-2 installed by hand on cloudcontrol servers ([[phab:T305157|T305157]]) | |||
=== 2022-03-30 === | |||
* 11:20 arturo: apply urpf strict filter to eqiad cloud-hosts vlan - [[phab:T285461|T285461]] | |||
=== 2022-03-29 === | |||
* 10:02 dcaro: restarting keystone ([[phab:T304918|T304918]]) | |||
=== 2022-03-23 === | |||
* 22:53 wm-bot: Drained 'cloudvirt1045.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 22:38 wm-bot: Drained 'cloudvirt1044.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 22:12 wm-bot: Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 22:12 wm-bot: Draining 'cloudvirt1045.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 22:08 wm-bot: Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 22:07 wm-bot: Set cloudvirt 'cloudvirt1044.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 22:06 wm-bot: Draining 'cloudvirt1044.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 22:06 wm-bot: Draining 'cloudvirt1043.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 21:54 wm-bot: Drained 'cloudvirt1042.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 21:19 wm-bot: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 21:19 wm-bot: Draining 'cloudvirt1042.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 21:12 wm-bot: Drained 'cloudvirt1040.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 21:12 wm-bot: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 21:09 wm-bot: Draining 'cloudvirt1040.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 21:07 wm-bot: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 21:04 wm-bot: Draining 'cloudvirt1040.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:55 wm-bot: Set cloudvirt 'cloudvirt1041.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:54 wm-bot: Draining 'cloudvirt1041.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:30 wm-bot: Drained 'cloudvirt1039.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:15 wm-bot: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:15 wm-bot: Set cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:14 wm-bot: Draining 'cloudvirt1040.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:14 wm-bot: Draining 'cloudvirt1039.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:44 wm-bot: Set cloudvirt 'cloudvirt1038.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:43 wm-bot: Draining 'cloudvirt1038.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:19 wm-bot: Set cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:18 wm-bot: Draining 'cloudvirt1037.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:13 wm-bot: Drained 'cloudvirt1036.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:02 wm-bot2: Testing wm-bot relay to #wikimedia-cloud-feed | |||
* 17:55 wm-bot: Set cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 17:54 wm-bot: Draining 'cloudvirt1036.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 17:04 wm-bot: Set cloudvirt 'cloudvirt1035.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 17:03 wm-bot: Draining 'cloudvirt1035.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 17:03 wm-bot: Drained 'cloudvirt1034.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:51 wm-bot: Drained 'cloudvirt1033.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:37 wm-bot: Set cloudvirt 'cloudvirt1034.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:37 wm-bot: Set cloudvirt 'cloudvirt1033.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:36 wm-bot: Draining 'cloudvirt1034.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:36 wm-bot: Draining 'cloudvirt1033.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 15:01 wm-bot: Drained 'cloudvirt1032.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 15:00 wm-bot: Set cloudvirt 'cloudvirt1032.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 14:57 wm-bot: Draining 'cloudvirt1032.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 14:44 wm-bot: Drained 'cloudvirt1031.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 14:35 wm-bot: Set cloudvirt 'cloudvirt1032.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 14:34 wm-bot: Draining 'cloudvirt1032.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 14:32 wm-bot: Drained 'cloudvirt1030.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 14:20 wm-bot: Set cloudvirt 'cloudvirt1031.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 14:19 wm-bot: Draining 'cloudvirt1031.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 14:18 wm-bot: Set cloudvirt 'cloudvirt1030.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 14:17 wm-bot: Draining 'cloudvirt1030.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 13:54 taavi: restart nova-fullstack on cloudcontrol1003 to pick up bastion ip change | |||
* 13:43 wm-bot: Drained 'cloudvirt1029.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 13:23 wm-bot: Set cloudvirt 'cloudvirt1029.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 13:22 wm-bot: Draining 'cloudvirt1029.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
=== 2022-03-22 === | |||
* 22:59 wm-bot: Set cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 22:58 wm-bot: Draining 'cloudvirt1027.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
=== 2022-03-17 === | |||
* 01:09 wm-bot: Drained 'cloudvirt1016.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 00:53 wm-bot: Set cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 00:52 wm-bot: Setting cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 00:52 wm-bot: Draining 'cloudvirt1016.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
=== 2022-03-15 === | |||
* 20:58 wm-bot: Drained 'cloudvirt1026.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:36 wm-bot: Set cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:36 wm-bot: Setting cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:36 wm-bot: Draining 'cloudvirt1026.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 13:14 wm-bot: Unset cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by arturo@nostromo | |||
* 13:14 wm-bot: Unsetting cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by arturo@nostromo | |||
* 10:32 wm-bot: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by arturo@nostromo | |||
* 10:30 wm-bot: Setting cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by arturo@nostromo | |||
=== 2022-03-14 === | |||
* 21:24 wm-bot: Drained 'cloudvirt1025.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:59 wm-bot: Set cloudvirt 'cloudvirt1025.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:58 wm-bot: Setting cloudvirt 'cloudvirt1025.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:58 wm-bot: Draining 'cloudvirt1025.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:15 wm-bot: Setting cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:15 wm-bot: Draining 'cloudvirt1024.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 20:02 wm-bot: Set cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 19:59 wm-bot: Setting cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 19:59 wm-bot: Draining 'cloudvirt1024.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 19:16 wm-bot: Set cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 19:15 wm-bot: Setting cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 19:15 wm-bot: Draining 'cloudvirt1024.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 19:13 wm-bot: Drained 'cloudvirt1023.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:56 wm-bot: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:55 wm-bot: Setting cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:55 wm-bot: Draining 'cloudvirt1023.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:53 wm-bot: Drained 'cloudvirt1022.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:52 wm-bot: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:51 wm-bot: Setting cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:51 wm-bot: Draining 'cloudvirt1022.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:50 wm-bot: Drained 'cloudvirt1021.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:48 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:48 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:48 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 11:48 dcaro: rebased cookbooks on latest master, make sure you pull before sending new patches | |||
=== 2022-03-08 === | |||
* 18:29 wm-bot: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:29 wm-bot: Setting cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:29 wm-bot: Draining 'cloudvirt1022.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:23 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:21 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:21 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:18 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:17 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 18:17 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 17:28 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 17:27 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 17:27 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 17:18 wm-bot: Set cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 17:15 wm-bot: Setting cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 17:15 wm-bot: Draining 'cloudvirt1017.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:48 wm-bot: Set cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:47 wm-bot: Setting cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:47 wm-bot: Draining 'cloudvirt1017.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:36 wm-bot: Drained 'cloudvirt1016.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:08 wm-bot: Set cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:07 wm-bot: Setting cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 16:07 wm-bot: Draining 'cloudvirt1016.eqiad.wmnet'. ([[phab:T281276|T281276]]) - cookbook ran by andrew@buster | |||
* 13:11 arturo: [codfw1dev] rebooting cloudservices servers for [[phab:T303179|T303179]] | |||
* 13:07 arturo: [codfw1dev] rebooting cloudvirt servers for [[phab:T303179|T303179]] | |||
* 13:06 arturo: [codfw1dev] rebooting cloudnet servers for [[phab:T303179|T303179]] | |||
* 12:55 arturo: [codfw1dev] rebooting cloudcontrol servers for [[phab:T303179|T303179]] | |||
=== 2022-03-03 === | |||
* 08:49 taavi: deploying cloudmetrics grafana to grafana 8, [[phab:T282863|T282863]] | |||
=== 2022-03-02 === | |||
* 09:06 arturo: merging core router firewall change https://gerrit.wikimedia.org/r/c/operations/homer/public/+/701347 | |||
=== 2022-02-28 === | |||
* 15:30 dcaro: cleaning up leftover snapshots from failed backups of the maps volume ([[phab:T302720|T302720]]) | |||
=== 2022-02-24 === | |||
* 17:04 andrewbogott: upgrading eqiad1 and codfw1dev to mariadb 10.5.15+maria~bullseye via 'apt-get install libmariadb3:amd64 galera-4 mariadb-server' | |||
* 15:42 dcaro: stopping and starting mariadb on cloudcontrol1003 ([[phab:T302146|T302146]]) | |||
* 10:37 arturo: [codfw1dev] briefly installed galera-4 (26.4.11+1bullseye) over (26.4.9-0+deb11u1) on cloudcontrol2001-dev and then downgraded again to verify package install ([[phab:T302482|T302482]]) | |||
=== 2022-02-23 === | |||
* 20:39 taavi: added domain-wide 'designateadmin' and 'observer' roles to project-proxy-dns-manager service account [[phab:T295246|T295246]] | |||
* 17:40 andrewbogott: restarting lots of openstack services to try to clear up the mess that is [[phab:T236101|T236101]] | |||
* 12:13 arturo: cleaning up cinder volume snapshots, aborrero@cloudcontrol1005:~$ for i in $(sudo wmcs-openstack volume snapshot list -f value -c ID) ; do sudo wmcs-openstack volume snapshot delete $i ; done ([[phab:T302382|T302382]]) | |||
* 10:14 arturo: cleaning up neutron agents for non-existent servers cloudvirt100[1-9].eqiad.wmnet,cloudvirt10[12-15].eqiad.wmnet | |||
* 10:05 dcaro: Deleting stuck novafullstack servers, to let the service create new ones ([[phab:T302369|T302369]]) | |||
* 09:56 arturo: neutron agent-delete bad663b3-fd25-4393-a546-{{Gerrit|4b1b4bdec4db}} (Linux bridge agent {{!}} cloudvirtan1001) | |||
* 09:56 arturo: neutron agent-delete 1071c198-ed57-4b5a-9439-{{Gerrit|30e66a31aa69}} (Linux bridge agent {{!}} cloudvirtan1005) | |||
* 09:55 arturo: neutron agent-delete 2eeef198-8af7-4e5d-bd73-{{Gerrit|e14a2a8d2404}} (Linux bridge agent {{!}} cloudvirtan1004) | |||
* 09:55 arturo: neutron agent-delete afe173eb-35ba-444a-9960-{{Gerrit|899629786d2f}} (Linux bridge agent {{!}} cloudvirtan1003) | |||
* 09:54 arturo: neutron agent-delete afcb9b7f-c1a6-4ff4-9b10-{{Gerrit|92bfbe8d1a56}} (Linux bridge agent {{!}} cloudvirtan1002) | |||
* 09:39 dcaro: restarting neutron-api cloudcontrol1003 to see if the agent status update starts working ([[phab:T302369|T302369]]) | |||
* 09:38 dcaro: restarting neutron-dhcp-agent on cloudnet1003 ([[phab:T302369|T302369]]) | |||
=== 2022-02-22 === | |||
* 22:10 andrewbogott: raising project 'maps' quota by two tb -- [[phab:T300160|T300160]] | |||
* 09:24 arturo: restarting mariadb @ cloudcontrol1003 ([[phab:T302146|T302146]]) | |||
* 09:13 arturo: restarting mariadb @ cloudcontrol1004 ([[phab:T302146|T302146]]) | |||
=== 2022-02-18 === | |||
* 21:57 andrewbogott: leaving cloudcontrol1003 downtimed with disabled puppet for the weekend. Everything there should be stable and fine save rabbit which needs an upgrade. | |||
* 21:30 andrewbogott: rebooting cloudcontrol1003 because rabbit is freaking out | |||
* 17:25 andrewbogott: in-place upgrade of cloudcontrol1004 to bullseye -- [[phab:T281276|T281276]] | |||
* 12:34 arturo: manually install prometheus-openstack-exporter on cloudcontrol1005 ([[phab:T302050|T302050]]) | |||
=== 2022-02-17 === | |||
* 23:02 andrewbogott: in-place upgrade to Bullseye on cloudcontrol1005 [[phab:T281276|T281276]] | |||
=== 2022-02-15 === | |||
* 14:15 taavi: [codfw1dev] added domain-wide 'designateadmin' and 'observer' roles to codfw1dev-proxy-dns-manager service account [[phab:T295246|T295246]] | |||
=== 2022-02-04 === | |||
* 10:12 arturo: restart backup_vms service in cloudvirt1024 ([[phab:T300956|T300956]]) | |||
=== 2022-02-03 === | |||
* 08:21 taavi: cloudmetrics1004: manually added an empty line to /etc/prometheus/blackbox.yml to make /usr/local/bin/blackbox-exporter-assemble happy (clearing "performing a change every puppet run" alert) | |||
=== 2022-02-02 === | |||
* 02:36 andrewbogott: restarting mariadb on cloudcontrol1004 | |||
=== 2022-01-31 === | |||
* 10:15 arturo: cloudcontrol1005:~$ sudo systemctl restart backup_glance_images.service (failed state, no logs, icinga alert) | |||
=== 2022-01-29 === | |||
* 18:24 taavi: delete 2 puppet prefixes in a weird state [[phab:T299750|T299750]] | |||
=== 2022-01-27 === | |||
* 13:24 arturo: cloudmetrics1004:~ $ sudo systemctl restart wmcs_monitoring_graphite_rsync.service ([[phab:T300138|T300138]]) | |||
=== 2022-01-26 === | |||
* 19:09 andrewbogott: bootstrapping a fresh galera node on cloudcontrol1004 | |||
* 18:57 andrewbogott: restarting mariadb on cloudcontrol1004 | |||
=== 2022-01-25 === | |||
* 10:49 arturo: made cloudmetrics1001/1002 primary/backup respectively ([[phab:T299744|T299744]], [[phab:T297814|T297814]], [[phab:T300011|T300011]]) | |||
=== 2022-01-19 === | |||
* 16:38 andrewbogott: moving all scratch mounts to scratch.svc.cloudinfra-nfs.eqiad1.wikimedia.cloud | |||
=== 2022-01-05 === | |||
* 03:11 andrewbogott: 'cp /etc/apt/sources.list /etc/apt/sources.list.prepuppet' on all VMs. Backing up state before puppetizing sources.list with https://gerrit.wikimedia.org/r/c/operations/puppet/+/751498 | |||
=== 2022-01-04 === | |||
* 12:44 dcaro: increasing the size_limit for labs ldap servers | |||
=== 2021-12-26 === | |||
* 16:55 majavah: run attachLdapUser.php on wikitech for developer account "Karthiksripal" | |||
=== 2021-12-24 === | |||
* 22:51 majavah: ran the wikireplica dns script on s5 [[phab:T298303|T298303]] | |||
=== 2021-12-23 === | |||
* 21:42 majavah: deployed horizon wmf-proxy-dashboard update to fix editing of existing proxies | |||
=== 2021-12-21 === | |||
* 10:39 arturo: dropped egress NAT exceptions for WMF apt repos, [[phab:T298042|T298042]] | |||
=== 2021-12-15 === | |||
* 12:44 dcaro: Downtiming cloudvirt-wdqs1001 as it has no VMs running until disk space is fixed ([[phab:T297454|T297454]]) | |||
=== 2021-12-14 === | |||
* 10:26 dcaro: Moved the nova cache (/var/lib/nova/instances/_base) and the canary image local data (/var/lib/nova/instance/<canary_image_id>) to the root disk on cloudvirt-wdqs1001 to temporarily free some space ([[phab:T297454|T297454]]) | |||
=== 2021-12-13 === | |||
* 18:08 wm-bot: Drained 'cloudvirt1014.eqiad.wmnet'. - cookbook ran by michael@mouse | |||
* 17:50 wm-bot: Set cloudvirt 'cloudvirt1014.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse | |||
* 17:49 wm-bot: Setting cloudvirt 'cloudvirt1014.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse | |||
* 17:49 wm-bot: Draining 'cloudvirt1014.eqiad.wmnet'. - cookbook ran by michael@mouse | |||
* 17:44 wm-bot: Drained 'cloudvirt1013.eqiad.wmnet'. - cookbook ran by michael@mouse | |||
* 17:30 wm-bot: Set cloudvirt 'cloudvirt1013.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse | |||
* 17:30 wm-bot: Setting cloudvirt 'cloudvirt1013.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse | |||
* 17:30 wm-bot: Draining 'cloudvirt1013.eqiad.wmnet'. - cookbook ran by michael@mouse | |||
* 17:13 wm-bot: Drained 'cloudvirt1012.eqiad.wmnet'. - cookbook ran by michael@mouse | |||
* 16:50 wm-bot: Set cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse | |||
* 16:47 wm-bot: Setting cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse | |||
* 16:47 wm-bot: Draining 'cloudvirt1012.eqiad.wmnet'. - cookbook ran by michael@mouse | |||
* 16:44 wm-bot: Set cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse | |||
* 16:43 wm-bot: Setting cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse | |||
=== 2021-12-03 === | |||
* 18:56 andrewbogott: maintain-views and maintain-meta-p on clouddb1013-1020 | |||
* 10:49 majavah: deleting dbbackups-dashboard project [[phab:T296992|T296992]] | |||
=== 2021-12-02 === | |||
* 01:17 wm-bot: Drained 'cloudvirt1028.eqiad.wmnet'. ([[phab:T296790|T296790]]) - cookbook ran by andrew@buster | |||
* 00:56 wm-bot: Set cloudvirt 'cloudvirt1028.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 00:56 wm-bot: Setting cloudvirt 'cloudvirt1028.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 00:56 wm-bot: Draining 'cloudvirt1028.eqiad.wmnet'. ([[phab:T296790|T296790]]) - cookbook ran by andrew@buster | |||
* 00:50 wm-bot: Set cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 00:50 wm-bot: Setting cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 00:50 wm-bot: Draining 'cloudvirt1026.eqiad.wmnet'. ([[phab:T296790|T296790]]) - cookbook ran by andrew@buster | |||
* 00:28 wm-bot: Drained 'cloudvirt1021.eqiad.wmnet'. ([[phab:T296790|T296790]]) - cookbook ran by andrew@buster | |||
* 00:03 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 00:02 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 00:02 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. ([[phab:T296790|T296790]]) - cookbook ran by andrew@buster | |||
=== 2021-12-01 === | |||
* 23:59 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster | |||
* 23:59 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. ([[phab:T296790|T296790]]) - cookbook ran by andrew@buster | |||
* 23:54 andrewbogott: *correction* adding spare cloudvirts 1044 and 1045 to the 'ceph' pool in order to make space for future juggling around [[phab:T296790|T296790]] and [[phab:T296792|T296792]] | |||
* 23:53 andrewbogott: adding spare cloudvirts 1044 and 1055 to the 'ceph' pool in order to make space for future juggling around [[phab:T296790|T296790]] and [[phab:T296792|T296792]] | |||
=== 2021-11-28 === | |||
* 17:48 andrewbogott: moved cloudvirt1018 out of the 'localstorage' aggregate and into 'maintenance' for [[phab:T296592|T296592]]. It will need to be moved back after the raid is rebuilt. | |||
=== 2021-11-21 === | |||
* 07:19 dcaro_away: restarting designate-sink with some extra logs in it ([[phab:T296144|T296144]]) | |||
=== 2021-11-17 === | |||
* 15:48 andrewbogott: upgrading mariadb packages on eqiad1 cloudcontrols | |||
* 15:39 andrewbogott: sudo cumin "cloud*" 'apt-get update -y --allow-releaseinfo-change' | |||
* 15:26 andrewbogott: updated mariadb packages on codfw1dev cloudcontrols to 1:10.3.31-0+deb10u1 | |||
=== 2021-11-12 === | |||
* 13:31 arturo: restarting glance-api services to make sure they work with new ceph auth creds ([[phab:T293752|T293752]]) | |||
=== 2021-11-08 === | |||
* 21:50 andrewbogott: returned clouddb pools back to normal after maintain_views run: https://gerrit.wikimedia.org/r/c/operations/puppet/+/737505 [[phab:T216481|T216481]] | |||
* 20:07 andrewbogott: depooling clouddb1013 for maintain_views attempt | |||
* 10:54 arturo: [codfw1dev] create service account `srv-networktests` following https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Service_accounts for [[phab:T294955|T294955]] | |||
* 10:34 arturo: create service account `srv-networktests` following https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Service_accounts for [[phab:T294955|T294955]] | |||
=== 2021-11-05 === | |||
* 11:18 wm-bot: Added 1 new OSDs ['cloudcephosd1024.eqiad.wmnet'] ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 11:17 wm-bot: Added OSD cloudcephosd1024.eqiad.wmnet... (1/1) ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 11:15 wm-bot: Finished rebooting node cloudcephosd1024.eqiad.wmnet - cookbook ran by arturo@endurance | |||
* 11:12 wm-bot: Rebooting node cloudcephosd1024.eqiad.wmnet - cookbook ran by arturo@endurance | |||
* 11:12 wm-bot: Adding OSD cloudcephosd1024.eqiad.wmnet... (1/1) ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 11:12 wm-bot: Adding new OSDs ['cloudcephosd1024.eqiad.wmnet'] to the cluster ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
=== 2021-11-04 === | |||
* 16:39 wm-bot: Added 1 new OSDs ['cloudcephosd1023.eqiad.wmnet'] ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 16:39 wm-bot: Added OSD cloudcephosd1023.eqiad.wmnet... (1/1) ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 16:37 wm-bot: Finished rebooting node cloudcephosd1023.eqiad.wmnet - cookbook ran by arturo@endurance | |||
* 16:34 wm-bot: Rebooting node cloudcephosd1023.eqiad.wmnet - cookbook ran by arturo@endurance | |||
* 16:33 wm-bot: Adding OSD cloudcephosd1023.eqiad.wmnet... (1/1) ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 16:33 wm-bot: Adding new OSDs ['cloudcephosd1023.eqiad.wmnet'] to the cluster ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 16:17 wm-bot: Added 1 new OSDs ['cloudcephosd1022.eqiad.wmnet'] ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 16:17 wm-bot: Added OSD cloudcephosd1022.eqiad.wmnet... (1/1) ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 16:16 wm-bot: Finished rebooting node cloudcephosd1022.eqiad.wmnet - cookbook ran by arturo@endurance | |||
* 16:13 wm-bot: Rebooting node cloudcephosd1022.eqiad.wmnet - cookbook ran by arturo@endurance | |||
* 16:12 wm-bot: Adding OSD cloudcephosd1022.eqiad.wmnet... (1/1) ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 16:12 wm-bot: Adding new OSDs ['cloudcephosd1022.eqiad.wmnet'] to the cluster ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 16:00 wm-bot: Adding OSD cloudcephosd1022.eqiad.wmnet... (1/1) ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 16:00 wm-bot: Adding new OSDs ['cloudcephosd1022.eqiad.wmnet'] to the cluster ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 11:26 wm-bot: Added 1 new OSDs ['cloudcephosd1021.eqiad.wmnet'] ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 11:26 wm-bot: Added OSD cloudcephosd1021.eqiad.wmnet... (1/1) ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 11:23 wm-bot: Finished rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by arturo@endurance | |||
* 11:20 wm-bot: Rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by arturo@endurance | |||
* 11:19 wm-bot: Adding OSD cloudcephosd1021.eqiad.wmnet... (1/1) ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 11:19 wm-bot: Adding new OSDs ['cloudcephosd1021.eqiad.wmnet'] to the cluster ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
* 11:16 wm-bot: Adding new OSDs ['cloudcephosd1021.eqiad.wmnet'] to the cluster ([[phab:T295012|T295012]]) - cookbook ran by arturo@endurance | |||
=== 2021-11-03 === | |||
* 17:22 arturo: [codfw1dev] installing keepalived 2.1.5 from buster-backports on cloudgw2001-dev/2002-dev ([[phab:T294956|T294956]]) | |||
* 11:45 arturo: [codfw1dev] downgrade kernel on cloudgw2001-dev/2002-dev ([[phab:T294853|T294853]], [[phab:T291813|T291813]]) | |||
=== 2021-11-02 === | |||
* 10:54 arturo: rebooting cloudnet1004/1003 for [[phab:T291813|T291813]] | |||
* 10:43 arturo: [codfw1dev] rebooting cloudgw200[12]-dev for [[phab:T291813|T291813]] | |||
=== 2021-10-24 === | |||
* 00:47 andrewbogott: deploying a change so that openstack clients use tls endpoints: https://gerrit.wikimedia.org/r/c/operations/puppet/+/732738 | |||
=== 2021-10-21 === | |||
* 10:19 arturo: drop firewall exception on core routers for wiki replicas legacy setup ([[phab:T293897|T293897]]) | |||
* 10:12 arturo: drop NAT exception for wiki replicas legacy setup ([[phab:T293897|T293897]]) | |||
=== 2021-10-20 === | |||
* 21:06 andrewbogott: creating cloudinfra-nfs project [[phab:T293936|T293936]] | |||
=== 2021-10-18 === | |||
* 19:21 andrewbogott: also ticked the 'admin' box on wikitech for majavah [[phab:T292827|T292827]] | |||
* 18:58 andrewbogott: granting majavah 'admin' role in the 'admin' project and also in the default domain. [[phab:T292827|T292827]] | |||
=== 2021-10-14 === | |||
* 12:28 arturo: [codfw1dev] add DB grants for cloudbackup2002.codfw.wmnet IP address to the cinder DB ([[phab:T292546|T292546]]) | |||
=== 2021-10-13 === | |||
* 10:46 arturo: updating python3-neutron across the fleet ([[phab:T292936|T292936]]) | |||
=== 2021-10-12 === | |||
* 09:06 dcaro: upgrading eqiad cloudnet hosts neutron packages ([[phab:T292936|T292936]]) | |||
* 08:57 dcaro: upgrading codfw cloudnet hosts neutron packages ([[phab:T292936|T292936]]) | |||
=== 2021-10-05 === | |||
* 09:39 arturo: [codfw1dev] cleaning up manila stuff from openstack (db, endpoints, tenant, VMs, and such) [[phab:T291257|T291257]] | |||
=== 2021-09-30 === | |||
* 14:50 andrewbogott: sudo cumin "cloud*" "ps -ef {{!}} grep nslcd && service nslcd restart" and sudo cumin "lab*" "ps -ef {{!}} grep nslcd && service nslcd restart" [[phab:T292202|T292202]] | |||
* 14:43 andrewbogott: ran sudo cumin --force --timeout 500 -o json "A:all" "ps -ef {{!}} grep nslcd && service nslcd restart" to get nslcd happy again [[phab:T292202|T292202]] | |||
=== 2021-09-29 === | |||
* 09:41 arturo: [codfw1dev] cleanup manila shares definitions for a clean start now that the manila-sharecontroller VM is apparently well configured ([[phab:T291257|T291257]]) | |||
=== 2021-09-28 === | |||
* 16:23 bstorm: downtime for clouddb1020 to reduce re-pages in case this goes badly [[phab:T291963|T291963]] | |||
* 16:21 bstorm: powering on clouddb1020 via remote console [[phab:T291963|T291963]] | |||
* 15:58 bstorm: depooled clouddb1020 for repair [[phab:T291961|T291961]] | |||
* 12:40 dcaro: Merged change on sssd for bullseye cloud hosts ([[phab:T291585|T291585]]) | |||
* 11:30 arturo: [codfw1dev] create floating IP 185.15.57.5 for manila-sharecontroller.cloudinfra-codfw1dev.codfw1dev.wmcloud.org ([[phab:T291257|T291257]]) | |||
=== 2021-09-27 === | |||
* 10:07 arturo: cloudcontrol1004 apparently healthy [[phab:T291446|T291446]] | |||
* 09:25 arturo: rebooting cloudcontrol1004 for [[phab:T291446|T291446]] | |||
=== 2021-09-24 === | |||
* 13:02 arturo: [codfw1dev] create VM manila-share-controller-01 on cloudinfra-codfw1dev | |||
* 13:00 arturo: [codfw1dev] rebase labs/private.git on cloudinfra-puppetmaster-01, had merge conflict | |||
=== 2021-09-21 === | |||
* 12:13 arturo: [codfw1dev] trying to create a manila service image ([[phab:T291257|T291257]]) | |||
* 11:45 arturo: [codfw1dev] created rabbitmq user ([[phab:T291257|T291257]]) | |||
* 11:32 arturo: [codfw1dev] populated manila DB & created service endpoints ([[phab:T291257|T291257]]) | |||
* 11:06 arturo: [codfw1dev] give manila user admin role @ manila project ([[phab:T291257|T291257]]) | |||
* 11:06 arturo: [codfw1dev] created manila project ([[phab:T291257|T291257]]) | |||
* 10:57 arturo: [codfw1dev] created manila user @ labtestwikitech ([[phab:T291257|T291257]]) | |||
* 10:49 arturo: [codfw1dev] create manila database on cloudcontrol-dev nodes (galera) [[phab:T291257|T291257]] | |||
=== 2021-09-20 === | |||
* 23:08 bstorm: ran `echo check > /sys/block/md0/md/sync_action` on cloudcontrol1004 to check raid | |||
* 22:48 andrewbogott: stopped puppet & mariadb on cloudcontrol1004; it was flapping | |||
* 22:44 andrewbogott: sudo touch /tmp/galera.disabled on cloudcontrol1004, the service seems troubled there | |||
* 21:57 andrewbogott: moving cloudvirt1043 into the 'nfs' aggregate for [[phab:T291405|T291405]] | |||
=== 2021-09-17 === | |||
* 11:35 arturo: [codfw1dev] install manila on cloudcontrol2001-dev ([[phab:T291257|T291257]]) | |||
=== 2021-09-16 === | |||
* 15:56 bstorm: removing downtime for labstore1005 so we'll know if it has another issue [[phab:T290318|T290318]] | |||
=== 2021-09-09 === | |||
* 22:03 bstorm: restarted the prometheus-mysqld-exporter@s1 service as it was not working [[phab:T290630|T290630]] | |||
* 03:15 bstorm: resetting swap on clouddb1017 [[phab:T290630|T290630]] | |||
* 03:08 andrewbogott: stopping maintain-dbusers on labstore1004 for help diagnosing [[phab:T290630|T290630]] | |||
=== 2021-09-03 === | |||
* 15:34 bstorm: rebooting labstore1005 to disconnect the drives from labstore1004 [[phab:T290318|T290318]] | |||
* 15:24 bstorm: stopping puppet and disabling backup syncs to labstore1005 on cloudbackup2002 [[phab:T290318|T290318]] | |||
* 15:20 bstorm: stopping puppet and disabling backup syncs to labstore1005 on cloudbackup2001 [[phab:T290318|T290318]] | |||
=== 2021-08-30 === | |||
* 16:16 wm-bot: Added 1 new OSDs ['cloudcephosd1018.eqiad.wmnet'] - cookbook ran by andrew@buster | |||
* 16:16 wm-bot: Added OSD cloudcephosd1018.eqiad.wmnet... (1/1) - cookbook ran by andrew@buster | |||
* 16:13 wm-bot: Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1) - cookbook ran by andrew@buster | |||
* 16:13 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster | |||
* 16:10 wm-bot: Finished rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster | |||
* 16:07 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster | |||
* 16:07 wm-bot: Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1) - cookbook ran by andrew@buster | |||
* 16:07 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster | |||
=== 2021-08-27 === | |||
* 18:57 andrewbogott: raising toolsbeta ram/core/instances quotas so majavah can experiment with bullseye | |||
=== 2021-08-25 === | |||
* 14:45 wm-bot: Finished rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster | |||
* 14:42 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster | |||
* 14:42 wm-bot: Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1) - cookbook ran by andrew@buster | |||
* 14:42 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster | |||
* 14:41 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster | |||
=== 2021-08-19 === | |||
* 17:39 bstorm: restarting glance image backup to try and clear the page | |||
=== 2021-08-18 === | |||
* 16:21 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster | |||
* 16:21 wm-bot: Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1) - cookbook ran by andrew@buster | |||
* 16:21 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster | |||
* 16:17 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster | |||
* 16:16 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster | |||
* 16:15 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster | |||
* 16:13 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster | |||
* 14:47 andrewbogott: adding cloudvirt1038 to the ceph aggregate, removing from the maintenance aggregate [[phab:T276922|T276922]] | |||
=== 2021-08-17 === | |||
* 15:11 andrewbogott: rebooting cloudcephosd1008 to force raid rebuild -- [[phab:T287838|T287838]] | |||
=== 2021-08-11 === | |||
* 13:51 wm-bot: Finished rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:48 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 13:47 wm-bot: Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1) ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 13:47 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
=== 2021-08-10 === | |||
* 15:15 andrewbogott: restarting all designate services in eqiad1 | |||
* 15:04 andrewbogott: restarting designate-sink in eqiad1; it's complaining about rabbit but I don't want to restart rabbit yet | |||
=== 2021-08-05 === | |||
* 09:37 dcaro: Taking one osd daemon down on codfw cluster ([[phab:T288203|T288203]]) | |||
=== 2021-08-04 === | |||
* 19:20 bd808: Running deleteBatch.php on cloudweb2001-dev to remove legacy Hiera: pages from labtestwiki | |||
=== 2021-08-03 === | |||
* 17:40 bstorm: rerunning the glance backup script after failure | |||
=== 2021-07-31 === | |||
* 00:10 andrewbogott: "systemctl reset-failed cloud-init.service" on all VMs for [[phab:T287309|T287309]] | |||
* 00:08 andrewbogott: "systemctl reset-failed cloud-final.service" on all VMs for [[phab:T287309|T287309]] | |||
=== 2021-07-27 === | |||
* 21:32 andrewbogott: putting cloudvirt1012 back into service [[phab:T286748|T286748]] | |||
* 20:52 andrewbogott: draining VMs off of cloudvirt1012 so we can replace the battery for [[phab:T286748|T286748]] | |||
* 15:15 andrewbogott: "rm /etc/apt/sources.list.d/openstack-mitaka-jessie.list" cloud-wide | |||
=== 2021-07-23 === | |||
* 15:22 bstorm: update wikireplicas-dns for s7 fix for web replicas | |||
=== 2021-07-20 === | |||
* 17:07 andrewbogott: reloading haproxy on dbproxy1018 for [[phab:T286598|T286598]] | |||
* 15:45 arturo: failback from labstore1006 to labstore1007 (dumps NFS) https://gerrit.wikimedia.org/r/c/operations/puppet/+/705417 | |||
* 00:10 bstorm: restarting nova-api on cloudcontrol1003 to try and recover whatever it's doing with designate_floating_ip_ptr_records_updater | |||
=== 2021-07-19 === | |||
* 22:05 bstorm: set downtime scheduled for tomorrow from 1300 to 1600 UTC for cloudstore1008 and 1009 [[phab:T286599|T286599]] | |||
* 20:40 andrewbogott: reloading haproxy on dbproxy1018 for [[phab:T286598|T286598]] | |||
* 13:50 andrewbogott: upgrading mariadb to 10.3.29 on all cloudcontrols | |||
=== 2021-07-16 === | |||
* 09:55 dcaro: checking HP raid issues on cloudvirt1012 ([[phab:T286766|T286766]]) | |||
=== 2021-07-14 === | |||
* 21:08 andrewbogott: restarting lots of openstack services while trying to resolve [[phab:T286675|T286675]] | |||
* 12:17 dcaro: doing ceph outage tests on codfw1 (fyi) | |||
=== 2021-07-13 === | |||
* 10:57 dcaro: enabled autoscaling on codfw1 ceph cluster, setting a minimum of pgs on codfw1dev-compute to 128 | |||
=== 2021-07-02 === | |||
* 10:12 wm-bot: The cluster is not rebalance after adding the new OSDs ['cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:12 wm-bot: Added 2 new OSDs ['cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:12 wm-bot: Added OSD cloudcephosd1020.eqiad.wmnet... (2/2) ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:10 wm-bot: Finished rebooting node cloudcephosd1020.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 10:07 wm-bot: Rebooting node cloudcephosd1020.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 10:07 wm-bot: Adding OSD cloudcephosd1020.eqiad.wmnet... (2/2) ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:07 wm-bot: Added OSD cloudcephosd1019.eqiad.wmnet... (1/2) ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:05 wm-bot: Finished rebooting node cloudcephosd1019.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 10:02 wm-bot: Rebooting node cloudcephosd1019.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 10:02 wm-bot: Adding OSD cloudcephosd1019.eqiad.wmnet... (1/2) ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:01 wm-bot: Adding new OSDs ['cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 09:13 wm-bot: Adding OSD cloudcephosd1019.eqiad.wmnet... (1/2) ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 09:13 wm-bot: Adding new OSDs ['cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
=== 2021-07-01 === | |||
* 16:27 bstorm: failed over cloudstore1009 to cloudstore1008 [[phab:T224747|T224747]] | |||
* 16:18 bstorm: downtimed cloudstore1008 and cloudstore1009 to fail over [[phab:T224747|T224747]] | |||
* 14:25 wm-bot: Adding OSD cloudcephosd1019.eqiad.wmnet... (2/3) ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 14:25 wm-bot: Added OSD cloudcephosd1017.eqiad.wmnet... (1/3) ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 14:24 wm-bot: Finished rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:21 wm-bot: Rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:20 wm-bot: Adding OSD cloudcephosd1017.eqiad.wmnet... (1/3) ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 14:20 wm-bot: Adding new OSDs ['cloudcephosd1017.eqiad.wmnet', 'cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 14:18 wm-bot: Rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:17 wm-bot: Adding OSD cloudcephosd1017.eqiad.wmnet... (1/3) ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 14:17 wm-bot: Adding new OSDs ['cloudcephosd1017.eqiad.wmnet', 'cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 11:16 wm-bot: Added new OSD node cloudcephosd1016.eqiad.wmnet ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 11:13 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:58 dcaro: rebooting cloudcephosd1016 ([[phab:T285858|T285858]]) | |||
* 10:47 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:44 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:42 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:41 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
* 10:40 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster ([[phab:T285858|T285858]]) - cookbook ran by dcaro@vulcanus | |||
=== 2021-06-30 === | |||
* 21:48 bstorm: downtimed space alerts for scratch on cloudstore1008 until after the migration | |||
=== 2021-06-25 === | |||
* 15:28 andrewbogott: restarting openstack services on cloudcontrol1005 | |||
* 09:16 arturo: icinga downtime cloudcontrols for 2h | |||
* 08:20 dcaro: restarting rabbitmq on cloudcontrol100<nowiki>{</nowiki>3,4<nowiki>}</nowiki> | |||
=== 2021-06-21 === | |||
* 13:54 dcaro: puppet fix merged and deployed, servers are back to normal | |||
* 13:20 dcaro: merged broken puppet patch, downtimed all cloudvirts for 2h while fixing (nothing big, just added a bad systemd timer) | |||
=== 2021-06-20 === | |||
* 22:21 andrewbogott: clearing admin-monitoring VMs; puppet has been failing lately due to a full drive on the puppetmaster | |||
=== 2021-06-15 === | |||
* 01:18 bstorm: running a modified version of the prometheus dir size cron in screen [[phab:T284964|T284964]] | |||
=== 2021-06-14 === | |||
* 10:13 dcaro: setting sssd to debug mode on tools-sgeexec-0917 ([[phab:T284130|T284130]]) | |||
=== 2021-06-10 === | |||
* 10:58 wm-bot: Finished rebooting the nodes ['cloudcephmon2002-dev', 'cloudcephmon2003-dev', 'cloudcephmon2004-dev'] ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:58 wm-bot: Finished rebooting node cloudcephmon2004-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:55 wm-bot: Rebooting node cloudcephmon2004-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:55 wm-bot: Finished rebooting node cloudcephmon2003-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:52 wm-bot: Rebooting node cloudcephmon2003-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:52 wm-bot: Finished rebooting node cloudcephmon2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:49 wm-bot: Rebooting node cloudcephmon2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:49 wm-bot: Rebooting the nodes cloudcephmon2002-dev,cloudcephmon2003-dev,cloudcephmon2004-dev ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:48 wm-bot: Finished rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:48 wm-bot: Finished rebooting node cloudcephosd2003-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:45 wm-bot: Rebooting node cloudcephosd2003-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:45 wm-bot: Finished rebooting node cloudcephosd2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:42 wm-bot: Rebooting node cloudcephosd2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:42 wm-bot: Finished rebooting node cloudcephosd2001-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:39 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 10:39 wm-bot: Rebooting the nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:39 wm-bot: Finished rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:38 wm-bot: Finished rebooting node cloudcephosd2003-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:35 wm-bot: Rebooting node cloudcephosd2003-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:35 wm-bot: Finished rebooting node cloudcephosd2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:32 wm-bot: Rebooting node cloudcephosd2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:32 wm-bot: Finished rebooting node cloudcephosd2001-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:29 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:29 wm-bot: Rebooting the nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:26 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:26 wm-bot: Rebooting the nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:24 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 09:24 wm-bot: Rebooting the nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
=== 2021-06-09 === | |||
* 17:33 arturo: removed icinga downtime for cloudmetrics1002 -- to see if hardware is healthy ([[phab:T281881|T281881]]) | |||
* 13:30 wm-bot: Finished rebooting the nodes ['cloudcephmon2002-dev', 'cloudcephmon2003-dev', 'cloudcephmon2004-dev'] ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 13:30 wm-bot: Finished rebooting node cloudcephmon2004-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 13:27 wm-bot: Rebooting node cloudcephmon2004-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 13:27 wm-bot: Finished rebooting node cloudcephmon2003-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 13:24 wm-bot: Rebooting node cloudcephmon2003-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 13:24 wm-bot: Finished rebooting node cloudcephmon2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 13:21 wm-bot: Rebooting node cloudcephmon2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 13:21 wm-bot: Rebooting the nodes cloudcephmon2002-dev,cloudcephmon2003-dev,cloudcephmon2004-dev ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 13:01 wm-bot: Rebooting node cloudcephmon2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 13:01 wm-bot: Rebooting the nodes cloudcephmon2002-dev,cloudcephmon2003-dev,cloudcephmon2004-dev ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 12:53 wm-bot: Rebooting node cloudcephmon2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 12:53 wm-bot: Rebooting the nodes cloudcephmon2002-dev,cloudcephmon2003-dev,cloudcephmon2004-dev ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
=== 2021-06-08 === | |||
* 23:19 bd808: Downtimed cloudmetrics1002 in icinga until 2021-06-30 23:59:01 ([[phab:T281881|T281881]]) | |||
* 21:08 bstorm: downtiming grafana-labs for maintenance | |||
* 16:28 wm-bot: Finished rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 16:27 wm-bot: Finished rebooting node cloudcephosd2003-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 16:24 wm-bot: Rebooting node cloudcephosd2003-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 16:24 wm-bot: Finished rebooting node cloudcephosd2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 16:22 wm-bot: Rebooting node cloudcephosd2002-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 16:21 wm-bot: Finished rebooting node cloudcephosd2001-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 16:18 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 16:18 wm-bot: Rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 16:17 wm-bot: Rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] ([[phab:T281248|T281248]]) - cookbook ran by dcaro@vulcanus | |||
* 15:03 wm-bot: Finished rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:59 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:59 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:57 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:57 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:29 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:23 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
* 14:18 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus | |||
=== 2021-06-07 === | |||
* 14:27 andrewbogott: moving cloudvirt1040 from 'maintenance' aggregate to 'ceph' aggregate [[phab:T281399|T281399]] | |||
=== 2021-06-01 === | |||
* 13:12 dcaro: Changed the ceph osd_memory_target on eqiad pool to 6Gi (we were reaching the limit, swapping at some points) | |||
* 09:57 arturo: fix PTR record for 185.15.56.1 ([[phab:T284025|T284025]]) | |||
* 09:56 arturo: fix PTR record for 185.15.56.1 ([[phab:T248025|T248025]]) | |||
=== 2021-05-27 === | |||
* 14:58 wm-bot: Testing - cookbook ran by dcaro@vulcanus | |||
=== 2021-05-26 === | |||
* 19:10 andrewbogott: reimaging cloudvirt1018 to support local VM storage | |||
* 18:07 andrewbogott: draining cloudvirt1018, converting it to a local-storage host like cloudvirt1019 and 1020 -- [[phab:T283296|T283296]] | |||
* 14:36 dcaro: Enabled syslog logging for osd.55 on eqiad ceph cluster for testing ([[phab:T281247|T281247]]) | |||
* 14:36 dcaro: Enabled syslog logging on codfw ceph cluster (mon/osd/mgr) ([[phab:T281247|T281247]]) | |||
* 11:26 arturo: [codfw1dev] purge old kernel packages in cloudvirt200[12]-dev | |||
* 11:03 arturo: created public flavor `g3.cores16.ram36.disk20` (even though it was requested as private in [[phab:T283293|T283293]], but may be useful for others) | |||
=== 2021-05-25 === | |||
* 16:14 bd808: Closed #wikimedia-cloud-admin on f***node | |||
* 16:11 bd808: Closed #wikimedia-cloud-feed on f***node | |||
* 15:19 dcaro: rebooted cloudvirt1020, starting VMs ([[phab:T275893|T275893]]) | |||
* 15:13 dcaro: rebooting cloudvirt1020 ([[phab:T275893|T275893]]) | |||
* 14:42 dcaro: taking cloudvirt1020 out for maintenance (openstack wise) so no new VMs are scheduled on it ([[phab:T275893|T275893]]) | |||
=== 2021-05-24 === | |||
* 22:32 andrewbogott: changing the default ttl for eqiad1.wikimedia.cloud. from 3600 to 60; this should help us avoid madness when re-using hostnames. | |||
* 11:20 arturo: created `g3.cores2.ram80.disk40.private` for the wmf-research-tools project, to allow resizing a 40G disk instance | |||
=== 2021-05-22 === | |||
* 02:14 bstorm: downtiming SMART alerts on dumps server labstore1007 for the weekend because it has been flapping [[phab:T281045|T281045]] | |||
=== 2021-05-13 === | |||
* 21:25 bstorm: converted the maps and scratch volumes on cloudstore1008 (standby) to drbd [[phab:T224747|T224747]] | |||
* 15:45 bstorm: re-running wikireplicas-dns after refactor of config to make sure it doesn't change anything | |||
=== 2021-05-12 === | |||
* 14:23 arturo: [codfw1dev] cleanup old unused agents (bgp, ovs) | |||
* 11:37 arturo: [codfw1dev] replacing cloudnet2003-dev with cloudnet2004-dev ([[phab:T281381|T281381]]) | |||
=== 2021-05-11 === | |||
* 18:00 andrewbogott: adding 'trove' service project in advance of deploying trove in eqiad1 | |||
* 10:22 arturo: rebooted cloudgw1002 (active) thus causing a failover to cloudgw1001 | |||
=== 2021-05-09 === | |||
* 10:53 arturo: icinga-downtime cloudmetrics1002 for 3 months ([[phab:T275605|T275605]]) | |||
=== 2021-05-07 === | |||
* 13:51 andrewbogott: add inherited 'admin' right to novaadmin user throughout eqiad1. I was trying to narrow down the rights here but lack of admin breaks some workflows, e.g. [[phab:T281894|T281894]] and [[phab:T282235|T282235]] | |||
=== 2021-05-06 === | |||
* 15:31 arturo: about to migrate the CloudVPS network to the cloudgw architecture [[phab:T270704|T270704]] | |||
* 11:14 dcaro: restarting cinder-volume on the eqiad control nodes to refresh the ceph libraries ([[phab:T282109|T282109]]) | |||
=== 2021-05-05 === | |||
* 16:07 dcaro: disallowing insecure global ids on the eqiad ceph cluster ([[phab:T280641|T280641]]) | |||
* 15:15 wm-bot: Safe reboot of 'cloudvirt1046.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:11 wm-bot: Safe rebooting 'cloudvirt1046.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:11 wm-bot: Safe reboot of 'cloudvirt1045.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:07 wm-bot: Safe rebooting 'cloudvirt1045.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:07 wm-bot: Safe reboot of 'cloudvirt1044.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:03 wm-bot: Safe rebooting 'cloudvirt1044.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:03 wm-bot: Safe reboot of 'cloudvirt1043.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 14:59 wm-bot: Safe rebooting 'cloudvirt1043.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 14:59 wm-bot: Safe reboot of 'cloudvirt1042.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 14:40 wm-bot: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 14:39 wm-bot: Safe reboot of 'cloudvirt1041.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 14:14 wm-bot: Safe rebooting 'cloudvirt1041.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 14:14 wm-bot: Safe reboot of 'cloudvirt1039.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 14:10 wm-bot: Safe rebooting 'cloudvirt1039.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 12:35 wm-bot: Safe rebooting 'cloudvirt1039.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 11:56 wm-bot: Safe rebooting 'cloudvirt1038.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 11:56 wm-bot: Safe reboot of 'cloudvirt1037.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 11:31 wm-bot: Safe rebooting 'cloudvirt1037.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 11:31 wm-bot: Safe reboot of 'cloudvirt1036.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 11:08 wm-bot: Safe rebooting 'cloudvirt1036.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 11:08 wm-bot: Safe reboot of 'cloudvirt1035.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 10:39 wm-bot: Safe rebooting 'cloudvirt1035.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 10:39 wm-bot: Safe reboot of 'cloudvirt1034.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 10:13 wm-bot: Safe rebooting 'cloudvirt1034.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 10:13 wm-bot: Safe reboot of 'cloudvirt1033.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 09:47 wm-bot: Safe rebooting 'cloudvirt1033.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 09:47 wm-bot: Safe reboot of 'cloudvirt1032.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 09:21 wm-bot: Safe rebooting 'cloudvirt1032.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 09:21 wm-bot: Safe reboot of 'cloudvirt1031.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 08:45 wm-bot: Safe rebooting 'cloudvirt1031.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 08:45 wm-bot: Safe reboot of 'cloudvirt1030.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 08:19 wm-bot: Safe rebooting 'cloudvirt1030.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 08:19 wm-bot: Safe reboot of 'cloudvirt1029.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 08:02 wm-bot: Safe rebooting 'cloudvirt1029.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
=== 2021-05-04 === | |||
* 16:05 wm-bot: Safe reboot of 'cloudvirt1028.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:45 wm-bot: Safe rebooting 'cloudvirt1028.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:44 wm-bot: Safe reboot of 'cloudvirt1027.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:22 wm-bot: Safe rebooting 'cloudvirt1027.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:19 wm-bot: Safe reboot of 'cloudvirt1026.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:15 wm-bot: Safe rebooting 'cloudvirt1026.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 13:19 dcaro: rebooting cloudmetrics1002, got stuck again ([[phab:T275605|T275605]]) | |||
* 10:04 wm-bot: Safe rebooting 'cloudvirt1026.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 09:10 wm-bot: Safe rebooting 'cloudvirt1026.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 09:10 wm-bot: Safe reboot of 'cloudvirt1025.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 08:34 wm-bot: Safe rebooting 'cloudvirt1025.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 08:20 wm-bot: Safe reboot of 'cloudvirt1024.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 08:03 wm-bot: Safe rebooting 'cloudvirt1024.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
=== 2021-05-03 === | |||
* 23:53 bstorm: running `maintain-dbusers harvest-replicas` on labstore1004 [[phab:T281287|T281287]] | |||
* 23:51 bstorm: running `maintain-dbusers harvest-replicas` on labstore1004 | |||
* 16:34 wm-bot: Safe reboot of 'cloudvirt1023.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 16:29 wm-bot: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:41 wm-bot: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:41 wm-bot: Safe reboot of 'cloudvirt1022.eqiad.wmnet' finished successfully. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 15:13 wm-bot: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 10:31 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. ([[phab:T280641|T280641]]) - cookbook ran by dcaro@vulcanus | |||
* 10:23 wm-bot: (from a cookbook) | |||
* 09:12 dcaro: draining and rebooting cloudvirt1021 ([[phab:T280641|T280641]]) | |||
* 08:26 dcaro: draining and rebooting cloudvirt1018 ([[phab:T280641|T280641]]) | |||
=== 2021-04-30 === | |||
* 11:16 dcaro: draining and rebooting cloudvirt1017, last one today ([[phab:T280641|T280641]]) | |||
* 10:37 dcaro: draining cloudvirt1016 for reboot ([[phab:T280641|T280641]]) | |||
* 09:48 dcaro: draining cloudvirt1013 for reboot ([[phab:T280641|T280641]]) | |||
=== 2021-04-29 === | |||
* 15:11 dcaro: hard rebooting cloudmetrics1002, got hung again ([[phab:T275605|T275605]]) | |||
* 07:53 dcaro: Upgrading ceph libraries on cloudcontrol1005 to octopus ([[phab:T274566|T274566]]) | |||
* 07:51 dcaro: Upgrading ceph libraries on cloudcontrol1003 to octopus ([[phab:T274566|T274566]]) | |||
* 07:50 dcaro: Upgrading ceph libraries on cloudcontrol1004 to octopus ([[phab:T274566|T274566]]) | |||
=== 2021-04-28 === | |||
* 21:11 andrewbogott: cleaning up more references to deleted hypervisors with delete from services where topic='compute' and version != 53; | |||
* 20:48 andrewbogott: cleaning up references to deleted hypervisors with mysql:root@localhost [nova_eqiad1]> delete from compute_nodes where hypervisor_version != '5002000'; | |||
* 19:40 andrewbogott: putting cloudvirt1040 into the maintenance aggregate pending more info about [[phab:T281399|T281399]] | |||
* 18:11 andrewbogott: adding cloudvirt1040, 1041 and 1042 to the 'ceph' host aggregate -- [[phab:T275081|T275081]] | |||
* 11:06 dcaro: All ceph server side upgraded to Octopus! \o/ ([[phab:T280641|T280641]]) | |||
* 10:57 dcaro: Got a PG getting stuck on 'remapping' after the OSD came up, had to unset the norebalance and then set it again to get it unstuck ([[phab:T280641|T280641]]) | |||
* 10:34 dcaro: Slow/blocked ops from cloudcephmon03, "osd_failure(failed timeout osd.32..." (cloudcephosd1005); unset the cluster noout/norebalance and they went away in a few secs, setting it again and continuing... ([[phab:T280641|T280641]]) | |||
* 09:03 dcaro: Waiting for slow heartbeats from osd.58(cloudcephosd1002) to recover... ([[phab:T280641|T280641]]) | |||
* 08:59 dcaro: During the upgrade, started getting warning 'slow osd heartbeats in the back', meaning that pings between osds are really slow (up to 190s) all from osd.58, currently on cloudcephosd1002 ([[phab:T280641|T280641]]) | |||
* 08:58 dcaro: During the upgrade, started getting warning 'slow osd heartbeats in the back', meaning that pings between osds are really slow (up to 190s) all from osd.58 ([[phab:T280641|T280641]]) | |||
* 08:58 dcaro: During the upgrade, started getting warning 'slow osd heartbeats in the back', meaning that pings between osds are really slow (up to 190s) ([[phab:T280641|T280641]]) | |||
* 08:21 dcaro: Upgrading all the ceph osds on eqiad ([[phab:T280641|T280641]]) | |||
* 08:21 dcaro: The clock skew seems intermittent, there's another task to follow it [[phab:T275860|T275860]] ([[phab:T280641|T280641]]) | |||
* 08:18 dcaro: All eqiad ceph mons and mgrs upgraded ([[phab:T280641|T280641]]) | |||
* 08:18 dcaro: During the upgrade, ceph detected a clock skew on cloudcephmon1002, cloudcephmon1001, they are back ([[phab:T280641|T280641]]) | |||
* 08:15 dcaro: During the upgrade, ceph detected a clock skew on cloudcephmon1002, it went away, I'm guessing systemd-timesyncd fixed it ([[phab:T280641|T280641]]) | |||
* 08:14 dcaro: During the upgrade, ceph detected a clock skew on cloudcephmon1002, looking ([[phab:T280641|T280641]]) | |||
* 07:58 dcaro: Upgrading ceph services on eqiad, starting with mons/managers ([[phab:T280641|T280641]]) | |||
=== 2021-04-27 === | |||
* 14:10 dcaro: codfw.openstack upgraded ceph libraries to 15.2.11 ([[phab:T280641|T280641]]) | |||
* 13:07 dcaro: codfw.openstack cloudvirt2002-dev done, taking cloudvirt2003-dev out to upgrade ceph libraries ([[phab:T280641|T280641]]) | |||
* 13:00 dcaro: codfw.openstack cloudvirt2001-dev back online, taking cloudvirt2002-dev out to upgrade ceph libraries ([[phab:T280641|T280641]]) | |||
* 10:51 dcaro: ceph.eqiad: cinder pool got its pg_num increased to 1024, re-shuffle started ([[phab:T273783|T273783]]) | |||
* 10:48 dcaro: ceph.eqiad: Tweaked the target_size_ratio of all the pools, enabling autoscaler (it will increase cinder pool only) ([[phab:T273783|T273783]]) | |||
* 09:14 dcaro: manually force stopping the server puppetmaster-01 to unblock migration (in codfw1) | |||
* 09:14 dcaro: manually force stopping the server puppetmaster-01 to unblock migration | |||
* 08:59 dcaro: manually force stopping the server exploding-head on codfw, to try cold migration | |||
* 08:47 dcaro: restarting nova-compute on cloudvirt2001-dev after upgrading ceph libraries to 15.2.11 | |||
=== 2021-04-26 === | |||
* 20:56 andrewbogott: deleting spurious 'codfw1dev' and 'codw1dev-4' regions in the dallas deployment; regions without endpoints break a bunch of things | |||
* 09:45 dcaro: draining cloudvirt2001-dev with the new cookbooks ([[phab:T280641|T280641]]) | |||
=== 2021-04-23 === | |||
* 13:49 dcaro: testing the drain_cloudvirt cookbook on codfw1 openstack cluster, draining cloudvirt2001 ([[phab:T280641|T280641]]) | |||
* 11:12 dcaro: testing the drain_cloudvirt cookbook on codfw1 openstack cluster ([[phab:T280641|T280641]]) | |||
* 09:32 dcaro: finished upgrade of ceph cluster on codfw1 using exclusively cookbooks ([[phab:T280641|T280641]]) | |||
* 09:17 dcaro: testing the upgrade_osds cookbook on codfw1 ceph cluster ([[phab:T280641|T280641]]) | |||
* 08:17 dcaro: testing the upgrade_mons cookbook on codfw1 ceph cluster ([[phab:T280641|T280641]]) | |||
=== 2021-04-21 === | |||
* 17:59 dcaro: all monitors upgraded on codfw1 with one cookbook `cookbook --verbose -c ~/.config/spicerack/cookbook.yaml wmcs.ceph.upgrade_mons --monitor-node-fqdn cloudcephmon2002-dev.codfw.wmnet` ([[phab:T280641|T280641]]) | |||
* 17:47 dcaro: upgrading monitors and mgr nodes on codfw ceph cluster ([[phab:T280641|T280641]]) | |||
* 13:26 dcaro: testing ceph upgrade cookbook on cloudcephmon2002-dev ([[phab:T280641|T280641]]) | |||
=== 2021-04-20 === | |||
* 20:21 andrewbogott: reboot cloudservices1003 | |||
* 20:13 andrewbogott: reboot cloudservices1004 | |||
=== 2021-04-19 === | |||
* 08:40 dcaro: enabling puppet on labstore1004 after mysql restart ([[phab:T279657|T279657]]) | |||
* 08:09 dcaro: downtiming labstore1004 and stopping puppet for mysql restart ([[phab:T279657|T279657]]) | |||
=== 2021-04-14 === | |||
* 10:48 dcaro: Upgrade of codfw ceph to octopus 15.2.20 done, will run some performance tests now ([[phab:T274566|T274566]]) | |||
* 10:41 dcaro: Upgrade of codfw ceph to octopus 15.2.20, mgrs upgraded, osds next ([[phab:T274566|T274566]]) | |||
* 10:37 dcaro: Upgrade of codfw ceph to octopus 15.2.20, mons upgraded, mgrs next ([[phab:T274566|T274566]]) | |||
* 10:15 dcaro: starting the upgrade of codfw ceph to octopus 15.2.20 ([[phab:T274566|T274566]]) | |||
* 10:07 dcaro: Merged the ceph 15 (Octopus) repo deployment to codfw, only the repo, not the packages ([[phab:T274566|T274566]]) | |||
=== 2021-04-13 === | |||
* 16:42 dcaro: Ceph balancer got the cluster to eval 0.014916, that is 88-77% usage for compute pool, and 28-19% usage for the cinder one \o/ ([[phab:T274573|T274573]]) | |||
* 15:08 dcaro: Activating continuous upmap balancer, keeping a close eye ([[phab:T274573|T274573]]) | |||
* 15:03 dcaro: Executing a second pass, there's still movements to improve the eval of 0.030075 ([[phab:T274573|T274573]]) | |||
* 15:02 dcaro: First pass finished, improved eval to 0.030075 ([[phab:T274573|T274573]]) | |||
* 14:49 dcaro: Running the first_pass balancing plan on ceph eqiad, current eval 0.030622 ([[phab:T274573|T274573]]) | |||
* 14:43 dcaro: enabling ceph upmap pg balancer on eqiad ([[phab:T274573|T274573]]) | |||
* 14:36 andrewbogott: upgrading codfw1dev to version Victoria, [[phab:T261137|T261137]] | |||
* 13:11 andrewbogott: upgrading eqiad1 designate to version Victoria, [[phab:T261137|T261137]] | |||
* 10:44 dcaro: enabled ceph upmap balancer on codfw ([[phab:T274573|T274573]]) | |||
=== 2021-04-07 === | |||
* 21:33 andrewbogott: upgrading codfw1dev designate to Victoria | |||
=== 2021-04-04 === | |||
* 17:36 andrewbogott: upgrading eqiad1 designate to Ussuri | |||
=== 2021-04-02 === | |||
* 14:12 andrewbogott: upgrading codfw1dev to OpenStack version Ussuri | |||
=== 2021-04-01 === | |||
* 12:15 dcaro: Restoring the 4.9 kernel on cloudcephosd2003-dev and upgrading ([[phab:T274565|T274565]]) | |||
* 10:29 dcaro: Done restoring the 4.9 kernel on cloudcephosd2001-dev and upgrading, requires logging into console to boot from the older kernel before removing the newer one ([[phab:T274565|T274565]]) | |||
* 10:10 dcaro: Restoring the 4.9 kernel on cloudcephosd2001-dev and upgrading ([[phab:T274565|T274565]]) | |||
=== 2021-03-31 === | |||
* 08:47 dcaro: upgrading cinder on codfw cloudcontrol2* nodes ([[phab:T278845|T278845]]) | |||
=== 2021-03-30 === | |||
* 09:53 arturo: rebooting cloudnet1003 to cleanup conntrack table, it wouldn't cleanup by hand ... | |||
=== 2021-03-28 === | |||
* 15:42 andrewbogott: updated debian-10.0-buster base image | |||
=== 2021-03-27 === | |||
* 09:54 arturo: cleanup conntrack table in qrouter netns in cloudnet1003 (backup) | |||
=== 2021-03-25 === | |||
* 19:03 andrewbogott: deleting all unused (per wmcs-imageusage) Jessie base images from Glance | |||
* 17:15 andrewbogott: refreshing puppet compiler facts for tools project | |||
* 10:31 dcaro: kernel upgrade on osds on codfw done, running performance tests ([[phab:T274565|T274565]]) | |||
* 10:24 dcaro: upgrading kernel on cloudcephosd2003-dev and reboot ([[phab:T274565|T274565]]) | |||
* 10:18 dcaro: upgrading kernel on cloudcephosd2002-dev and reboot ([[phab:T274565|T274565]]) | |||
* 10:08 dcaro: upgrading kernel on cloudcephmon2003-dev and reboot ([[phab:T274565|T274565]]) | |||
=== 2021-03-24 === | |||
* 09:19 dcaro: restarted wmcs-backup on cloudvirt1024 as it failed due to an image being removed while running ([[phab:T276892|T276892]]) | |||
=== 2021-03-23 === | |||
* 11:33 arturo: root@cloudcontrol1005:~# wmcs-novastats-dnsleaks --delete | |||
=== 2021-03-22 === | |||
* 10:10 arturo: cleanup conntrack table in standby node: aborrero@cloudnet1003:~ $ sudo ip netns exec qrouter-d93771ba-2711-4f88-804a-{{Gerrit|8df6fd03978a}} conntrack -F | |||
=== 2021-03-19 === | |||
* 17:18 bstorm: running `ALTER TABLE account MODIFY COLUMN type ENUM('user','tool','paws');` against the labsdbaccounts database on m5 [[phab:T276284|T276284]] | |||
* 14:29 andrewbogott: switching admin-monitoring project to use an upstream debian image; I want to see how this affects performance | |||
* 00:30 bstorm: downtimed labstore1004 to check some things in debug mode | |||
=== 2021-03-17 === | |||
* 17:28 bstorm: restarted the backup-glance-images job to clear errors in systemd [[phab:T271782|T271782]] | |||
* 17:16 andrewbogott: set default cinder quota for projects to 80Gb with "update quota_classes set hard_limit=80 where resource='gigabytes';" on database 'cinder' | |||
* 16:58 andrewbogott: disabling all flavors with >20Gb root storage with "update flavors set disabled=1 where root_gb>20;" in nova_eqiad1_api | |||
=== 2021-03-10 === | |||
* 16:51 arturo: rebooting cloudvirt1030 for [[phab:T275753|T275753]] | |||
* 13:14 dcaro: starting manually the canary VM for cloudvirt1029 (nova start 349830f6-3b39-4a8c-ada4-{{Gerrit|a7439f65cffe}}) ([[phab:T275753|T275753]]) | |||
* 12:51 arturo: draining cloudvirt1030 for [[phab:T275753|T275753]] | |||
* 12:47 arturo: rebooting cloudvirt1029 for [[phab:T275753|T275753]] | |||
* 11:56 arturo: [codfw1dev] restart rabbitmq-server in all 3 cloudcontrol servers for [[phab:T276964|T276964]] | |||
* 11:53 arturo: [codfw1dev] restart nova-conductor in all 3 cloudcontrol servers for [[phab:T276964|T276964]] | |||
* 11:31 arturo: draining cloudvirt1029 for [[phab:T275753|T275753]] | |||
* 11:29 arturo: rebooting cloudvirt1013 for [[phab:T275753|T275753]] | |||
* 11:05 arturo: draining cloudvirt1013 for [[phab:T275753|T275753]] | |||
* 11:00 arturo: rebooting cloudvirt1028 for [[phab:T275753|T275753]] | |||
* 10:33 arturo: draining cloudvirt1028 for [[phab:T275753|T275753]] | |||
* 10:29 arturo: rebooting cloudvirt1023 for [[phab:T275753|T275753]] | |||
* 09:37 arturo: draining cloudvirt1023 for [[phab:T275753|T275753]] | |||
* 09:07 arturo: [codfw1dev] reimaging cloudvirt2003-dev ([[phab:T276964|T276964]]) | |||
=== 2021-03-09 === | |||
* 16:27 arturo: rebooting cloudvirt1027 ([[phab:T275753|T275753]]) | |||
* 13:39 arturo: draining cloudvirt1027 for [[phab:T275753|T275753]] | |||
* 13:35 arturo: icinga-downtime cloudvirt1038 for 30 days for [[phab:T276922|T276922]] | |||
* 13:21 arturo: add cloudvirt1039 to the ceph host aggregate (no longer a spare, we have cloudvirt1038 with HW failures) | |||
* 12:52 arturo: cloudvirt1038 hard powerdown / powerup for [[phab:T276922|T276922]] | |||
* 12:33 arturo: rebooting cloudvirt1038 ([[phab:T275753|T275753]]) | |||
* 10:58 arturo: draining cloudvirt1038 ([[phab:T275753|T275753]]) | |||
* 10:54 arturo: rebooting cloudvirt1037 ([[phab:T275753|T275753]]) | |||
* 09:59 arturo: draining cloudvirt1037 ([[phab:T275753|T275753]]) | |||
* 09:12 dcaro: restarted the wmcs-backup service on cloudvirt1024 to retry the backups (failed because a VM was removed in-between, [[phab:T276892|T276892]]) | |||
=== 2021-03-05 === | |||
* 21:40 andrewbogott: replacing 'observer' role with 'reader' role in eqiad1 [[phab:T276018|T276018]] | |||
* 21:21 andrewbogott: replacing 'observer' role with 'reader' role in eqiad1 | |||
* 16:23 arturo: rebooting cloudvirt1036 for [[phab:T275753|T275753]] | |||
* 12:30 arturo: draining cloudvirt1036 for [[phab:T275753|T275753]] | |||
* 12:25 arturo: rebooting cloudvirt1035 for [[phab:T275753|T275753]] | |||
* 10:49 arturo: rebooting cloudvirt1035 for [[phab:T275753|T275753]] | |||
* 10:47 arturo: rebooting cloudvirt1034 for [[phab:T275753|T275753]] | |||
* 10:26 arturo: draining cloudvirt1034 for [[phab:T275753|T275753]] | |||
* 10:25 arturo: rebooting cloudvirt1033 for [[phab:T275753|T275753]] | |||
* 09:18 arturo: draining cloudvirt1033 for [[phab:T275753|T275753]] | |||
=== 2021-03-04 === | |||
* 18:36 andrewbogott: rebooting cloudmetrics1002; the console is hanging | |||
* 16:59 arturo: rebooting cloudvirt1032 for [[phab:T275753|T275753]] | |||
* 16:34 arturo: draining cloudvirt1032 for [[phab:T275753|T275753]] | |||
* 16:33 arturo: rebooting cloudvirt1031 for [[phab:T275753|T275753]] | |||
* 16:11 arturo: draining cloudvirt1031 for [[phab:T275753|T275753]] | |||
* 16:09 arturo: rebooting cloudvirt1026 for [[phab:T275753|T275753]] | |||
* 15:57 arturo: draining cloudvirt1026 for [[phab:T275753|T275753]] | |||
* 15:55 arturo: rebooting cloudvirt1025 for [[phab:T275753|T275753]] | |||
* 15:41 arturo: draining cloudvirt1025 for [[phab:T275753|T275753]] | |||
* 15:12 arturo: rebooting cloudvirt1024 for [[phab:T275753|T275753]] | |||
* 11:29 arturo: draining cloudvirt1024 for [[phab:T275753|T275753]] | |||
* 11:24 dcaro: rebooted cloudvirt1022, re-adding to ceph and removing from maintenance host aggregate for [[phab:T275753|T275753]] | |||
* 11:01 dcaro: rebooting cloudvirt1022 for [[phab:T275753|T275753]] | |||
* 09:12 dcaro: draining cloudvirt1022 for [[phab:T275753|T275753]] | |||
=== 2021-03-03 === | |||
* 17:16 andrewbogott: restarting rabbitmq-server on cloudcontrol1003,1004,1005; trying to explain amqp errors in scheduler logs | |||
* 16:03 dcaro: draining cloudvirt1022 for [[phab:T275753|T275753]] | |||
* 16:00 arturo: move cloudvirt1013 into the 'toobusy' host aggregate, it has 221% cpu subscription and 82% MEM subscription | |||
* 15:34 arturo: rebooting cloudvirt1021 for [[phab:T275753|T275753]] | |||
* 14:31 arturo: draining cloudvirt1021 for [[phab:T275753|T275753]] | |||
* 13:59 arturo: rebooting cloudvirt1018 for [[phab:T275753|T275753]] | |||
* 13:28 arturo: draining cloudvirt1018 for [[phab:T275753|T275753]] | |||
* 12:49 arturo: rebooting cloudvirt1017 for [[phab:T275753|T275753]] | |||
* 12:22 arturo: draining cloudvirt1017 for [[phab:T275753|T275753]] | |||
* 12:20 arturo: rebooting cloudvirt1016 for [[phab:T275753|T275753]] | |||
* 12:01 arturo: draining cloudvirt1016 for [[phab:T275753|T275753]] | |||
* 11:59 arturo: cloudvirt1014 now in the ceph host aggregate | |||
* 11:58 arturo: rebooting cloudvirt1014 for [[phab:T275753|T275753]] | |||
* 11:50 arturo: moved cloudvirt1023 away from the maintenance host aggregate, leave it in the ceph aggregate (was in the 2) | |||
* 11:47 arturo: moved cloudvirt1014 to the 'maintenance' host aggregate, drain it for [[phab:T275753|T275753]] | |||
* 10:01 arturo: icinga-downtime cloudnet1003 for 14 days bc potential alerting storm due to firmware issues ([[phab:T271058|T271058]]) | |||
* 10:01 arturo: rebooting again cloudnet1003 (no network failover) ([[phab:T271058|T271058]]) | |||
* 09:59 arturo: update firmware-bnx2x from 20190114-2 to 20200918-1~bpo10+1 on cloudnet1003 ([[phab:T271058|T271058]]) | |||
* 09:30 arturo: installing linux kernel 5.10.13-1~bpo10+1 in cloudnet1003 and rebooting it (network failover) ([[phab:T271058|T271058]]) | |||
=== 2021-03-02 === | |||
* 17:16 andrewbogott: rebooting cloudvirt1039 to see if I can trigger [[phab:T276208|T276208]] | |||
* 16:10 arturo: [codfw1dev] restart nova-compute on cloudvirt2002-dev | |||
* 11:59 arturo: moved cloudvirt1012 to 'maintenance' host aggregate. Drain it with `wmcs-drain-hypervisor` to reboot it for [[phab:T275753|T275753]] | |||
* 11:59 arturo: cloudvirt1023 is affected by [[phab:T276208|T276208]] and cannot be rebooted. Put it back into the ceph host aggregate | |||
* 10:43 arturo: moved cloudvirt1013 cloudvirt1032 cloudvirt1037 back into the 'ceph' host aggregate | |||
* 10:13 arturo: moved cloudvirt1023 to 'maintenance' host aggregate. Drain it with `wmcs-drain-hypervisor` to reboot it for [[phab:T275753|T275753]] | |||
=== 2021-03-01 === | |||
* 20:12 andrewbogott: removing novaadmin from all projects save 'admin' for [[phab:T274385|T274385]] | |||
* 19:51 andrewbogott: removing novaobserver from all projects save 'observer' for [[phab:T274385|T274385]] | |||
* 19:50 andrewbogott: adding inherited domain-wide roles to novaadmin and novaobserver as per [[phab:T274385|T274385]] | |||
=== 2021-02-28 === | |||
* 04:54 andrewbogott: restarted redis-server on tools-redis-1003 and tools-redis-1004 in an attempt to reduce replag, no real change detected | |||
=== 2021-02-27 === | |||
* 00:33 andrewbogott: sudo cumin --timeout 500 "A:all and not O<nowiki>{</nowiki>project:clouddb-services<nowiki>}</nowiki>" 'lsb_release -c {{!}} grep -i buster && uname -r {{!}} grep -v 4.19.0-14-amd64 && reboot' | |||
* 00:28 andrewbogott: sudo cumin --timeout 500 "A:all and not O<nowiki>{</nowiki>project:clouddb-services<nowiki>}</nowiki>" 'lsb_release -c {{!}} grep -i buster && uname -r {{!}} grep -v 4.19.0-14-amd64 && echo reboot' | |||
* 00:09 andrewbogott: sudo cumin "A:all and not O<nowiki>{</nowiki>project:clouddb-services<nowiki>}</nowiki>" 'lsb_release -c {{!}} grep -i stretch && uname -r {{!}} grep -v 4.19.0-0.bpo.14-amd64 && reboot' | |||
=== 2021-02-26 === | |||
* 14:58 dcaro: [eqiad] rebooting cloudcephosd1015 (last osd \o/) for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 14:51 dcaro: [eqiad] rebooting cloudcephosd1014 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 14:44 dcaro: [eqiad] rebooting cloudcephosd1013 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 14:38 dcaro: [eqiad] rebooting cloudcephosd1012 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 14:31 dcaro: [eqiad] rebooting cloudcephosd1011 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 14:25 dcaro: [eqiad] rebooting cloudcephosd1010 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 14:17 dcaro: [eqiad] rebooting cloudcephosd1009 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 13:54 dcaro: [eqiad] downtimed alert1001 Ceph OSDs down alert until 18:00 GMT+1 as that is not under the host being rebooted ([[phab:T275753|T275753]]) | |||
* 13:51 dcaro: [eqiad] rebooting cloudcephosd1008 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 13:45 dcaro: [eqiad] rebooting cloudcephosd1007 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 13:38 dcaro: [eqiad] rebooting cloudcephosd1006 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 12:07 dcaro: [eqiad] rebooting cloudcephosd1005 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 12:00 arturo: rebooting cloudcontrol1003 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 11:42 arturo: rebooting cloudcontrol1004 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 11:41 dcaro: [eqiad] rebooting cloudcephosd1004 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 11:32 dcaro: [eqiad] rebooting cloudcephosd1003 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 11:30 arturo: rebooting cloudcontrol1005 for kernel upgrade ([[phab:T2|T2]] | |||
* 11:26 dcaro: [eqiad] rebooting cloudcephosd1002 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 11:16 dcaro: [eqiad] rebooting cloudcephosd1001 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 11:11 dcaro: [eqiad] rebooting cloudcephmon1003 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 11:05 dcaro: [eqiad] rebooting cloudcephmon1002 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 10:59 dcaro: [eqiad] rebooting cloudcephmon1001 for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 10:45 arturo: rebooting cloudvirt1039 into a new kernel ([[phab:T275753|T275753]]) --- spare | |||
* 10:43 dcaro: [codfw1dev] rebooting cloudcephmon2003-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 10:38 dcaro: [codfw1dev] rebooting cloudcephmon2002-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 10:29 dcaro: [codfw1dev] rebooting cloudcephmon2001-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 10:24 arturo: [codfw1dev] purge old kernel packages on cloudvirt2003-dev to force boot into a new kernel ([[phab:T275753|T275753]]) | |||
* 10:11 arturo: [codfw1dev] manually creating /boot/grub/ on cloudvirt2003-dev to allow update-grub2 to run (so it can reboot into a new kernel) ([[phab:T275753|T275753]]) | |||
* 10:11 dcaro: [codfw1dev] rebooting cloudcephosd2003-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 10:05 dcaro: [codfw1dev] rebooting cloudcephosd2002-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 10:01 arturo: [codfw1dev] rebooting cloudvirt200X-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 09:59 arturo: [codfw1dev] rebooting cloudweb2001-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 09:53 arturo: [codfw1dev] rebooting cloudservices2003-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 09:51 arturo: [codfw1dev] rebooting cloudservices2002-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 09:45 arturo: [codfw1dev] rebooting cloudcontrol2004-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 09:44 arturo: [codfw1dev] rebooting cloudbackup[2001-2002].codfw.wmnet for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 09:43 dcaro: [codfw1dev] rebooting cloudcephosd2001-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 09:41 arturo: [codfw1dev] rebooting cloudcontrol2003-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
* 09:33 arturo: [codfw1dev] rebooting cloudcontrol2001-dev for kernel upgrade ([[phab:T275753|T275753]]) | |||
=== 2021-02-25 === | |||
* 14:56 arturo: deployed wmcs-netns-events daemon to all cloudnet servers ([[phab:T275483|T275483]]) | |||
=== 2021-02-24 === | |||
* 11:07 arturo: force-reboot cloudmetrics1002, add icinga downtime for 2 hours. Investigating some server issue | |||
* 00:17 bstorm: set --property hw_scsi_model=virtio-scsi and --property hw_disk_bus=scsi on the main stretch image in glance on eqiad1 [[phab:T275430|T275430]] | |||
=== 2021-02-23 === | |||
* 22:43 bstorm: set --property hw_scsi_model=virtio-scsi and --property hw_disk_bus=scsi on the main buster image in glance on eqiad1 [[phab:T275430|T275430]] | |||
* 20:36 andrewbogott: adding r/o access to the eqiad1-glance-images ceph pool for the client.eqiad1-compute for [[phab:T275430|T275430]] | |||
* 10:49 arturo: rebooting cloudnet1004 into new kernel from buster-bpo ([[phab:T271058|T271058]]) | |||
* 10:49 arturo: installing linux-image-amd64 from buster-bpo 5.10.13-1~bpo10+1 in cloudnet1004 ([[phab:T271058|T271058]]) | |||
=== 2021-02-22 === | |||
* 17:15 bstorm: restarting nova-compute on cloudvirt1016 and cloudvirt1036 in case it helps [[phab:T275411|T275411]] | |||
* 15:02 dcaro: Re-uploaded the debian buster 10.0 image from rbd to glance, that worked, re-spawning all the broken instances ([[phab:T275378|T275378]]) | |||
* 11:12 dcaro: Refreshing all the canary instances ([[phab:T275354|T275354]]) | |||
=== 2021-02-18 === | |||
* 14:50 arturo: rebooting cloudnet1004 for [[phab:T271058|T271058]] | |||
* 10:25 dcaro: Rebooting cloudmetrics1001 to apply new kernel ([[phab:T275116|T275116]]) | |||
* 10:16 dcaro: Rebooting cloudmetrics1002 to apply new kernel ([[phab:T275116|T275116]]) | |||
* 10:14 dcaro: Upgrading grafana on cloudmetrics1002 ([[phab:T275116|T275116]]) | |||
* 10:12 dcaro: Upgrading grafana on cloudmetrics1001 ([[phab:T275116|T275116]]) | |||
=== 2021-02-17 === | |||
* 15:58 arturo: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/664845 to cloudnet servers ([[phab:T268335|T268335]]) | |||
=== 2021-02-15 === | |||
* 16:25 arturo: [codfw1dev] rebooting all cloudgw200x-dev / cloudnet200x-dev servers ([[phab:T272963|T272963]]) | |||
* 15:45 arturo: [codfw1dev] drop subnet definition for cloud-instances-transport1-b-codfw ([[phab:T272963|T272963]]) | |||
* 15:45 arturo: [codfw1dev] connect virtual router cloudinstances2b-gw to vlan cloud-gw-transport-codfw (185.15.57.10) ([[phab:T272963|T272963]]) | |||
=== 2021-02-11 === | |||
* 12:01 arturo: [codfw1dev] drop instance `tools-codfw1dev-bastion-1` in `tools-codfw1dev` (was buster, cannot use it yet) | |||
* 11:59 arturo: [codfw1dev] create instance `tools-codfw1dev-bastion-2` (stretch) in `tools-codfw1dev` to test stuff related to [[phab:T272397|T272397]] | |||
* 11:45 arturo: [codfw1dev] create instance `tools-codfw1dev-bastion-1` in `tools-codfw1dev` to test stuff related to [[phab:T272397|T272397]] | |||
* 11:42 arturo: [codfw1dev] drop `tools` project, create `tools-codfw1dev` | |||
* 11:38 arturo: [codfw1dev] drop `coudinfra` project (we are using `cloudinfra-codfw1dev` there) | |||
* 05:37 bstorm: downtimed cloudnet1004 for another week [[phab:T271058|T271058]] | |||
=== 2021-02-09 === | |||
* 15:23 arturo: icinga-downtime for 2h everything *labs *cloud for openstack upgrades | |||
* 11:14 dcaro: Merged the osd scheduler change for all osds, applying on all cloudcephosd* ([[phab:T273791|T273791]]) | |||
=== 2021-02-08 === | |||
* 18:50 bstorm: enabled puppet on cloudvirt1023 for now [[phab:T274144|T274144]] | |||
* 18:44 bstorm: restarted the backup_vms.service on cloudvirt1027 [[phab:T274144|T274144]] | |||
* 17:51 bstorm: deleted project pki [[phab:T273175|T273175]] | |||
=== 2021-02-05 === | |||
* 10:59 arturo: icinga-downtime labstore1004 tools share space check for 1 week ([[phab:T272247|T272247]]) | |||
* 10:21 dcaro: This was affecting maps and several others, maps and project-proxy have been fixed ([[phab:T273956|T273956]]) | |||
* 09:19 dcaro: Some certs around the infra are expired ([[phab:T273956|T273956]]) | |||
=== 2021-02-04 === | |||
* 10:12 dcaro: Increasing the memory limit of osds in eqiad from 8589934592(8G) to 12884901888(12G) ([[phab:T273851|T273851]]) | |||
=== 2021-02-03 === | |||
* 09:59 dcaro: Doing a full vm backup on cloudvirt1024 with the new script ([[phab:T260692|T260692]]) | |||
* 01:50 bstorm: icinga-downtime cloudnet1004 for a week [[phab:T271058|T271058]] | |||
=== 2021-02-02 === | |||
* 17:14 dcaro: Changed osd memory limit from 4G to 8G ([[phab:T273649|T273649]]) | |||
* 11:00 arturo: icinga-downtime cloudvirt-wdqs1001 for 1 week ([[phab:T273579|T273579]]) | |||
* 03:12 andrewbogott: running /usr/local/sbin/wmcs-purge-backups and /usr/local/sbin/wmcs-backup-instances on cloudvirt1024 to see why the backup job paged | |||
=== 2021-01-29 === | |||
* 15:36 andrewbogott: disabling puppet and some services on eqiad1 cloudcontrol nodes; replacing nova-placement-api with placement-api | |||
=== 2021-01-28 === | |||
* 19:44 andrewbogott: shutting down cloudcontrol2001-dev because it's in a partially upgraded state; will revive when it's time for Train | |||
=== 2021-01-27 === | |||
* 00:50 bstorm: icinga-downtime cloudnet1004 for a week [[phab:T271058|T271058]] | |||
=== 2021-01-22 === | |||
* 16:44 andrewbogott: upgrading designate on cloudvirt1003/1004 to OpenStack 'train' | |||
* 11:29 dcaro: Doing some tests removed cloudcontrol1003 puppet cert, regenerating... | |||
=== 2021-01-21 === | |||
* 11:35 arturo: merging core router firewall changes https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657439 ([[phab:T209082|T209082]]) | |||
* 11:30 arturo: merging core router firewall changes https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657358 ([[phab:T272486|T272486]], [[phab:T209082|T209082]]) | |||
=== 2021-01-20 === | |||
* 10:49 arturo: merging core router firewall change https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657302 ([[phab:T209082|T209082]]) | |||
* 10:05 dcaro: Everything looks ok, created a new vm with a volume in ceph without issues, and no warnings/errors on ceph status, closing ([[phab:T272303|T272303]]) | |||
* 09:55 dcaro: Eqiad ceph cluster upgraded, doing sanity checks ([[phab:T272303|T272303]]) | |||
* 09:46 dcaro: 75% of the eqiad cluster upgraded... continuing ([[phab:T272303|T272303]]) | |||
* 09:37 dcaro: 25% of the eqiad cluster upgraded... continuing ([[phab:T272303|T272303]]) | |||
* 09:24 dcaro: Mgr daemons upgraded and running, upgrading osd daemons on servers cloudcephosd1*, this may take a bit longer ([[phab:T272303|T272303]]) | |||
* 09:22 dcaro: Mon daemons upgraded and running, upgrading mgr daemons on servers cloudcephmon1* ([[phab:T272303|T272303]]) | |||
* 09:16 dcaro: Starting eqiad ceph upgrade, upgrading the mon servers cloudcephmon1* ([[phab:T272303|T272303]]) | |||
* 09:01 dcaro: Will start the ceph upgrade in 15 min, no downtime nor performance impact is expected ([[phab:T272303|T272303]]) | |||
=== 2021-01-19 === | |||
* 10:17 arturo: icinga-downtime cloudnet1004 for 1 week ([[phab:T271058|T271058]]) | |||
=== 2021-01-18 === | |||
* 16:00 dcaro: Codfw1 ceph cluster upgraded, will wait until tomorrow to see if there's any instability, but everything looks fine ([[phab:T272303|T272303]]) | |||
* 15:38 dcaro: Upgraded mgr services on codfw ceph cluster, starting with osd ones ([[phab:T272303|T272303]]) | |||
* 15:35 dcaro: Upgraded mon services on codfw ceph cluster, starting with mgr ones ([[phab:T272303|T272303]]) | |||
* 15:21 dcaro: Starting upgrade of ceph mon nodes on codfw ([[phab:T272303|T272303]]) | |||
* 15:06 dcaro: re-enabling puppet on cloudcephosd2* hosts | |||
* 13:53 dcaro: disabling puppet on cloudcephosd2* to resume perf tests | |||
* 10:50 dcaro: re-enabling puppet on cloudcephosd2* (codfw) | |||
* 10:07 dcaro: disabling puppet on cloudcephosd2* (codfw) to do some performance tests | |||
* 09:00 dcaro: Enabling custom application 'cinder' on pool codfw1dev-cinder to get rid of health warnings | |||
=== 2021-01-17 === | |||
* 16:53 arturo: icinga downtime labstore1004 /srv/tools space check for 3 days ([[phab:T272247|T272247]]) | |||
=== 2021-01-15 === | |||
* 13:41 arturo: icinga downtime labstore1004 maintain-dbuser alert until 2021-01-19 ([[phab:T272125|T272125]]) | |||
* 09:47 arturo: labstore1004 maintain-dbusers affected by [[phab:T272127|T272127]] and [[phab:T272125|T272125]] | |||
* 09:22 arturo: restart maintain-dbusers.service in labstore1004 | |||
* 08:19 dcaro: Merging the patch to disable write caches on ceph osds ([[phab:T271527|T271527]]) | |||
=== 2021-01-13 === | |||
* 17:03 arturo: move cloudvirt1013 cloudvirt1032 cloudvirt1037 to the 'toobusy' host aggregate to prevent further CPU oversubscribing | |||
* 12:40 arturo: try increasing systemd watchdog timeout for conntrackd in cloudnet1004 ([[phab:T268335|T268335]]) | |||
* 11:45 dcaro: https://gerrit.wikimedia.org/r/c/operations/puppet/+/654419 merged and deployed (and tested) ([[phab:T268877|T268877]]) | |||
* 11:40 dcaro: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/654419 that might affect the encapi service (puppet on cloud environment), no downtime expected though ([[phab:T268877|T268877]]) | |||
* 10:56 arturo: trying to cleanup dpkg package mess in cloudnet2002-dev | |||
* 10:02 arturo: prevent floating IP allocation from neutron transport subnet: root@cloudcontrol1005:~# neutron subnet-update --allocation-pool start=185.15.56.244,end=185.15.56.244 cloud-instances-transport1-b-eqiad1 ([[phab:T271867|T271867]]) | |||
=== 2021-01-12 === | |||
* 10:33 arturo: reboot cloudnet1004 | |||
* 10:32 arturo: update firmware-bnx2x from 20190114-2 to 20200918-1~bpo10+1 on cloudnet1004 ([[phab:T271058|T271058]]) | |||
=== 2021-01-11 === | |||
* 10:22 arturo: doubling size of conntrack table in cloudnet servers https://gerrit.wikimedia.org/r/c/operations/puppet/+/655407 ([[phab:T271058|T271058]]) | |||
* 10:07 arturo: manually cleanup conntrack table in cloudnet1004 ([[phab:T271058|T271058]]) | |||
* 09:19 dcaro: cleaned up ~1800 snapshots, 109 remaining only, one for each host x image combination (plus some ephemeral ones while doing backups), closing the task ([[phab:T270478|T270478]]) | |||
* 08:39 dcaro: cleaning up dangling snapshots now that we have the new suffixed ones ([[phab:T270478|T270478]]) | |||
=== 2021-01-10 === | |||
* 16:02 andrewbogott: restarting rabbitmq-server on all eqiad1 cloudcontrols | |||
* 15:54 andrewbogott: restarting neutron-metadata-agent on cloudnet1004 due to many syslog complaints | |||
=== 2021-01-08 === | |||
* 11:25 arturo: rebooting both cloudnet2002-dev/cloudnet2003-dev to make sure interfaces are set up correctly ([[phab:T271517|T271517]]) | |||
* 11:22 arturo: connecting cloudnet2002-dev cloudnet2003-dev back to vlan 2120 ([[phab:T271517|T271517]]) | |||
* 11:06 arturo: root@cloudcontrol2001-dev:~# openstack router set --external-gateway wan-transport-codfw --fixed-ip subnet=cloud-instances-transport1-b-codfw,ip-address=208.80.153.190 cloudinstances2b-gw ([[phab:T271517|T271517]]) | |||
* 11:02 arturo: root@cloudcontrol2001-dev:~# openstack router set --enable-snat cloudinstances2b-gw --external-gateway wan-transport-codfw ([[phab:T271517|T271517]]) | |||
* 11:01 arturo: enabling neutron hacks in codfw1dev (cloudnet2002-dev, cloudnet2003-dev) ([[phab:T271517|T271517]]) | |||
* 10:55 arturo: aborrero@labtestvirt2003:~ $ sudo ifdown eno2.2107 ([[phab:T271517|T271517]]) | |||
* 10:55 arturo: aborrero@labtestvirt2003:~ $ sudo ifdown eno2.2120 ([[phab:T271517|T271517]]) | |||
* 10:53 arturo: root@cloudcontrol2001-dev:~# openstack subnet create --network wan-transport-codfw --gateway 208.80.153.185 --ip-version 4 --network wan-transport-codfw --no-dhcp --subnet-range 208.80.153.184/29 cloud-instances-transport1-b-codfw ([[phab:T271517|T271517]]) | |||
* 10:40 dcaro: Finished tests, bringing osd online (osd.48) for eqiad ceph cluster ([[phab:T271417|T271417]]) | |||
* 09:59 dcaro: Started performance tests on sdc (osd.48) for eqiad ceph cluster ([[phab:T271417|T271417]]) | |||
* 09:41 dcaro: Taking osd.48 from eqiad ceph cluster out to do performance tests ([[phab:T271417|T271417]]) | |||
=== 2021-01-07 === | |||
* 15:19 dcaro: Finished speed tests on cloudcephosd2001-dev, reprovisioning the osd.0 sdc ([[phab:T271417|T271417]]) | |||
* 14:39 dcaro: Starting speed tests on cloudcephosd2001-dev sdc ([[phab:T271417|T271417]]) | |||
* 12:54 dcaro: Taking osd.0 down on codfw ceph cluster to try the disk performance testing process ([[phab:T271417|T271417]]) | |||
* 11:35 arturo: merging dmz_cidr change ([[phab:T209082|T209082]], [[phab:T267779|T267779]]) | |||
=== 2021-01-05 === | |||
* 10:40 dcaro: removing dumps-[1..*] backups from cloudvirt1024 as they are not needed ([[phab:T271094|T271094]]) | |||
=== 2021-01-03 === | |||
* 07:06 dcaro: Got a network hiccup on cloudnet1004, keeping track here [[phab:T271058|T271058]] | |||
=== 2020-12-28 === | |||
* 12:32 arturo: stop doing backups for the dumps project https://gerrit.wikimedia.org/r/c/operations/puppet/+/652182 ([[phab:T260692|T260692]]) | |||
* 12:32 arturo: stop doing backups for the dumps project https://gerrit.wikimedia.org/r/c/operations/puppet/+/652182 ([[phab:T260682|T260682]]) | |||
* 12:23 arturo: icinga downtime cloudvirt1026 disk space check until january 5 ([[phab:T260692|T260692]]) | |||
* 06:15 andrewbogott: restarting designate-central on cloudservices1003/1004. I'm pretty sure they're distressed because of DB lag but it's worth a try | |||
=== 2020-12-23 === | |||
* 15:38 andrewbogott: restarting rabbitmq on cloudcontrol1004; suspected leaks | |||
* 15:33 andrewbogott: restarting each cloudcontrol galera node in turn to see if that quiets down the syncing warnings | |||
* 12:08 arturo: move memory out of the swap in cloudcontrol1004 by disabling/enabling it (1Gb swap was being used) | |||
=== 2020-12-22 === | |||
* 15:30 dcaro: cleaning up 6778 dangling snapshots for glance images in eqiad ([[phab:T270478|T270478]]) | |||
* 13:51 dcaro: merged patch to move wikidumpparse backups to cloudvirt1025 to free space on cloudvirt1026 | |||
=== 2020-12-19 === | |||
* 16:18 dcaro: gzipped a bunch of logs on cloudvirt1004 due to / being out of space | |||
* 00:14 bstorm: truncated /var/log/debug.1 on cloudcontrol1003 which appears to be the exact same content as the user.log files anyway | |||
* 00:10 bstorm: truncated /var/log/daemon.log.1 and the haproxy log | |||
* 00:02 bstorm: truncated /var/log/messages.1 on cloudcontrol1003 | |||
=== 2020-12-18 === | |||
* 23:53 bstorm: truncated haproxy.log.1 on cloudcontrol1003 | |||
* 20:46 andrewbogott: setting pg and pgp number to 4096 for eqiad1-compute as joachim thinks 8192 might be too much [[phab:T270305|T270305]] | |||
* 17:09 dcaro: finished cleaning up the dangling snapshots from cloudvirt1026 ([[phab:T270478|T270478]]) | |||
* 17:08 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1026) ([[phab:T270478|T270478]]) | |||
* 17:06 dcaro: finished cleaning up the dangling snapshots from cloudvirt1025 ([[phab:T270478|T270478]]) | |||
* 17:05 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1025) ([[phab:T270478|T270478]]) | |||
* 17:00 dcaro: finished cleaning up the dangling snapshots from cloudvirt1021 ([[phab:T270478|T270478]]) | |||
* 16:58 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1021) ([[phab:T270478|T270478]]) | |||
* 16:56 dcaro: finished cleaning up the dangling snapshots from cloudvirt1022 ([[phab:T270478|T270478]]) | |||
* 16:55 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1022) ([[phab:T270478|T270478]]) | |||
* 16:54 dcaro: finished cleaning up the dangling snapshots from cloudvirt1023 ([[phab:T270478|T270478]]) | |||
* 16:51 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1023) ([[phab:T270478|T270478]]) | |||
* 16:47 dcaro: finished cleaning up the dangling snapshots from cloudvirt1024, freed ~12% of the capacity ([[phab:T270478|T270478]]) | |||
* 16:21 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1024) ([[phab:T270478|T270478]]) | |||
* 16:13 andrewbogott: setting autoscale to 'off' for both ceph pools (eqiad1-compute and eqiad1-glance-images) because we like how things are set and the autoscaler does not | |||
* 10:33 dcaro: purging rbd snapshots for image fc6fb78b-4515-4dcc-8254-591b9fe01762 ([[phab:T270478|T270478]]) | |||
=== 2020-12-17 === | |||
* 22:17 andrewbogott: correction to above, set the pg and pgp to 1024 for eqiad1-glance-images | |||
* 22:16 andrewbogott: setting pgp number to 8192 for eqiad1-compute (a 4x increase) and 2048 for eqiad1-glance-images (also a 4x increase) [[phab:T270305|T270305]] (same as pg) | |||
* 22:14 andrewbogott: setting pg number to 8192 for eqiad1-compute (a 4x increase) and 2048 for eqiad1-glance-images (also a 4x increase) [[phab:T270305|T270305]] | |||
* 22:10 andrewbogott: setting autoscale to 'warn' for both ceph pools (eqiad1-compute and eqiad1-glance-images) | |||
=== 2020-12-16 === | |||
* 09:31 dcaro: removing invalid backups from cloudvirt1024 (196 in total) ([[phab:T269419|T269419]]) | |||
=== 2020-12-14 === | |||
* 17:42 dcaro: The removal freed ~12GB (still 100% usage :S) ([[phab:T269419|T269419]]) | |||
* 17:36 dcaro: removing invalid backups that have a valid copy ([[phab:T269419|T269419]]) | |||
* 15:43 dcaro: Merging the tagging for vm backups ([[phab:T267195|T267195]]) | |||
* 09:45 arturo: icinga downtime cloudvirt1024 for 6 days ([[phab:T269419|T269419]]) | |||
=== 2020-12-13 === | |||
* 09:11 _dcaro: running backup purge script on cloudvirt1024 ([[phab:T269419|T269419]]) | |||
=== 2020-12-10 === | |||
* 23:36 bstorm: cleaned up the logs for haproxy on cloudcontrol1003 by deleting all the gzipped ones and truncating the .1 file | |||
* 11:56 dcaro: Freed some space on cloudvirt1024 by running the purge script ([[phab:T269419|T269419]]) | |||
* 09:17 dcaro: removing leaked dns record discordwiki.eqiad.wmflabs (clinic duty) | |||
=== 2020-12-08 === | |||
* 18:01 dcaro: Host cloudvirt1030 up and running ([[phab:T216195|T216195]]) | |||
* 15:59 dcaro: Re-imaging host cloudvirt1030 ([[phab:T216195|T216195]]) | |||
* 14:18 dcaro: Host online cloudvirt1029 ([[phab:T216195|T216195]]) | |||
* 14:13 dcaro: Host re-imaged, doing tests cloudvirt1029 ([[phab:T216195|T216195]]) | |||
* 12:14 dcaro: Re-imaging cloudvirt1029 ([[phab:T216195|T216195]]) | |||
=== 2020-12-07 === | |||
* 18:33 andrewbogott: putting cloudvirt1023 back into service [[phab:T269467|T269467]] | |||
* 15:55 andrewbogott: reimaging cloudvirt1028 for [[phab:T216195|T216195]] | |||
* 14:49 dcaro: Re-imaging cloudvirt1027 ([[phab:T216195|T216195]]) | |||
=== 2020-12-05 === | |||
* 00:35 andrewbogott: moving cloudvirt1023 back into maintenance because [[phab:T269467|T269467]] continues to puzzle | |||
=== 2020-12-04 === | |||
* 22:33 andrewbogott: moving cloudvirt1023 back into the ceph aggregate; it doesn't need upgrades after all [[phab:T269467|T269467]] | |||
* 22:24 andrewbogott: moving cloudvirt1023 out of the ceph aggregate and into maintenance for [[phab:T269467|T269467]] | |||
* 21:06 andrewbogott: putting cloudvirt1025 and 1026 back into service because I'm pretty sure they're fixed. [[phab:T269313|T269313]] | |||
* 12:12 arturo: manually running `wmcs-purge-backups` again on cloudvirt1024 ([[phab:T269419|T269419]]) | |||
* 11:25 arturo: icinga downtime cloudvirt1024 for 6 days, to avoid paging noises ([[phab:T269419|T269419]]) | |||
* 11:25 arturo: last log line referencing cloudvirt1024 is a mistake ([[phab:T269313|T269313]]) | |||
* 11:24 arturo: icinga downtime cloudvirt1024 for 6 days, to avoid paging noises ([[phab:T269313|T269313]]) | |||
* 10:28 arturo: manually running `wmcs-purge-backups` on cloudvirt1024 ([[phab:T269419|T269419]]) | |||
* 10:23 arturo: setting expiration to 2020-12-03 to the oldest backy snapshot of every VM in cloudvirt1024 ([[phab:T269419|T269419]]) | |||
* 09:54 arturo: icinga downtime cloudvirt1025 for 6 days ([[phab:T269313|T269313]]) | |||
=== 2020-12-03 === | |||
* 23:21 andrewbogott: removing all osds on cloudcephosd1004 for rebuild, [[phab:T268746|T268746]] | |||
* 21:45 andrewbogott: removing all osds on cloudcephosd1005 for rebuild, [[phab:T268746|T268746]] | |||
* 19:51 andrewbogott: removing all osds on cloudcephosd1006 for rebuild, [[phab:T268746|T268746]] | |||
* 17:01 arturo: icinga downtime cloudvirt1025 for 48h to debug network issue [[phab:T269313|T269313]] | |||
* 16:56 arturo: rebooting cloudvirt1025 to debug network issue [[phab:T269313|T269313]] | |||
* 16:38 dcaro: Re-imaging cloudvirt1026 ([[phab:T216195|T216195]]) | |||
* 13:24 andrewbogott: removing all osds on cloudcephosd1008 for rebuild, [[phab:T268746|T268746]] | |||
* 02:55 andrewbogott: removing all osds on cloudcephosd1009 for rebuild, [[phab:T268746|T268746]] | |||
=== 2020-12-02 === | |||
* 20:04 andrewbogott: removing all osds on cloudcephosd1010 for rebuild, [[phab:T268746|T268746]] | |||
* 17:25 arturo: [15:51] failing over neutron virtual router in eqiad1 ([[phab:T268335|T268335]]) | |||
* 15:36 arturo: conntrackd is now up and running in cloudnet1003/1004 nodes ([[phab:T268335|T268335]]) | |||
* 15:33 arturo: [codfw1dev] conntrackd is now up and running in cloudnet200x-dev nodes ([[phab:T268335|T268335]]) | |||
* 15:08 andrewbogott: removing all osds on cloudcephosd1012 for rebuild, [[phab:T268746|T268746]] | |||
* 12:41 arturo: disable puppet in all cloudnet servers to merge conntrackd change [[phab:T268335|T268335]] | |||
* 11:12 dcaro: Reset the properties for the flavor g2.cores8.ram16.disk1120 to correct quotes ([[phab:T269172|T269172]]) | |||
* 09:57 arturo: moved cloudvirts 1030, 1029, 1028, 1027, 1026, 1025 away from the 'standard' host aggregate to 'maintenance' ([[phab:T269172|T269172]]) | |||
=== 2020-12-01 === | |||
* 20:06 andrewbogott: removing all osds on cloudcephosd1014 for rebuild, [[phab:T268746|T268746]] | |||
* 12:04 arturo: restarting neutron l3 agents to pick up config change | |||
* 11:48 arturo: merging change to dmz_cidr, detailed list of private addresses https://gerrit.wikimedia.org/r/c/operations/puppet/+/641977 | |||
=== 2020-11-30 === | |||
* 18:12 andrewbogott: removing all osds from cloudcephosd1015 in order to investigate [[phab:T268746|T268746]] | |||
=== 2020-11-29 === | |||
* 17:18 andrewbogott: cleaning up some logfiles in tools-sgecron-01 — drive is full | |||
=== 2020-11-26 === | |||
* 22:58 andrewbogott: deleting /var/log/haproxy logs older than 7 days in cloudcontrol100x. We need log rotation here it seems. | |||
* 15:53 dcaro: Created private flavor g2.cores8.ram16.disk1120 for wikidumpparse ([[phab:T268190|T268190]]) | |||
=== 2020-11-25 === | |||
* 19:35 bstorm: repairing ceph pg `instructing pg 6.91 on osd.117 to repair` | |||
* 09:31 _dcaro: The OSD seems to be up and running actually, though there's that misleading log, will leave it see if the cluster comes fully healthy ([[phab:T268722|T268722]]) | |||
* 08:54 _dcaro: Unsetting noup/nodown to allow re-shuffling of the pgs that osd.44 had, will try to rebuild it ([[phab:T268722|T268722]]) | |||
* 08:45 _dcaro: Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened [[phab:T268722|T268722]]) | |||
* 08:45 _dcaro: Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush set-device-class ssd osd.44) | |||
* 08:19 _dcaro: Restarting service osd.44 resulted in osd.44 being unable to start due to some config inconsistency (can not reset class to hdd) | |||
* 08:16 _dcaro: After enabling auto pg scaling on ceph eqiad cluster, osd.44 (cloudcephosd1005) got stuck, trying to restart the osd service | |||
=== 2020-11-22 === | |||
* 17:40 andrewbogott: apt-get upgrade on cloudservices1003/1004 | |||
* 17:32 andrewbogott: upgrading Designate on cloudservices1003/1004 to Stein | |||
=== 2020-11-20 === | |||
* 12:44 arturo: [codfw1dev] install conntrackd in cloudnet2003-dev/cloudnet2002-dev to research l3 agent HA reliability | |||
* 09:26 arturo: icinga downtime labstore1006 RAID checks for 10 days ([[phab:T268281|T268281]]) | |||
=== 2020-11-17 === | |||
* 19:21 andrewbogott: draining cloudvirt1012 to experiment with libvirt/cpu things | |||
=== 2020-11-15 === | |||
* 11:21 arturo: icinga downtime cloudbackup2002 for 48h ([[phab:T267865|T267865]]) | |||
=== 2020-11-10 === | |||
* 16:38 arturo: icinga downtime toolschecker for 2h because of toolsdb maintenance ([[phab:T266587|T266587]]) | |||
* 11:24 arturo: [codfw1dev] enable puppet in puppetmaster01.cloudinfra-codfw1dev (disabled for unspecified reasons) | |||
=== 2020-11-09 === | |||
* 12:42 arturo: restarted neutron l3 agent in cloudnet1003 bc it still had the old default route ([[phab:T265288|T265288]]) | |||
* 12:41 arturo: `root@cloudcontrol1005:~# neutron subnet-delete dcbb0f98-5e9d-4a93-8dfc-4e3ec3c44dcc` ([[phab:T265288|T265288]]) | |||
* 12:41 arturo: `root@cloudcontrol1005:~# neutron router-gateway-set --fixed-ip subnet_id=7c6bcc12-212f-44c2-9954-{{Gerrit|5c55002ee371}},ip_address=185.15.56.244 cloudinstances2b-gw wan-transport-eqiad` ([[phab:T265288|T265288]]) | |||
* 12:19 arturo: subnet 185.15.56.240/29 has id 7c6bcc12-212f-44c2-9954-{{Gerrit|5c55002ee371}} in neutron ([[phab:T265288|T265288]]) | |||
* 12:19 arturo: `root@cloudcontrol1005:~# neutron subnet-create --gateway 185.15.56.241 --name cloud-instances-transport1-b-eqiad1 --ip-version 4 --disable-dhcp wan-transport-eqiad 185.15.56.240/29` ([[phab:T265288|T265288]]) | |||
* 12:15 arturo: icinga-downtime toolschecker for 2h ([[phab:T265288|T265288]]) | |||
=== 2020-11-02 === | |||
* 13:36 arturo: (typo: dcaro) | |||
* 13:35 arturo: added dcar as projectadmin & user ([[phab:T266068|T266068]]) | |||
=== 2020-10-29 === | |||
* 16:57 bstorm: silenced deployment-prep project alerts for 60 days since the downtime expired | |||
* 08:12 arturo: force-powercycling cloudcephosd1006 | |||
=== 2020-10-25 === | |||
* 16:20 andrewbogott: adding cloudvirt1038 to the 'ceph' aggregate and removing from the 'spare' aggregate. We need this space while waiting on network upgrades for empty cloudvirts ([[phab:T216195|T216195]]) | |||
=== 2020-10-23 === | |||
* 11:30 arturo: [codfw1dev] openstack --os-project-id cloudinfra-codfw1dev recordset create --type PTR --record nat.cloudgw.codfw1dev.wikimediacloud.org. --description "created by hand" 0-29.57.15.185.in-addr.arpa. 1.0-29.57.15.185.in-addr.arpa. ([[phab:T261724|T261724]]) | |||
* 10:09 arturo: [codfw1dev] doing DNS changes for the cloudgw PoC, including designate and https://gerrit.wikimedia.org/r/c/operations/dns/+/635965 ([[phab:T261724|T261724]]) | |||
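Note: the PTR record name in the 11:30 entry follows the classless reverse-delegation naming style (RFC 2317), where a zone for a block smaller than a /24 is named `<first-octet>-<prefix-length>`. A minimal sketch of how such a name is built, assuming the zone covers 185.15.57.0/29 and the record is for 185.15.57.1 (both inferred from the record name itself); the helper `classless_ptr_name` is hypothetical and not part of any Designate tooling:

```python
import ipaddress


def classless_ptr_name(network: str, address: str) -> str:
    """Build an RFC 2317-style PTR owner name for an address in a sub-/24 IPv4 block."""
    net = ipaddress.ip_network(network)
    addr = ipaddress.ip_address(address)
    if net.version != 4 or net.prefixlen <= 24:
        raise ValueError("only IPv4 networks smaller than a /24 are handled")
    if addr not in net:
        raise ValueError(f"{address} is not inside {network}")
    # Zone label is "<first address last octet>-<prefix length>", followed by
    # the remaining three octets in reverse order.
    o1, o2, o3, first = str(net.network_address).split(".")
    zone = f"{first}-{net.prefixlen}.{o3}.{o2}.{o1}.in-addr.arpa."
    last_octet = str(addr).split(".")[3]
    return f"{last_octet}.{zone}"


print(classless_ptr_name("185.15.57.0/29", "185.15.57.1"))
```

The printed name can be compared against the record and zone arguments passed to `openstack recordset create` above.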
=== 2020-10-22 === | |||
* 10:46 arturo: [codfw1dev] rebooting cloudinfra-internal-puppetmaster-01.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud to try fixing some DNS weirdness | |||
* 09:43 arturo: enabling puppet in cloudcontrol1003 (message said "please re-enable after 2020-10-22 06:00UTC") | |||
=== 2020-10-21 === | |||
* 14:36 andrewbogott: running apt-get update && apt-get install -y facter on all cloud-vps instances | |||
* 10:31 arturo: [codfw1dev] reimaging labtestvirt2003 (cloudgw) to test puppet code ([[phab:T261724|T261724]]) | |||
* 08:56 arturo: [codfw1dev] reimaging labtestvirt2003 (cloudgw) to test puppet code ([[phab:T261724|T261724]]) | |||
=== 2020-10-20 === | |||
* 15:47 arturo: changing DNS recursor ACLs (https://gerrit.wikimedia.org/r/c/operations/puppet/+/635314) this can be reverted any time if it causes problems ([[phab:T261724|T261724]]) | |||
* 14:49 arturo: [codfw1dev] reimaging labtestvirt2003 (cloudgw) to test puppet code ([[phab:T261724|T261724]]) | |||
=== 2020-10-19 === | |||
* 01:41 andrewbogott: deleting all Precise base images | |||
* 01:36 andrewbogott: deleting all unused Jessie base images | |||
=== 2020-10-18 === | |||
* 23:26 andrewbogott: deleting all Trusty base images | |||
* 21:50 andrewbogott: migrating all currently used ceph images to rbd | |||
=== 2020-10-16 === | |||
* 09:29 arturo: [codfw1dev] still some DNS weirdness, investigating | |||
* 09:25 arturo: [codfw1dev] hard-rebooting bastion-codfw1dev-02, seems in bad shape, doesn't even wake up in the virsh console | |||
* 09:18 arturo: [codfw1dev] live-hacked cloudservices2002-dev /etc/powerdns/recursor.conf file to include cloud-codfw1dev-floating CIDR (185.15.57.0/29) while https://gerrit.wikimedia.org/r/c/operations/puppet/+/634050 is in review, so VMs with a floating IP can query the DNS recursor ([[phab:T261724|T261724]]) | |||
* 09:01 arturo: [codfw1dev] basic network connectivity seems stable after cleaning up everything related to address scopes ([[phab:T261724|T261724]]) | |||
=== 2020-10-15 === | |||
* 15:17 arturo: [codfw1dev] try cleaning up anything related to address scopes in the neutron database ([[phab:T261724|T261724]]) | |||
* 13:56 arturo: [codfw1dev] drop neutron l3 agent hacks in cloudnet2002/2003-dev ([[phab:T261724|T261724]]) | |||
=== 2020-10-13 === | |||
* 17:54 andrewbogott: rebuilding cloudvirt1021 for backy support | |||
* 15:22 andrewbogott: draining cloudvirt1021 so I can rebuild it with backy support | |||
* 14:19 andrewbogott: rebuilding cloudvirt1022 with backy support | |||
* 14:03 andrewbogott: draining cloudvirt1022 so I can rebuild it with backy support | |||
* 11:19 arturo: [codfw1dev] rebooting labtestvirt2003 | |||
=== 2020-10-09 === | |||
* 10:15 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack router set --disable-snat cloudinstances2b-gw --external-gateway wan-transport-codfw ([[phab:T261724|T261724]]) | |||
* 09:22 arturo: [codfw1dev] rebooting cloudnet boxes for bridge and vlan changes ([[phab:T261724|T261724]]) | |||
* 09:12 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack subnet delete 31214392-9ca5-4256-bff5-{{Gerrit|1e19a35661de}} (cloud-instances-transport1-b-codfw - 208.80.153.184/29) ([[phab:T261724|T261724]]) | |||
* 09:10 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack router set --external-gateway wan-transport-codfw --fixed-ip subnet=cloud-gw-transport-codfw,ip-address=185.15.57.10 cloudinstances2b-gw ([[phab:T261724|T261724]]) | |||
* 08:49 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack subnet create --network wan-transport-codfw --gateway 185.15.57.9 --no-dhcp --subnet-range 185.15.57.8/30 cloud-gw-transport-codfw ([[phab:T261724|T261724]]) | |||
* 08:47 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack subnet delete a5ab5362-4ffb-4059-9ff7-{{Gerrit|391e22dcf3bc}} ([[phab:T261724|T261724]]) | |||
=== 2020-10-08 === | |||
* 16:17 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# openstack subnet create --network wan-transport-codfw --gateway 185.15.57.8 --no-dhcp --subnet-range 185.15.57.8/31 cloud-gw-transport-codfw` (with a hack -- see task) ([[phab:T263622|T263622]]) | |||
* 16:03 arturo: [codfw1dev] briefly live-hacked python3-neutron source code in all 3 cloudcontrol2xxx-dev servers to workaround /31 network definition issue ([[phab:T263622|T263622]]) | |||
* 10:28 arturo: [codfw1dev] reimaging labtestvirt2003 (cloudgw) [[phab:T261724|T261724]] | |||
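Note: the 2020-10-08 entries define the transport subnet as 185.15.57.8/31 with gateway 185.15.57.8, while the 2020-10-09 entries recreate it as 185.15.57.8/30 with gateway 185.15.57.9. A small sketch with Python's standard `ipaddress` module shows how the two layouts differ. Whether the "gateway equals network address" condition is exactly what neutron objected to is an assumption here; the task has the real details. The helper name `describe_transport_subnet` is hypothetical.

```python
import ipaddress


def describe_transport_subnet(cidr: str, gateway: str) -> dict:
    """Summarise how a gateway address sits inside a small transport subnet."""
    net = ipaddress.ip_network(cidr)
    gw = ipaddress.ip_address(gateway)
    return {
        # Every address in the block, including first and last.
        "addresses": [str(a) for a in net],
        # In a /31 the first address is a usable endpoint, yet it is also
        # what generic code treats as the "network address".
        "gateway_is_network_address": gw == net.network_address,
        "gateway_is_last_address": gw == net.broadcast_address,
    }


for cidr, gw in (("185.15.57.8/31", "185.15.57.8"), ("185.15.57.8/30", "185.15.57.9")):
    print(cidr, describe_transport_subnet(cidr, gw))
```

With the /30, the gateway (.9) and the router's fixed IP (.10) both fall strictly between the first and last address of the block, which avoids the special case.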
=== 2020-10-06 === | |||
* 21:30 andrewbogott: moved cloudvirt1013 out of the 'ceph' aggregate and into the 'maintenance' aggregate for [[phab:T243414|T243414]] | |||
* 21:29 andrewbogott: draining cloudvirt1013 for upgrade to 10G networking | |||
* 14:45 arturo: icinga downtime every cloud* lab* host for 60 minutes for keystone maintenance | |||
=== 2020-10-05 === | |||
* 17:40 bd808: `service uwsgi-labspuppetbackend restart` on cloud-puppetmaster-03 ([[phab:T264649|T264649]]) | |||
=== 2020-10-02 === | |||
* 11:05 arturo: [codfw1dev] restarting rabbitmq-server in all 3 control nodes, the l3 agent was misbehaving | |||
* 09:16 arturo: [codfw1dev] trying the labtestvirt2003 (cloudgw) reimage again ([[phab:T261724|T261724]]) | |||
=== 2020-10-01 === | |||
* 16:06 arturo: rebooting cloudvirt1024 to validate changes to /etc/network/interfaces file | |||
* 15:36 arturo: [codfw1dev] reimaging labtestvirt2003 | |||
=== 2020-09-30 === | |||
* 16:47 andrewbogott: rebooting cloudvirt1032, 1033, 1034 for [[phab:T262979|T262979]] | |||
* 13:28 arturo: enable puppet, reboot and pool back cloudvirt1031 | |||
* 13:27 arturo: extend icinga downtimes for another 120 mins | |||
* 13:15 arturo: `aborrero@cloudcontrol1003:~$ sudo nova-manage placement sync_aggregates` after reading a hint in nova-api.log | |||
* 13:02 arturo: rebooting cloudvirt1016 and moving it to the ceph host aggregate | |||
* 12:55 arturo: rebooting cloudvirt1014 and moving it to the ceph host aggregate | |||
* 12:51 arturo: rebooting cloudvirt1013 and moving it to the ceph host aggregate | |||
* 12:39 arturo: root@cloudcontrol1005:~# openstack aggregate add host maintenance cloudvirt1031 | |||
* 12:36 arturo: rebooted cloudnet1003 (active) a couple of minutes ago | |||
* 12:36 arturo: move cloudvirt1012 and cloudvirt1039 to the ceph aggregate | |||
* 11:49 arturo: rebooting cloudvirt1039 | |||
* 11:46 arturo: rebooting cloudvirt1012 | |||
* 11:40 arturo: rebooting cloudnet1004 (standby) to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167 ([[phab:T262979|T262979]]) | |||
* 11:38 arturo: [codfw1dev] rebooting cloudnet2002-dev to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167 | |||
* 11:36 arturo: [codfw1dev] rebooting cloudnet2003-dev to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167 | |||
* 11:33 arturo: disabling puppet and downtiming every virt/net server in the fleet in preparation for merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167 ([[phab:T262979|T262979]]) | |||
* 09:32 arturo: rebooting cloudvirt1012 to investigate linuxbridge agent issues | |||
=== 2020-09-29 === | |||
* 15:40 arturo: downgrade linux kernel from linux-image-4.19.0-11-amd64 to linux-image-4.19.0-10-amd64 on cloudvirt1012 | |||
* 14:47 arturo: rebooting cloudvirt1012, chasing config weirdness in the linuxbridge agent | |||
* 14:05 andrewbogott: reimaging 1014 over and over in an attempt to get partman right | |||
* 13:51 arturo: rebooting cloudvirt1012 | |||
=== 2020-09-28 === | |||
* 14:55 arturo: [jbond42] upgraded facter to v3 across the VM fleet | |||
* 13:54 andrewbogott: moving cloudvirt1035 from aggregate 'spare' to 'ceph'. We're going to need all the capacity we can get while converting older cloudvirts to ceph | |||
=== 2020-09-24 === | |||
* 15:47 arturo: stopping/restarting rabbitmq-server in all cloudcontrol servers | |||
* 15:45 arturo: restarting rabbitmq-server in cloudcontrol1003 | |||
* 15:15 arturo: restarting floating_ip_ptr_records_updater.service in all 3 cloudcontrol servers to reset state after a DNS failure | |||
=== 2020-09-18 === | |||
* 10:16 arturo: cloudvirt1039 libvirtd service issues were fixed with a reboot | |||
* 09:56 arturo: rebooting cloudvirt1039 (spare) to try to fix some weird libvirtd failure | |||
* 09:50 arturo: enabling puppet in cloudvirts and effectively merging patches from [[phab:T262979|T262979]] | |||
* 08:59 arturo: disable puppet in all buster cloudvirts (cloudvirt[1024,1031-1039].eqiad.wmnet) to merge a patch for [[phab:T263205|T263205]] and [[phab:T262979|T262979]] | |||
* 08:50 arturo: installing iptables from buster-bpo in cloudvirt1036 ([[phab:T263205|T263205]] and [[phab:T262979|T262979]]) | |||
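Note: the host expression `cloudvirt[1024,1031-1039].eqiad.wmnet` in the 08:59 entry is a bracketed range of the kind used for host selection. A minimal sketch of how such an expression expands into individual hostnames; `expand_hosts` is a hypothetical helper written for illustration, handles only a single bracket group, and is not the parser used by any real tool:

```python
import re


def expand_hosts(expr: str) -> list[str]:
    """Expand one bracketed numeric range group into a list of hostnames."""
    match = re.fullmatch(r"(.*?)\[([0-9,\-]+)\](.*)", expr)
    if match is None:
        # No bracket group: the expression is already a single hostname.
        return [expr]
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        start, _, end = part.partition("-")
        end = end or start  # a bare number is a range of one
        width = len(start)  # preserve any zero padding
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


for host in expand_hosts("cloudvirt[1024,1031-1039].eqiad.wmnet"):
    print(host)
```

For the expression above this yields ten hosts: cloudvirt1024 plus cloudvirt1031 through cloudvirt1039.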
=== 2020-09-15 === | |||
* 20:32 andrewbogott: rebooting cloudvirt1038 to see if it resolves [[phab:T262979|T262979]] | |||
* 13:58 andrewbogott: draining cloudvirt1002 with wmcs-ceph-migrate | |||
=== 2020-09-14 === | |||
* 14:21 andrewbogott: draining cloudvirt1001, migrating all VMs with wmcs-ceph-migrate | |||
* 10:41 arturo: [codfw1dev] trying to get the bonding working for labtestvirt2003 ([[phab:T261724|T261724]]) | |||
* 09:47 arturo: installed qemu security update in eqiad1 cloudvirts ([[phab:T262386|T262386]]) | |||
* 09:43 arturo: [codfw1dev] installed qemu security update in codfw1dev cloudvirts ([[phab:T262386|T262386]]) | |||
=== 2020-09-09 === | |||
* 18:13 andrewbogott: restarting ceph-mon@cloudcephmon1003 in hopes that the slow ops reported are phantoms | |||
* 18:01 andrewbogott: restarting ceph-mgr@cloudcephmon1003 in hopes that the slow ops reported are phantoms (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EOWNO3MDYRUZKAK6RMQBQ5WBPQNLHOPV/) | |||
* 17:40 andrewbogott: giving ceph pg autoscale another chance: ceph osd pool set eqiad1-compute pg_autoscale_mode on | |||
* 00:05 bd808: Running wmcs-novastats-dnsleaks ([[phab:T262359|T262359]]) | |||
=== 2020-09-08 === | |||
* 21:48 bd808: Renamed FQDN prefixes to wikimedia.cloud scheme in cloudinfra-db01's labspuppet db ([[phab:T260614|T260614]]) | |||
* 14:29 andrewbogott: restarting nova-compute on all cloudvirts (everyone is upset from the reset switch failure) | |||
* 14:18 arturo: restarting nova-fullstack service in cloudcontrol1003 | |||
* 14:17 andrewbogott: stopping apache2 on labweb1001 to make sure the Horizon outage is total | |||
=== 2020-09-03 === | |||
* 09:31 arturo: icinga downtime cloud* servers for 30 mins ([[phab:T261866|T261866]]) | |||
=== 2020-09-02 === | |||
* 08:46 arturo: [codfw1dev] reimaging spare server labtestvirt2003 as debian buster ([[phab:T261724|T261724]]) | |||
=== 2020-09-01 === | |||
* 18:18 andrewbogott: adding drives on cloudcephosd100[3-5] to ceph osd pool | |||
* 13:40 andrewbogott: adding drives on cloudcephosd101[0-2] to ceph osd pool | |||
* 13:35 andrewbogott: adding drives on cloudcephosd100[1-3] to ceph osd pool | |||
* 11:27 arturo: [codfw1dev] rebooting again cloudnet2002-dev after some network tests, to reset initial state ([[phab:T261724|T261724]]) | |||
* 11:09 arturo: [codfw1dev] rebooting cloudnet2002-dev after some network tests, to reset initial state ([[phab:T261724|T261724]]) | |||
* 10:49 arturo: disable puppet in cloudnet servers to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/623569/ | |||
=== 2020-08-31 === | |||
* 23:26 bd808: Removed stale lockfile at cloud-puppetmaster-03.cloudinfra.eqiad.wmflabs:/var/lib/puppet/volatile/GeoIP/.geoipupdate.lock | |||
* 11:20 arturo: [codfw1dev] livehacking https://gerrit.wikimedia.org/r/c/operations/puppet/+/615161 in the puppetmasters for tests before merging | |||
=== 2020-08-28 === | |||
* 20:12 bd808: Running `wmcs-novastats-dnsleaks --delete` from cloudcontrol1003 | |||
=== 2020-08-26 === | |||
* 17:12 bstorm: Running 'ionice -c 3 nice -19 find /srv/tools -type f -size +100M -printf "%k KB %p\n" > tools_large_files_20200826.txt' on labstore1004 [[phab:T261336|T261336]] | |||
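Note: the `find` invocation above lists regular files larger than 100 MiB together with their disk usage in 1 KB blocks. A rough Python equivalent is sketched below for readers less familiar with the `find` flags; it does not reproduce the `ionice`/`nice` scheduling, the `(st_blocks + 1) // 2` conversion is an approximation of what `%k` reports, and `large_files` is a hypothetical helper name. It assumes a Unix system where `st_blocks` is available.

```python
import os
import stat


def large_files(root: str, min_bytes: int = 100 * 1024 * 1024):
    """Yield (kilobytes_on_disk, path) for regular files larger than min_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # like find, do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # mirrors "-type f"
            if st.st_size > min_bytes:  # mirrors "-size +100M"
                # st_blocks counts 512-byte units; convert to 1 KB blocks.
                yield (st.st_blocks + 1) // 2, path


if __name__ == "__main__":
    for kb, path in large_files("/srv/tools"):
        print(f"{kb} KB {path}")
```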
=== 2020-08-21 === | |||
* 21:34 andrewbogott: restarting nova-compute on cloudvirt1033; it seems stuck | |||
=== 2020-08-19 === | |||
* 14:21 andrewbogott: rebooting cloudweb2001-dev, labweb1001, labweb1002 to address mediawiki-induced memleak | |||
=== 2020-08-06 === | |||
* 21:02 andrewbogott: removing cloudvirt1004/1006 from nova's list of hypervisors; rebuilding them to use as backup test hosts | |||
* 20:06 bstorm: manually stopped the RAID check on cloudcontrol1003 [[phab:T259760|T259760]] | |||
=== 2020-08-04 === | |||
* 18:54 bstorm: restarting mariadb on cloudcontrol1004 to setup parallel replication | |||
=== 2020-08-03 === | |||
* 17:02 bstorm: increased db connection limit to 800 across galera cluster because we were clearly hovering at limit | |||
=== 2020-07-31 === | |||
* 19:28 bd808: wmcs-novastats-dnsleaks --delete (lots of leaked fullstack-monitoring records to clean up) | |||
=== 2020-07-27 === | |||
* 22:17 andrewbogott: ceph osd pool set compute pg_num 2048 | |||
* 22:14 andrewbogott: ceph osd pool set compute pg_autoscale_mode off | |||
=== 2020-07-24 === | |||
* 19:15 andrewbogott: ceph mgr module enable pg_autoscaler | |||
* 19:15 andrewbogott: ceph osd pool set compute pg_autoscale_mode on | |||
=== 2020-07-22 === | |||
* 08:55 jbond42: [codfw1dev] upgrading hiera to version5 | |||
* 08:48 arturo: [codfw1dev] add jbond as user in the bastion-codfw1dev and cloudinfra-codfw1dev projects | |||
* 08:45 arturo: [codfw1dev] enabled account creation in labtestwiki briefly for jbond42 to create an account | |||
=== 2020-07-16 === | |||
* 10:48 arturo: merging change to neutron dmz_cidr https://gerrit.wikimedia.org/r/c/operations/puppet/+/613123 ([[phab:T257534|T257534]]) | |||
=== 2020-07-15 === | |||
* 23:15 bd808: Removed Merlijn van Deen from toollabs-trusted Gerrit group ([[phab:T255697|T255697]]) | |||
* 11:48 arturo: [codfw1dev] created DNS records (A and PTR) for bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org <-> 185.15.57.2 | |||
* 11:41 arturo: [codfw1dev] add myself as projectadmin to the `bastioninfra-codfw1dev` project | |||
* 11:39 arturo: [codfw1dev] created DNS zone `bastioninfra-codfw1dev.codfw1dev.wmcloud.org.` in the cloudinfra-codfw1dev project and then transfer ownership to the bastioninfra-codfw1dev project | |||
=== 2020-07-14 === | |||
* 15:19 arturo: briefly set root@cloudnet1003:~ # sysctl net.ipv4.conf.all.accept_local=1 (in neutron qrouter netns) ([[phab:T257534|T257534]]) | |||
* 10:43 arturo: icinga downtime cloudnet* hosts for 30 mins to introduce new check https://gerrit.wikimedia.org/r/c/operations/puppet/+/612390 ([[phab:T257552|T257552]]) | |||
* 04:01 andrewbogott: added a wildcard *.wmflabs.org domain pointing at the domain proxy in project-proxy | |||
* 04:00 andrewbogott: shortened the ttl on .wmflabs.org. to 300 | |||
=== 2020-07-13 === | |||
* 16:17 arturo: icinga downtime cloudcontrol[1003-1005].wikimedia.org for 1h for galera database movements | |||
=== 2020-07-12 === | |||
* 17:39 andrewbogott: switched eqiad1 keystone from m5 to cloudcontrol galera | |||
=== 2020-07-10 === | |||
* 20:26 andrewbogott: disabling nova api to move database to galera | |||
=== 2020-07-09 === | |||
* 11:23 arturo: [codfw1dev] rebooting cloudnet2003-dev again for testing sysctl/puppet behavior ([[phab:T257552|T257552]]) | |||
* 11:11 arturo: [codfw1dev] rebooting cloudnet2003-dev for testing sysctl/puppet behavior ([[phab:T257552|T257552]]) | |||
* 09:16 arturo: manually increasing sysctl value of net.nf_conntrack_max in cloudnet servers ([[phab:T257552|T257552]]) | |||
=== 2020-07-06 === | |||
* 15:16 arturo: installing 'aptitude' in all cloudvirts | |||
=== 2020-07-03 === | |||
* 12:51 arturo: [codfw1dev] galera cluster should be up and running, openstack happy ([[phab:T256283|T256283]]) | |||
* 11:44 arturo: [codfw1dev] restoring glance database backup from bacula into cloudcontrol2001-dev ([[phab:T256283|T256283]]) | |||
* 11:39 arturo: [codfw1dev] stopped mysql database in the galera cluster [[phab:T256283|T256283]] | |||
* 11:36 arturo: [codfw1dev] dropped glance database in the galera cluster [[phab:T256283|T256283]] | |||
=== 2020-07-02 === | |||
* 15:41 arturo: `sudo wmcs-openstack --os-compute-api-version 2.55 flavor create --private --vcpus 8 --disk 300 --ram 16384 --property aggregate_instance_extra_specs:ceph=true --description "for packaging envoy" bigdisk-ceph` ([[phab:T256983|T256983]]) | |||
=== 2020-06-29 === | |||
* 14:24 arturo: starting rabbitmq-server in all 3 cloudcontrol servers | |||
* 14:23 arturo: stopping rabbitmq-server in all 3 cloudcontrol servers | |||
=== 2020-06-18 === | |||
* 20:38 andrewbogott: rebooting cloudservices2003-dev due to a mysterious 'host down' alert on a secondary ip | |||
=== 2020-06-16 === | |||
* 15:38 arturo: created by hand neutron port 9c0a9a13-e409-49de-9ba3-{{Gerrit|bc8ec4801dbf}} `paws-haproxy-vip` ([[phab:T295217|T295217]]) | |||
=== 2020-06-12 === | |||
* 13:23 arturo: DNS zone `paws.wmcloud.org` transferred to the PAWS project ([[phab:T195217|T195217]]) | |||
* 13:20 arturo: created DNS zone `paws.wmcloud.org` ([[phab:T195217|T195217]]) | |||
=== 2020-06-11 === | |||
* 19:19 bstorm_: proceeding with failback to labstore1004 now that DRBD devices are consistent [[phab:T224582|T224582]] | |||
* 17:22 bstorm_: delaying failback labstore1004 for drive syncs [[phab:T224582|T224582]] | |||
* 17:17 bstorm_: failing NFS back to labstore1004 to complete the upgrade process [[phab:T224582|T224582]] | |||
* 16:15 bstorm_: failing over NFS for labstore1004 to labstore1005 [[phab:T224582|T224582]] | |||
=== 2020-06-10 === | |||
* 16:09 andrewbogott: deleting all old cloud-ns0.wikimedia.org and cloud-ns1.wikimedia.org ns records in designate database [[phab:T254496|T254496]] | |||
=== 2020-06-09 === | |||
* 15:25 arturo: icinga downtime everything cloud* lab* for 2h more ([[phab:T253780|T253780]]) | |||
* 14:09 andrewbogott: stopping puppet, all designate services and all pdns services on cloudservices1004 for [[phab:T253780|T253780]] | |||
* 14:01 arturo: icinga downtime everything cloud* lab* for 2h ([[phab:T253780|T253780]]) | |||
=== 2020-06-05 === | |||
* 15:08 andrewbogott: trying to re-enable puppet without losing cumin contact, as per https://phabricator.wikimedia.org/T254589 | |||
=== 2020-06-04 === | |||
* 14:24 andrewbogott: disabling puppet on all instances for /labs/private recovery | |||
* 14:23 arturo: disabling puppet on all instances for /labs/private recovery | |||
=== 2020-05-28 === | |||
* 23:02 bd808: `/usr/local/sbin/maintain-dbusers --debug harvest-replicas` ([[phab:T253930|T253930]]) | |||
* 13:36 andrewbogott: rebuilding cloudservices2002-dev with Buster | |||
* 00:33 andrewbogott: shutting down cloudservices2002-dev to see if we can live without it. This is in anticipation of rebuilding it entirely for [[phab:T253780|T253780]] | |||
=== 2020-05-27 === | |||
* 23:29 andrewbogott: disabling the backup job on cloudbackup2001 (just like last week) so the backup doesn't start while Brooke is rebuilding labstore1004 tomorrow. | |||
* 06:03 bd808: `systemctl start mariadb` on clouddb1001 following reboot (take 2) | |||
* 05:58 bd808: `systemctl start mariadb` on clouddb1001 following reboot | |||
* 05:53 bd808: Hard reboot of clouddb1001 via Horizon. Console unresponsive. | |||
=== 2020-05-25 === | |||
* 16:35 arturo: [codfw1dev] created zone `0-29.57.15.185.in-addr.arpa.` ([[phab:T247972|T247972]]) | |||
=== 2020-05-21 === | |||
* 19:23 andrewbogott: disabling puppet on cloudbackup2001 to prevent the backup job from starting during maintenance | |||
* 19:16 andrewbogott: systemctl disable block_sync-tools-project.service on cloudbackup2001.codfw.wmnet to avoid stepping on current upgrade | |||
* 15:48 andrewbogott: re-imaging cloudnet1003 with Buster | |||
=== 2020-05-19 === | |||
* 22:59 bd808: `apt-get install mariadb-client` on cloudcontrol1003 | |||
* 21:12 bd808: Migrating wcdo.wcdo.eqiad.wmflabs to cloudvirt1023 ([[phab:T251065|T251065]]) | |||
=== 2020-05-18 === | |||
* 21:37 andrewbogott: rebuilding cloudnet2003-dev with Buster | |||
=== 2020-05-15 === | |||
* 22:10 bd808: Added reedy as projectadmin in cloudinfra project ([[phab:T249774|T249774]]) | |||
* 22:05 bd808: Added reedy as projectadmin in admin project ([[phab:T249774|T249774]]) | |||
* 18:44 bstorm_: rebooting cloudvirt-wdqs1003 [[phab:T252831|T252831]] | |||
* 15:47 bd808: Manually running wmcs-novastats-dnsleaks from cloudcontrol1003 ([[phab:T252889|T252889]]) | |||
=== 2020-05-14 === | |||
* 23:28 bstorm_: downtimed cloudvirt1004/6 and cloudvirt-wdqs1003 until tomorrow around this time [[phab:T252831|T252831]] | |||
* 22:21 bstorm_: upgrading qemu-system-x86 on cloudvirt1006 to backports version [[phab:T252831|T252831]] | |||
* 22:15 bstorm_: changing /etc/libvirt/qemu.conf and restarting libvirtd on cloudvirt1006 [[phab:T252831|T252831]] | |||
* 21:12 andrewbogott: rebuilding cloudvirt-wdqs1003 as part of [[phab:T252831|T252831]] | |||
* 15:47 andrewbogott: moving cloudvirt1004 and cloudvirt1006 to the 'ceph' aggregate for [[phab:T252784|T252784]] | |||
* 15:02 andrewbogott: moving all of cloudvirt100[1-9] into the 'toobusy' host aggregate. These are slower, have spinning disks, and are due for replacement. | |||
=== 2020-05-12 === | |||
* 20:33 andrewbogott: moving cloudvirt1023 to the 'standard' pool and out of the 'spare' pool | |||
* 19:10 jeh: disable neutron-openvswitch-agent service on cloudvirt2001-dev.codfw [[phab:T248881|T248881]] | |||
* 19:09 jeh: Shutdown the unused eno2 network interface on cloudvirt2001-dev.codfw to clear up monitoring errors [[phab:T248425|T248425]] | |||
* 18:20 andrewbogott: moving cloudvirt1024 out of the 'maintenance' aggregate and into 'spare' | |||
* 16:45 andrewbogott: restarting neutron-l3-agent on cloudnet1004 so it knows about all three cloudcontrols. Leaving cloudnet1003 since restarting it there will cause network interruptions | |||
* 14:06 arturo: icinga downtime everything for 2h for Debian Buster migration in some cloud components | |||
=== 2020-05-09 === | |||
* 16:53 andrewbogott: rebuilding cloudcontrol2001-dev and 2003-dev with buster for [[phab:T252121|T252121]] | |||
=== 2020-05-08 === | |||
* 19:02 bstorm_: moving tools-k8s-haproxy-2 from cloudvirt1021 to cloudvirt1017 to improve spread | |||
=== 2020-05-05 === | |||
* 13:58 andrewbogott: rebuilding cloudcontrol2004-dev to test new puppet changes | |||
=== 2020-05-04 === | |||
* 09:04 arturo: [codfw1dev] manually modify iptables ruleset to only allow SSH from WMF bastions on cloudservices2003-dev and cloudcontrol2004-dev ([[phab:T251604|T251604]]) | |||
=== 2020-04-21 === | |||
* 22:12 andrewbogott: moving cloudvirt1004 out of the 'standard' aggregate and into the 'maintenance' aggregate | |||
* 16:01 jeh: restart cloudceph mon and osd services for openssl upgrades | |||
=== 2020-04-15 === | |||
* 18:44 jeh: create indexes and views for grwikimedia [[phab:T245912|T245912]] | |||
=== 2020-04-13 === | |||
* 15:07 jeh: restart memcached on labwebs to increase cache size [[phab:T145703|T145703]] | |||
=== 2020-04-09 === | |||
* 19:57 andrewbogott: upgrading eqiad1 designate to rocky | |||
* 16:52 andrewbogott: cleaned up a bunch of leaked .eqiad.wmflabs dns records | |||
=== 2020-04-08 === | |||
* 19:20 andrewbogott: rotated password and api token for pdns servers on cloudservices1003 and cloudservices1004 | |||
* 14:54 arturo: `root@cloudcontrol1003:~# cp /etc/inputrc .inputrc` to solve some bash shortcut weirdness | |||
=== 2020-04-07 === | |||
* 20:57 andrewbogott: service sssd stop; rm -rf /var/lib/sss/db*; service sssd start on tools-sgebastion-08 | |||
=== 2020-04-06 === | |||
* 22:39 andrewbogott: deleting bogus groups cn=b'project-bastion',ou=groups,dc=wikimedia,dc=org and cn=b'project-tools',ou=groups,dc=wikimedia,dc=org from ldap | |||
* 17:42 arturo: [codfw1dev] transferred DNS zone 57.15.185.in-addr.arpa. to the cloudinfra-codfw1dev project ([[phab:T247972|T247972]]) | |||
* 17:39 arturo: [codfw1dev] `openstack zone create --email root@wmflabs.org --type PRIMARY --ttl 3600 --description "floating IPs subnet" 57.15.185.in-addr.arpa.` ([[phab:T247972|T247972]]) | |||
* 16:23 arturo: restarting apache2 in cloudcontrol1003/1004 to pick up latest wmfkeystonehooks changes [[phab:T249494|T249494]] | |||
=== 2020-04-02 === | |||
* 20:59 jeh: codfw1dev clear VM error states and start bastions, puppet master and database | |||
=== 2020-04-01 === | |||
* 16:27 arturo: [codfw1dev] enable puppet across the fleet clean vxlan changes ([[phab:T248881|T248881]]) | |||
=== 2020-03-31 === | |||
* 12:35 arturo: [codfw1dev] restarting VMs: designaterockytest14, bastion-codfw1dev-0[1,2] ([[phab:T248881|T248881]]) | |||
* 12:34 arturo: [codfw1dev] installing neutron-openvswitch-agent on cloudvirt2001-dev ([[phab:T248881|T248881]]) | |||
* 12:25 arturo: [codfw1dev] installing neutron-openvswitch-agent on cloudnet200[2,3]-dev ([[phab:T248881|T248881]]) | |||
* 11:45 arturo: [codfw1dev] rebooting cloudvirt2003-dev to pick up latest kernel update. Otherwise modprobe is confused trying to load modules and openvswitch won't start ([[phab:T248881|T248881]]) | |||
* 10:40 arturo: [codfw1dev] installing neutron-openvswitch-agent on cloudvirt2003-dev ([[phab:T248881|T248881]]) | |||
* 10:09 arturo: [codfw1dev] reboot cloudnet2003-dev into linux 4.9 (was using 4.14 from a testing operation on 2020-03-10) | |||
=== 2020-03-30 === | |||
* 23:42 bstorm_: deleted "Kubernetes Cluster" and "Kubernetes Performance" dashboards [[phab:T246689|T246689]] | |||
* 16:44 arturo: [codfw1dev] installing package neutron-openvswitch-agent in cloudvirt2002-dev ([[phab:T248881|T248881]]) | |||
* 16:42 andrewbogott: restarting l3 agents on cloudnets in codfw1dev after applying https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/584188/ | |||
=== 2020-03-27 === | |||
* 21:28 bd808: Created huggle.wmcloud.org Designate zone and allocated it to the huggle project | |||
* 19:51 jeh: start haproxy on cloudcontrol2003-dev.wikimedia.org | |||
=== 2020-03-26 === | |||
* 15:01 arturo: icinga downtime cloudvirt* cloudcontrol* cloudnet* lab* cloudstore* | |||
* 15:01 andrewbogott: beginning openstack upgrade window for [[phab:T242766|T242766]] | |||
* 12:32 arturo: [codfw1dev] downgraded systemd, libsystemd0, udev and friends to the non-backports versions ([[phab:T247013|T247013]]) | |||
=== 2020-03-25 === | |||
* 19:29 andrewbogott: dumping a bunch of VMs on cloudvirt1015 to see if it still crashes | |||
* 17:56 jeh: add labweb1002 back into the pool - completed horizon testing [[phab:T240852|T240852]] | |||
* 17:09 jeh: depool labweb1002 for horizon testing [[phab:T240852|T240852]] | |||
=== 2020-03-24 === | |||
* 19:41 jeh: switch cloudvirt1016 from maintenance to standard host aggregate [[phab:T243327|T243327]] | |||
* 15:31 andrewbogott: restarting nova-conductor and nova-api on cloudcontrol1003 and cloudcontrol1004 | |||
=== 2020-03-23 === | |||
* 21:41 jeh: restart neutron-l3-agent on cloudnet100[3,4] to pick up policy.yaml changes | |||
* 13:28 jeh: disable puppet on labweb100[1,2] to enable horizon event traces [[phab:T240852|T240852]] | |||
* 10:26 arturo: restarting apache in both labweb1001/labweb1002 upon reports of returning 500s | |||
=== 2020-03-21 === | |||
* 14:23 andrewbogott: restarting apache2 on labweb1001 and 1002 | |||
=== 2020-03-18 === | |||
* 19:17 andrewbogott: deleted a bunch of records from the pdns database on cloudservices1003/1004 which had a record name but the content (where an IP address should be) was NULL, e.g. m.wikidata.beta.wmflabs.org. | |||
* 10:55 arturo: [codfw1dev] deleting BGP agent, undoing changes we did for [[phab:T245606|T245606]] | |||
=== 2020-03-14 === | |||
* 17:40 jeh: restart maintain-dbusers on labstore1004 [[phab:T247654|T247654]] | |||
=== 2020-03-13 === | |||
* 12:39 arturo: [codfw1dev] reintroduce address scopes for another round of testing [[phab:T244851|T244851]] | |||
* 12:17 arturo: [codfw1dev] enabling puppet in cloudnet200x-dev servers after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/579259 ([[phab:T247505|T247505]]) | |||
=== 2020-03-12 === | |||
* 22:29 bstorm_: running puppet across all dumps mounts to make sure active links are shifted to labstore1006 | |||
=== 2020-03-11 === | |||
* 18:38 jeh: set icinga downtime until 2020-03-23 on CODFW cloud[control,net,virt] hosts during openstack upgrades | |||
* 12:50 arturo: [codfw1dev] several tests creating/deleting address scopes ([[phab:T244727|T244727]] [[phab:T247135|T247135]] [[phab:T246887|T246887]] [[phab:T245606|T245606]]) | |||
* 12:46 arturo: [codfw1dev] disable routing_source_ip in l3 agents for testing proposal detailed at https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Network_refresh#Eliminate_routing_source_ip_address ([[phab:T244727|T244727]]) | |||
=== 2020-03-10 === | |||
* 17:02 arturo: [codfw1dev] deleting address scopes, bad interaction with our custom NAT setup [[phab:T247135|T247135]] | |||
* 13:55 arturo: [codfw1dev] rebooting cloudnet2003-dev into linux kernel 4.14 for testing stuff related to [[phab:T247135|T247135]] | |||
=== 2020-03-09 === | |||
* 18:09 arturo: enabling puppet in cloudvirt1006, all services have been restored | |||
* 17:59 arturo: deleted the neutron bridge on cloudvirt1006, for testing stuff related to the queens upgrade | |||
* 17:58 arturo: stopped neutron-linuxbridge-agent and nova-compute in cloudvirt1006 for testing stuff related to the queens upgrade | |||
=== 2020-03-06 === | |||
* 14:54 andrewbogott: draining all instances off of cloudvirt1006 for [[phab:T246908|T246908]] | |||
=== 2020-03-05 === | |||
* 14:24 arturo: [codfw1dev] we just enabled BGP session between cloudnet2xxx-dev and cr1-codfw ([[phab:T245606|T245606]]) | |||
* 13:07 arturo: [codfw1dev] move the extra IP address for BGP in cloudnet200x-dev servers from eno2.2120 to the br-external bridge device ([[phab:T245606|T245606]]) | |||
* 13:06 arturo: [codfw1dev] upgrade neutron-dynamic-routing packages in cloudnet200X-dev and cloudcontrol200X-dev servers to 11.0.0-2~bpo9+1 ([[phab:T245606|T245606]]) | |||
=== 2020-03-04 === | |||
* 22:22 andrewbogott: upgrading designate on cloudservices1003/1004 to Queens | |||
* 22:09 andrewbogott: moving cloudvirt1006 into the maintenance aggregate for [[phab:T246908|T246908]] | |||
* 21:37 bd808: Running wmcs-wikireplica-dns to add service names for ngwikimedia.*.db.svc.eqiad.wmflabs ([[phab:T240772|T240772]]) | |||
* 21:14 bd808: Running `sudo maintain-meta_p --all-databases --purge` on labsdb1009 ([[phab:T246056|T246056]]) | |||
* 21:11 bd808: Running `sudo maintain-meta_p --all-databases --purge` on labsdb1010 ([[phab:T246056|T246056]]) | |||
* 21:08 bd808: Running `sudo maintain-meta_p --all-databases --purge` on labsdb1011 ([[phab:T246056|T246056]]) | |||
* 21:05 bd808: Running `sudo maintain-meta_p --all-databases --purge` on labsdb1002 ([[phab:T246056|T246056]]) | |||
=== 2020-03-02 === | |||
* 16:54 arturo: [codfw1dev] deleted python3-os-ken debian package in cloudnet2003-dev which was installed by hand and had dependency issues | |||
=== 2020-02-29 === | |||
* 16:32 bstorm_: downtimed the smart alert on cloudvirt1009 until Monday since apparently predictive failures flap [[phab:T244986|T244986]] | |||
=== 2020-02-26 === | |||
* 22:03 jeh: powering down cloudvirt1014 for hardware maintenance | |||
=== 2020-02-25 === | |||
* 16:08 andrewbogott: changing neutron's rabbitmq password because oslo is having trouble parsing some of the characters in the password | |||
* 15:26 andrewbogott: updated the cell_mapping record in the nova_api database to add the second rabbitmq server to the transport_url field | |||
* 15:26 andrewbogott: updated the cell_mapping record in the nova_api database to set the db uri to 'mysql+pymysql' -- this in response to a deprecation notice | |||
=== 2020-02-24 === | |||
* 12:16 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker cr2-codfw` ([[phab:T245606|T245606]]) | |||
* 12:16 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker cr1-codfw` ([[phab:T245606|T245606]]) | |||
* 12:09 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-create --peer-ip 208.80.153.187 --remote-as 65002 cr2-codfw` ([[phab:T245606|T245606]]) | |||
* 12:09 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-create --peer-ip 208.80.153.186 --remote-as 65002 cr1-codfw` ([[phab:T245606|T245606]]) | |||
* 12:06 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-delete 17b8c2a3-f0ce-4d50-a265-18ccac703c61` ([[phab:T245606|T245606]]) | |||
* 10:59 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker bgppeer` ([[phab:T245606|T245606]]) | |||
* 10:56 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-create --peer-ip 208.80.153.185 --remote-as 65002 bgppeer` ([[phab:T245606|T245606]]) | |||
=== 2020-02-21 === | |||
* 12:48 arturo: [codfw1dev] running `root@cloudcontrol2001-dev:~# neutron bgp-speaker-network-add bgpspeaker wan-transport-codfw` ([[phab:T245606|T245606]]) | |||
* 12:46 arturo: [codfw1dev] created bgpspeaker for AS64711 ([[phab:T245606|T245606]]) | |||
* 12:42 arturo: [codfw1dev] run `sudo neutron-db-manage upgrade head` to upgrade the db schema for neutron bgp tables | |||
* 11:51 arturo: [codfw1dev] create a neutron subnet pool per each subnet objects we have and manually update DB to inter-associate them ([[phab:T245606|T245606]]) | |||
* 11:49 arturo: [codfw1dev] rename neutron address scope `no-nat` to `bgp` ([[phab:T245606|T245606]]) | |||
* 11:37 arturo: [codfw1dev] cleanup unused neutron subnet pools from previous address scope tests ([[phab:T244851|T244851]]) | |||
=== 2020-02-20 === | |||
* 19:22 andrewbogott: updating designate pool config for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/572213/ | |||
* 15:33 andrewbogott: migrating all VMs on cloudvirt1014 to cloudvirt1022 | |||
* 13:35 arturo: [codfw1dev] disable puppet in cloudcontrol servers to hack neutron.conf for tests related to [[phab:T245606|T245606]] | |||
* 13:33 arturo: [codfw1dev] disable puppet in cloudnet servers to hack neutron.conf for tests related to [[phab:T245606|T245606]] | |||
=== 2020-02-18 === | |||
* 22:19 andrewbogott: transferred the tools.wmcloud.org. zone to the tools project | |||
* 22:16 andrewbogott: moved wmcloud.org dns domain to the cloud-infra project | |||
* 21:02 andrewbogott: adding .eqiad1.wikimedia.cloud records to all existing eqiad1 VMs, updating all eqiad1 internal pointer records to reference the new eqiad1.wikimedia.cloud fqdns. | |||
* 09:44 arturo: deleted DNS zone wmcloud.org and try re-creating it | |||
=== 2020-02-14 === | |||
* 10:35 arturo: running `root@cloudcontrol2001-dev:~# designate server-create --name ns1.openstack.codfw1dev.wikimediacloud.org.` ([[phab:T243766|T243766]]) | |||
* 10:32 arturo: running `root@cloudcontrol1004:~# designate server-create --name ns1.openstack.eqiad1.wikimediacloud.org.` ([[phab:T243766|T243766]]) | |||
* 10:32 arturo: running `root@cloudcontrol1004:~# designate server-create --name ns0.openstack.eqiad1.wikimediacloud.org.` ([[phab:T243766|T243766]]) | |||
=== 2020-02-12 === | |||
* 13:38 arturo: [codfw1dev] add reference to subnetpool to the instance subnet `MariaDB [neutron]> update subnets set subnetpool_id='d129650d-d4be-4fe1-b13e-6edb5565cb4a' where id = '7adfcebe-b3d0-4315-92fe-e8365cc80668';` ([[phab:T244851|T244851]]) | |||
=== 2020-02-11 === | |||
* 13:46 arturo: [codfw1dev] creating some neutron objects to investigate [[phab:T244851|T244851]] (subnets, subnet pools, address scopes, ...) | |||
* 12:40 arturo: [codfw1dev] delete unknown address scope 'wmcs-v4-scope': `root@cloudcontrol2001-dev:~# openstack address scope delete 078cfd71-117b-4aac-9197-6ebbbb7dd3de` ([[phab:T244851|T244851]]) | |||
* 12:40 arturo: [codfw1dev] delete unknown subnet pool 'cloudinstancesb-v4-pool0': `root@cloudcontrol2001-dev:~# openstack subnet pool delete d23a9b88-5c3d-4a53-ab88-053233a75365` ([[phab:T244851|T244851]]) | |||
=== 2020-02-07 === | |||
* 18:11 jeh: shutdown cloudvirt1016 for hardware maintenance [[phab:T241882|T241882]] | |||
=== 2020-02-06 === | |||
* 14:44 jeh: update apt packages on cloudvirt1015 [[phab:T220853|T220853]] | |||
* 14:28 jeh: run hardware tests on cloudvirt1015 [[phab:T220853|T220853]] | |||
=== 2020-01-28 === | |||
* 17:24 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# designate server-create --name ns0.openstack.codfw1dev.wikimediacloud.org. ([[phab:T243766|T243766]]) | |||
* 10:18 arturo: [codfw1dev] created DNS record `bastion-codfw1dev-01.codfw1dev.wmcloud.org A 185.15.57.2` ([[phab:T242976|T242976]], [[phab:T229441|T229441]]) | |||
* 10:13 arturo: [codfw1dev] the zone `codfw1dev.wmcloud.org` belongs now to the `cloudinfra-codfw1dev` project ([[phab:T242976|T242976]]) | |||
* 10:11 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# openstack zone create --description "main DNS domain for public addresses" --email "root@wmflabs.org" --type PRIMARY --ttl 3600 codfw1dev.wmcloud.org.` ([[phab:T242976|T242976]] and [[phab:T243766|T243766]]) | |||
* 09:53 arturo: restart apache2 in labweb1001/1002 because horizon errors | |||
* 09:47 arturo: created DNS zone wmcloud.org in eqiad1, transfer it to the cloudinfra project ([[phab:T242976|T242976]]) right now only use is to delegate codfw1dev.wmcloud.org subdomain to designate in the other deployment | |||
=== 2020-01-27 === | |||
* 12:45 arturo: [codfw1dev] manually move the new domain to the `cloudinfra-codfw1dev` project clouddb2001-dev: `[designate]> update zones set tenant_id='cloudinfra-codfw1dev' where id = '4c75410017904858a5839de93c9e8b3d';` [[phab:T243556|T243556]] | |||
* 12:44 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# openstack zone create --description "main DNS domain for VMs" --email "root@wmflabs.org" --type PRIMARY --ttl 3600 codfw1dev.wikimedia.cloud.` [[phab:T243556|T243556]] | |||
=== 2020-01-24 === | |||
* 15:10 jeh: remove icinga downtime for cloudvirt1013 [[phab:T241313|T241313]] | |||
* 12:52 arturo: repooling cloudvirt1013 after HW got fixed ([[phab:T241313|T241313]]) | |||
=== 2020-01-21 === | |||
* 17:43 bstorm_: remounting /mnt/nfs/dumps-labstore1007.wikimedia.org/ on all dumps-mounting projects | |||
* 10:24 arturo: running `sudo systemctl restart apache2.service` in both labweb servers to try mitigating [[phab:T240852|T240852]] | |||
=== 2020-01-15 === | |||
* 16:59 bd808: Changed the config for cloud-announce mailing list so that list admins do not get bounce unsubscribe notices | |||
=== 2020-01-14 === | |||
* 14:03 arturo: icinga downtime all cloudvirts for another 2h for fixing some icinga checks | |||
* 12:04 arturo: icinga downtime toolchecker for 2 hours for openstack upgrades [[phab:T241347|T241347]] | |||
* 12:02 arturo: icinga downtime cloud* labs* hosts for 2 hours for openstack upgrades [[phab:T241347|T241347]] | |||
* 04:26 andrewbogott: upgrading designate on cloudservices1003/1004 | |||
=== 2020-01-13 === | |||
* 13:34 arturo: [codfw1dev] prevent neutron from allocating floating IPs from the wrong subnet by doing `neutron subnet-update --allocation-pool start=208.80.153.190,end=208.80.153.190 cloud-instances-transport1-b-codfw` ([[phab:T242594|T242594]]) | |||
=== 2020-01-10 === | |||
* 13:27 arturo: cloudvirt1009: virsh undefine i-000069b6. This is tools-elastic-01 which is running on cloudvirt1008 (so, leaked on cloudvirt1009) | |||
=== 2020-01-09 === | |||
* 11:12 arturo: running `MariaDB [nova_eqiad1]> update quota_usages set in_use='0' where project_id='etytree';` ([[phab:T242332|T242332]]) | |||
* 11:11 arturo: running `MariaDB [nova_eqiad1]> select * from quota_usages where project_id = 'etytree';` ([[phab:T242332|T242332]]) | |||
* 10:32 arturo: ran `root@cloudcontrol1004:~# nova-manage project quota_usage_refresh --project etytree` | |||
=== 2020-01-08 === | |||
* 10:53 arturo: icinga downtime all cloudvirts for 30 minutes to re-create all canary VMs | |||
=== 2020-01-07 === | |||
* 11:12 arturo: icinga-downtime everything cloud* for 30 minutes to merge nova scheduler changes | |||
* 10:02 arturo: icinga downtime cloudvirt1009 for 30 minutes to re-create canary VM ([[phab:T242078|T242078]]) | |||
=== 2020-01-06 === | |||
* 13:45 andrewbogott: restarting nova-api and nova-conductor on cloudcontrol1003 and 1004 | |||
=== 2020-01-04 === | |||
* 16:34 arturo: icinga downtime cloudvirt1024 for 2 months because hardware errors ([[phab:T241884|T241884]]) | |||
=== 2019-12-31 === | |||
* 11:46 andrewbogott: I couldn't! | |||
* 11:40 andrewbogott: restarting cloudservices2002-dev to see if I can reproduce an issue I saw earlier | |||
=== 2019-12-25 === | |||
* 10:13 arturo: icinga downtime for 30 minutes the whole cloud* lab* fleet to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/560575 (will restart some openstack components) | |||
=== 2019-12-24 === | |||
* 15:13 arturo: icinga downtime all the lab* fleet for nova password change for 1h | |||
* 14:39 arturo: icinga downtime all the cloud* fleet for nova password change for 1h | |||
=== 2019-12-23 === | |||
* 11:13 arturo: enable puppet in cloudcontrol1003/1004 | |||
* 10:40 arturo: disable puppet in cloudcontrol1003/1004 while doing changes related to python-ldap | |||
=== 2019-12-22 === | |||
* 23:48 andrewbogott: restarting nova-conductor and nova-api on cloudcontrol1003 and 1004 | |||
* 09:45 arturo: cloudvirt1013 is back (did it alone) [[phab:T241313|T241313]] | |||
* 09:37 arturo: cloudvirt1013 is down for good. Apparently powered off. I can't even reach it via iLO | |||
=== 2019-12-20 === | |||
* 12:43 arturo: icinga downtime cloudmetrics1001 for 128 hours | |||
=== 2019-12-18 === | |||
* 12:55 arturo: [codfw1dev] created a new subnet neutron object to hold the new CIDR for floating IPs (cloud-codfw1dev-floating - 185.15.57.0/29) [[phab:T239347|T239347]] | |||
=== 2019-12-17 === | |||
* 07:21 andrewbogott: deploying horizon/train to labweb1001/1002 | |||
=== 2019-12-12 === | |||
* 06:11 arturo: schedule 4h downtime for labstores | |||
* 05:57 arturo: schedule 4h downtime for cloudvirts and other openstack components due to upgrade ops | |||
=== 2019-12-02 === | |||
* 06:28 andrewbogott: running nova-manage db sync on eqiad1 | |||
* 06:27 andrewbogott: running nova-manage cell_v2 map_cell0 on eqiad1 | |||
=== 2019-11-21 === | |||
* 16:07 jeh: created replica indexes and views for szywiki [[phab:T237373|T237373]] | |||
* 15:48 jeh: creating replica indexes and views for shywiktionary [[phab:T238115|T238115]] | |||
* 15:48 jeh: creating replica indexes and views for gcrwiki [[phab:T238114|T238114]] | |||
* 15:46 jeh: creating replica indexes and views for minwiktionary [[phab:T238522|T238522]] | |||
* 15:36 jeh: creating replica indexes and views for gewikimedia [[phab:T236404|T236404]] | |||
=== 2019-11-18 === | |||
* 19:27 andrewbogott: repooling labsdb1011 | |||
* 18:54 andrewbogott: running maintain-views --all-databases --replace-all --clean on labsdb1011 [[phab:T238480|T238480]] | |||
* 18:44 andrewbogott: depooling labsdb1011 and killing remaining user queries [[phab:T238480|T238480]] | |||
* 18:42 andrewbogott: repooled labsdb1009 and 1010 [[phab:T238480|T238480]] | |||
* 18:19 andrewbogott: running maintain-views --all-databases --replace-all --clean on labsdb1010 [[phab:T238480|T238480]] | |||
* 18:18 andrewbogott: depooling labsdb1010, killing remaining user queries | |||
* 17:46 andrewbogott: running maintain-views --all-databases --replace-all --clean on labsdb1009 [[phab:T238480|T238480]] | |||
* 17:38 andrewbogott: depooling labsdb1009, killing remaining user queries | |||
* 16:54 andrewbogott: running maintain-views --all-databases --replace-all --clean on labsdb1012 [[phab:T237509|T237509]] | |||
=== 2019-11-15 === | |||
* 20:04 andrewbogott: repool labdb1011 ([[phab:T237509|T237509]]) | |||
Revision as of 15:21, 19 May 2022
2022-05-19
- 15:21 andrewbogott: resetting password for the 'troveguest' rabbitmq user. I think I may have broken this during a recent rebuild of the rabbitmq cluster
2022-05-18
- 15:42 andrewbogott: updated the 'debian-11.0-bullseye' glance image with a fresh build
2022-05-14
- 11:33 taavi: deleted projects 'ores' and 'ores-staging' T308102
2022-05-13
- 06:20 wm-bot2: Safe reboot of 'cloudvirt1045.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 06:20 wm-bot2: Unset cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 06:16 wm-bot2: Drained 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
- 06:16 wm-bot2: Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 06:15 wm-bot2: Draining 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
- 06:15 wm-bot2: Safe rebooting 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
- 06:11 wm-bot2: Safe reboot of 'cloudvirt1044.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 06:11 wm-bot2: Unset cloudvirt 'cloudvirt1044.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 06:10 wm-bot2: Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 06:10 wm-bot2: Draining 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
- 06:09 wm-bot2: Safe rebooting 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
- 06:07 wm-bot2: Drained 'cloudvirt1044.eqiad.wmnet'. - cookbook ran by andrew@buster
- 06:06 wm-bot2: Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 06:06 wm-bot2: Draining 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
- 06:05 wm-bot2: Safe rebooting 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:51 wm-bot2: Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:50 wm-bot2: Draining 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:50 wm-bot2: Safe rebooting 'cloudvirt1045.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:49 wm-bot2: Set cloudvirt 'cloudvirt1044.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:49 wm-bot2: Safe reboot of 'cloudvirt1043.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 05:49 wm-bot2: Unset cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:49 wm-bot2: Draining 'cloudvirt1044.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:49 wm-bot2: Safe rebooting 'cloudvirt1044.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:47 wm-bot2: Safe reboot of 'cloudvirt1042.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 05:47 wm-bot2: Unset cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:45 wm-bot2: Drained 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:45 wm-bot2: Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:45 wm-bot2: Draining 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:44 wm-bot2: Safe rebooting 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:44 wm-bot2: Drained 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:42 wm-bot2: Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:42 wm-bot2: Draining 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:42 wm-bot2: Safe rebooting 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:41 wm-bot2: Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:40 wm-bot2: Draining 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:40 wm-bot2: Safe rebooting 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:38 wm-bot2: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:37 wm-bot2: Draining 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:37 wm-bot2: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:30 wm-bot2: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:29 wm-bot2: Draining 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:29 wm-bot2: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:19 wm-bot2: Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:18 wm-bot2: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:18 wm-bot2: Draining 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:18 wm-bot2: Safe rebooting 'cloudvirt1043.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:18 wm-bot2: Draining 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:18 wm-bot2: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:12 wm-bot2: Safe reboot of 'cloudvirt1040.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 05:12 wm-bot2: Unset cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:08 wm-bot2: Drained 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:02 wm-bot2: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:02 wm-bot2: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 05:02 wm-bot2: Draining 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:02 wm-bot2: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:02 wm-bot2: Draining 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
- 05:01 wm-bot2: Safe rebooting 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:52 wm-bot2: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:51 wm-bot2: Draining 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:51 wm-bot2: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:48 wm-bot2: Safe reboot of 'cloudvirt1041.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 04:48 wm-bot2: Unset cloudvirt 'cloudvirt1041.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:44 wm-bot2: Drained 'cloudvirt1041.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:31 wm-bot2: Set cloudvirt 'cloudvirt1041.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:30 wm-bot2: Draining 'cloudvirt1041.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:30 wm-bot2: Safe rebooting 'cloudvirt1041.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:30 wm-bot2: Safe reboot of 'cloudvirt1039.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 04:30 wm-bot2: Unset cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:27 wm-bot2: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:26 wm-bot2: Draining 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:26 wm-bot2: Safe rebooting 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:26 wm-bot2: Drained 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:26 wm-bot2: Set cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:25 wm-bot2: Draining 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:25 wm-bot2: Safe rebooting 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:24 wm-bot2: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:23 wm-bot2: Draining 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:23 wm-bot2: Safe rebooting 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:23 wm-bot2: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:22 wm-bot2: Draining 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:22 wm-bot2: Safe rebooting 'cloudvirt1040.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:21 wm-bot2: Safe reboot of 'cloudvirt1038.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 04:21 wm-bot2: Unset cloudvirt 'cloudvirt1038.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:18 wm-bot2: Drained 'cloudvirt1038.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:16 wm-bot2: Set cloudvirt 'cloudvirt1038.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:16 wm-bot2: Set cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:16 wm-bot2: Draining 'cloudvirt1038.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:16 wm-bot2: Safe rebooting 'cloudvirt1038.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:15 wm-bot2: Draining 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:15 wm-bot2: Safe rebooting 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:37 wm-bot2: Set cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 03:36 wm-bot2: Draining 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:36 wm-bot2: Safe rebooting 'cloudvirt1039.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:34 wm-bot2: Safe reboot of 'cloudvirt1037.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 03:34 wm-bot2: Unset cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 03:27 wm-bot2: Drained 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:27 wm-bot2: Set cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 03:26 wm-bot2: Draining 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:26 wm-bot2: Safe rebooting 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:26 wm-bot2: Set cloudvirt 'cloudvirt1038.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 03:25 wm-bot2: Draining 'cloudvirt1038.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:25 wm-bot2: Safe rebooting 'cloudvirt1038.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:22 wm-bot2: Unset cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 02:55 wm-bot2: Set cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 02:55 wm-bot2: Set cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 02:54 wm-bot2: Draining 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
- 02:54 wm-bot2: Safe rebooting 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
- 02:54 wm-bot2: Draining 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
- 02:54 wm-bot2: Safe rebooting 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
- 02:05 wm-bot2: Set cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 02:05 wm-bot2: Draining 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
- 02:05 wm-bot2: Safe rebooting 'cloudvirt1037.eqiad.wmnet'. - cookbook ran by andrew@buster
- 02:04 wm-bot2: Set cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 02:04 wm-bot2: Draining 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
- 02:03 wm-bot2: Safe rebooting 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
- 01:23 wm-bot2: Safe reboot of 'cloudvirt1035.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 01:23 wm-bot2: Unset cloudvirt 'cloudvirt1035.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 01:19 wm-bot2: Drained 'cloudvirt1035.eqiad.wmnet'. - cookbook ran by andrew@buster
- 01:01 wm-bot2: Set cloudvirt 'cloudvirt1035.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 01:01 wm-bot2: Set cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 01:00 wm-bot2: Draining 'cloudvirt1035.eqiad.wmnet'. - cookbook ran by andrew@buster
- 01:00 wm-bot2: Safe rebooting 'cloudvirt1035.eqiad.wmnet'. - cookbook ran by andrew@buster
- 01:00 wm-bot2: Draining 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
- 01:00 wm-bot2: Safe rebooting 'cloudvirt1036.eqiad.wmnet'. - cookbook ran by andrew@buster
- 00:25 wm-bot2: Safe reboot of 'cloudvirt1033.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 00:25 wm-bot2: Unset cloudvirt 'cloudvirt1033.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 00:21 wm-bot2: Drained 'cloudvirt1033.eqiad.wmnet'. - cookbook ran by andrew@buster
- 00:20 wm-bot2: Set cloudvirt 'cloudvirt1035.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 00:19 wm-bot2: Draining 'cloudvirt1035.eqiad.wmnet'. - cookbook ran by andrew@buster
- 00:19 wm-bot2: Safe rebooting 'cloudvirt1035.eqiad.wmnet'. - cookbook ran by andrew@buster
- 00:11 wm-bot2: Safe reboot of 'cloudvirt1034.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 00:11 wm-bot2: Unset cloudvirt 'cloudvirt1034.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 00:07 wm-bot2: Drained 'cloudvirt1034.eqiad.wmnet'. - cookbook ran by andrew@buster
2022-05-12
- 23:55 wm-bot2: Set cloudvirt 'cloudvirt1034.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 23:55 wm-bot2: Set cloudvirt 'cloudvirt1033.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 23:54 wm-bot2: Draining 'cloudvirt1034.eqiad.wmnet'. - cookbook ran by andrew@buster
- 23:54 wm-bot2: Safe rebooting 'cloudvirt1034.eqiad.wmnet'. - cookbook ran by andrew@buster
- 23:54 wm-bot2: Draining 'cloudvirt1033.eqiad.wmnet'. - cookbook ran by andrew@buster
- 23:54 wm-bot2: Safe rebooting 'cloudvirt1033.eqiad.wmnet'. - cookbook ran by andrew@buster
- 22:23 wm-bot2: Safe reboot of 'cloudvirt1031.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 22:23 wm-bot2: Unset cloudvirt 'cloudvirt1031.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 22:20 wm-bot2: Drained 'cloudvirt1031.eqiad.wmnet'. - cookbook ran by andrew@buster
- 22:17 wm-bot2: Safe reboot of 'cloudvirt1032.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 22:17 wm-bot2: Unset cloudvirt 'cloudvirt1032.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 22:13 wm-bot2: Drained 'cloudvirt1032.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:57 wm-bot2: Set cloudvirt 'cloudvirt1032.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:56 wm-bot2: Draining 'cloudvirt1032.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:56 wm-bot2: Safe rebooting 'cloudvirt1032.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:55 wm-bot2: Draining 'cloudvirt1031.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:55 wm-bot2: Safe rebooting 'cloudvirt1031.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:54 wm-bot2: Safe reboot of 'cloudvirt1030.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 21:54 wm-bot2: Unset cloudvirt 'cloudvirt1030.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:53 wm-bot2: Set cloudvirt 'cloudvirt1031.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:52 wm-bot2: Draining 'cloudvirt1031.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:52 wm-bot2: Safe rebooting 'cloudvirt1031.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:51 wm-bot2: Drained 'cloudvirt1030.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:44 wm-bot2: Safe reboot of 'cloudvirt1029.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 21:44 wm-bot2: Unset cloudvirt 'cloudvirt1029.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:42 wm-bot2: Drained 'cloudvirt1029.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:36 wm-bot2: Safe reboot of 'cloudvirt-wdqs1001.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 21:36 wm-bot2: Unset cloudvirt 'cloudvirt-wdqs1001.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:33 wm-bot2: Drained 'cloudvirt-wdqs1001.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:33 wm-bot2: Set cloudvirt 'cloudvirt-wdqs1001.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:32 wm-bot2: Draining 'cloudvirt-wdqs1001.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:32 wm-bot2: Safe rebooting 'cloudvirt-wdqs1001.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:32 wm-bot2: Set cloudvirt 'cloudvirt1029.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:31 wm-bot2: Safe reboot of 'cloudvirt-wdqs1002.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 21:31 wm-bot2: Unset cloudvirt 'cloudvirt-wdqs1002.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:31 wm-bot2: Draining 'cloudvirt1029.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:31 wm-bot2: Safe rebooting 'cloudvirt1029.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:30 wm-bot2: Set cloudvirt 'cloudvirt1030.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:29 wm-bot2: Draining 'cloudvirt1030.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:29 wm-bot2: Safe rebooting 'cloudvirt1030.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:29 wm-bot2: Drained 'cloudvirt-wdqs1002.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:28 wm-bot2: Set cloudvirt 'cloudvirt-wdqs1002.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:28 wm-bot2: Draining 'cloudvirt-wdqs1002.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:28 wm-bot2: Safe rebooting 'cloudvirt-wdqs1002.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:22 wm-bot2: Safe reboot of 'cloudvirt1026.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 21:22 wm-bot2: Unset cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:21 wm-bot2: Safe reboot of 'cloudvirt-wdqs1003.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 21:21 wm-bot2: Unset cloudvirt 'cloudvirt-wdqs1003.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:18 wm-bot2: Drained 'cloudvirt-wdqs1003.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:18 wm-bot2: Drained 'cloudvirt1026.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:18 wm-bot2: Set cloudvirt 'cloudvirt-wdqs1003.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:17 wm-bot2: Draining 'cloudvirt-wdqs1003.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:17 wm-bot2: Safe rebooting 'cloudvirt-wdqs1003.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:17 wm-bot2: Set cloudvirt 'cloudvirt1029.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:16 wm-bot2: Draining 'cloudvirt1029.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:16 wm-bot2: Safe rebooting 'cloudvirt1029.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:14 wm-bot2: Safe reboot of 'cloudvirt1025.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 21:14 wm-bot2: Unset cloudvirt 'cloudvirt1025.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:11 wm-bot2: Safe reboot of 'cloudvirt1046.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 21:11 wm-bot2: Unset cloudvirt 'cloudvirt1046.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:10 wm-bot2: Drained 'cloudvirt1025.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:08 wm-bot2: Drained 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:08 wm-bot2: Set cloudvirt 'cloudvirt1046.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:07 wm-bot2: Draining 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:07 wm-bot2: Safe rebooting 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:05 wm-bot2: Set cloudvirt 'cloudvirt1046.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:04 wm-bot2: Draining 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:04 wm-bot2: Safe rebooting 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:00 wm-bot2: Set cloudvirt 'cloudvirt1046.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:59 wm-bot2: Draining 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:59 wm-bot2: Safe rebooting 'cloudvirt1046.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:59 wm-bot2: Safe reboot of 'cloudvirt1047.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 20:59 wm-bot2: Unset cloudvirt 'cloudvirt1047.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:57 wm-bot2: Set cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:57 wm-bot2: Draining 'cloudvirt1026.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:57 wm-bot2: Safe rebooting 'cloudvirt1026.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:55 wm-bot2: Set cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:55 wm-bot2: Drained 'cloudvirt1047.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:54 wm-bot2: Draining 'cloudvirt1026.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:54 wm-bot2: Safe rebooting 'cloudvirt1026.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:54 wm-bot2: Safe reboot of 'cloudvirt1024.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 20:54 wm-bot2: Unset cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:53 wm-bot2: Set cloudvirt 'cloudvirt1047.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:52 wm-bot2: Draining 'cloudvirt1047.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:52 wm-bot2: Safe rebooting 'cloudvirt1047.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:50 wm-bot2: Drained 'cloudvirt1024.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:49 wm-bot2: Set cloudvirt 'cloudvirt1025.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:49 wm-bot2: Draining 'cloudvirt1025.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:49 wm-bot2: Safe rebooting 'cloudvirt1025.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:48 wm-bot2: Safe reboot of 'cloudvirt1023.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 20:48 wm-bot2: Unset cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:44 wm-bot2: Drained 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:44 wm-bot2: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:43 wm-bot2: Draining 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:43 wm-bot2: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:34 wm-bot2: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:34 wm-bot2: Set cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:34 wm-bot2: Draining 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:34 wm-bot2: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:34 wm-bot2: Draining 'cloudvirt1024.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:34 wm-bot2: Safe rebooting 'cloudvirt1024.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:31 wm-bot2: Safe reboot of 'cloudvirt1027.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 20:31 wm-bot2: Unset cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:28 wm-bot2: Drained 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:28 wm-bot2: Set cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:27 wm-bot2: Draining 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:27 wm-bot2: Safe rebooting 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:23 wm-bot2: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:22 wm-bot2: Draining 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:22 wm-bot2: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:11 wm-bot2: Set cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:10 wm-bot2: Draining 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:10 wm-bot2: Safe rebooting 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:07 wm-bot2: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:07 wm-bot2: Draining 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:06 wm-bot2: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:06 wm-bot2: Safe reboot of 'cloudvirt1022.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 20:05 wm-bot2: Unset cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:02 wm-bot2: Drained 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:01 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:00 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:00 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:58 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 19:57 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:57 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:36 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 19:35 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:35 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 15:06 andrewbogott: stopping nfs-server on labstore1004 in preparation for reboot
- 04:12 andrewbogott: rebooting primary bastion (bastion-eqiad1-03.bastion.eqiad1.wikimedia.cloud) in hopes of resolving a problem with ssh proxying
2022-05-11
- 18:48 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 18:48 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 18:48 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 18:39 wm-bot2: Set cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 18:38 wm-bot2: Draining 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
- 18:38 wm-bot2: Safe rebooting 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
- 18:04 wm-bot2: Set cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 18:03 wm-bot2: Draining 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
- 18:03 wm-bot2: Safe rebooting 'cloudvirt1027.eqiad.wmnet'. - cookbook ran by andrew@buster
- 08:56 wm-bot2: Finished rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 08:52 wm-bot2: Rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 07:53 dcaro: test
- 04:28 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 04:27 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 04:27 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:44 wm-bot2: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 03:43 wm-bot2: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:43 wm-bot2: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:42 wm-bot2: Safe reboot of 'cloudvirt1021.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 03:42 wm-bot2: Unset cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 03:39 wm-bot2: Drained 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:23 wm-bot2: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 03:22 wm-bot2: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:22 wm-bot2: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:09 wm-bot2: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 03:08 wm-bot2: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:08 wm-bot2: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:04 andrewbogott: reset and recreated the rabbitmq cluster in eqiad1 to get around some broken queues.
- 03:02 wm-bot2: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 03:01 wm-bot2: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 03:01 wm-bot2: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 02:28 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 02:25 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 02:25 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
2022-05-10
- 21:43 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:40 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:40 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:35 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 21:32 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 21:32 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:05 wm-bot: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 20:02 wm-bot: Draining 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:01 wm-bot: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:00 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 20:00 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:57 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:57 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:55 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:55 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:47 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 19:46 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:46 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:45 wm-bot: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:45 wm-bot: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:44 wm-bot: Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:44 wm-bot: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:40 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 19:39 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:39 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:37 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 19:36 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:36 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:33 wm-bot: Safe reboot of 'cloudvirt1017.eqiad.wmnet' finished successfully. - cookbook ran by andrew@buster
- 19:33 wm-bot: Unset cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 19:29 wm-bot: Drained 'cloudvirt1017.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:06 wm-bot: Set cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 19:05 wm-bot: Draining 'cloudvirt1017.eqiad.wmnet'. - cookbook ran by andrew@buster
- 19:05 wm-bot: Safe rebooting 'cloudvirt1017.eqiad.wmnet'. - cookbook ran by andrew@buster
- 15:41 andrewbogott: rebooting cloud*-dev for T307668
- 13:59 taavi: manually attached User:Dreamy Jazz to wikitech for a password reset (https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Manually_associate_an_LDAP_account_with_wikitech)
2022-05-07
- 01:33 wm-bot: Drained 'cloudvirt1016.eqiad.wmnet'. - cookbook ran by andrew@buster
- 01:32 wm-bot: Set cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 01:30 wm-bot: Draining 'cloudvirt1016.eqiad.wmnet'. - cookbook ran by andrew@buster
- 01:21 wm-bot: Set cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 01:18 wm-bot: Draining 'cloudvirt1016.eqiad.wmnet'. - cookbook ran by andrew@buster
2022-05-03
- 20:38 andrewbogott: upgrading clouddb2001-dev in place
- 18:18 taavi: updated 'puppet-enc' endpoints on the keystone catalog to use https and port 443
2022-05-02
- 16:56 dcaro: rebooting cloudmetrics1001
2022-04-29
- 14:22 andrewbogott: changing login.toolforge.org, bastion.toolforge.org, and dev.toolforge.org dns entries to refer to the new Buster bastions T277653 https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Timeline
2022-04-27
- 14:51 wm-bot: Finished rebooting the nodes ['cloudcephosd1001', 'cloudcephosd1002', 'cloudcephosd1003', 'cloudcephosd1004', 'cloudcephosd1005', 'cloudcephosd1006', 'cloudcephosd1007', 'cloudcephosd1008', 'cloudcephosd1009', 'cloudcephosd1010', 'cloudcephosd1011', 'cloudcephosd1012', 'cloudcephosd1013', 'cloudcephosd1014', 'cloudcephosd1015', 'cloudcephosd1016', 'cloudcephosd1017', 'cloudcephosd1018', 'cloudcephosd1019', 'cloudcephosd1020', 'cloud
- 14:50 wm-bot: Finished rebooting node cloudcephosd1024.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:46 wm-bot: Rebooting node cloudcephosd1024.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:46 wm-bot: Finished rebooting node cloudcephosd1023.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:41 wm-bot: Rebooting node cloudcephosd1023.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:41 wm-bot: Finished rebooting node cloudcephosd1022.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:35 wm-bot: Rebooting node cloudcephosd1022.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:35 wm-bot: Finished rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:31 wm-bot: Rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:31 wm-bot: Finished rebooting node cloudcephosd1020.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:27 wm-bot: Rebooting node cloudcephosd1020.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:27 wm-bot: Finished rebooting node cloudcephosd1019.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:23 wm-bot: Rebooting node cloudcephosd1019.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:23 wm-bot: Finished rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:13 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:13 wm-bot: Finished rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:09 wm-bot: Rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:09 wm-bot: Finished rebooting node cloudcephosd1016.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:05 wm-bot: Rebooting node cloudcephosd1016.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:05 wm-bot: Finished rebooting node cloudcephosd1015.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:01 wm-bot: Rebooting node cloudcephosd1015.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:01 wm-bot: Finished rebooting node cloudcephosd1014.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:57 wm-bot: Rebooting node cloudcephosd1014.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:57 wm-bot: Finished rebooting node cloudcephosd1013.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:44 wm-bot: Rebooting node cloudcephosd1013.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:43 wm-bot: Finished rebooting node cloudcephosd1012.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:39 wm-bot: Rebooting node cloudcephosd1012.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:39 wm-bot: Finished rebooting node cloudcephosd1011.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:35 wm-bot: Rebooting node cloudcephosd1011.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:35 wm-bot: Finished rebooting node cloudcephosd1010.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:31 wm-bot: Rebooting node cloudcephosd1010.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:31 wm-bot: Finished rebooting node cloudcephosd1009.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:26 wm-bot: Rebooting node cloudcephosd1009.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:26 wm-bot: Finished rebooting node cloudcephosd1008.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:14 wm-bot: Rebooting node cloudcephosd1008.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:14 wm-bot: Finished rebooting node cloudcephosd1007.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:10 wm-bot: Rebooting node cloudcephosd1007.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:10 wm-bot: Finished rebooting node cloudcephosd1006.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:05 wm-bot: Rebooting node cloudcephosd1006.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:05 wm-bot: Finished rebooting node cloudcephosd1005.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:01 wm-bot: Rebooting node cloudcephosd1005.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:01 wm-bot: Finished rebooting node cloudcephosd1004.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:57 wm-bot: Rebooting node cloudcephosd1004.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:57 wm-bot: Finished rebooting node cloudcephosd1003.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:53 wm-bot: Rebooting node cloudcephosd1003.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:53 wm-bot: Finished rebooting node cloudcephosd1002.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:50 wm-bot: Rebooting node cloudcephosd1002.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:50 wm-bot: Finished rebooting node cloudcephosd1001.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:46 wm-bot: Rebooting node cloudcephosd1001.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:46 wm-bot: Rebooting the nodes cloudcephosd1001,cloudcephosd1002,cloudcephosd1003,cloudcephosd1004,cloudcephosd1005,cloudcephosd1006,cloudcephosd1007,cloudcephosd1008,cloudcephosd1009,cloudcephosd1010,cloudcephosd1011,cloudcephosd1012,cloudcephosd1013,cloudcephosd1014,cloudcephosd1015,cloudcephosd1016,cloudcephosd1017,cloudcephosd1018,cloudcephosd1019,cloudcephosd1020,cloudcephosd1021,cloudcephosd1022,cloudcephosd1023,cloudcephosd1024 - cookbo
- 12:15 wm-bot: Finished rebooting the nodes ['cloudcephmon1001', 'cloudcephmon1002', 'cloudcephmon1003'] - cookbook ran by dcaro@vulcanus
- 12:15 wm-bot: Finished rebooting node cloudcephmon1003.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:12 wm-bot: Rebooting node cloudcephmon1003.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:12 wm-bot: Finished rebooting node cloudcephmon1002.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:09 wm-bot: Rebooting node cloudcephmon1002.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:09 wm-bot: Finished rebooting node cloudcephmon1001.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:07 wm-bot: Rebooting node cloudcephmon1001.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 12:07 wm-bot: Rebooting the nodes cloudcephmon1001,cloudcephmon1002,cloudcephmon1003 - cookbook ran by dcaro@vulcanus
- 12:05 wm-bot: Finished rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] - cookbook ran by dcaro@vulcanus
- 12:05 wm-bot: Finished rebooting node cloudcephosd2003-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 12:02 wm-bot: Rebooting node cloudcephosd2003-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 12:02 wm-bot: Finished rebooting node cloudcephosd2002-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 11:59 wm-bot: Rebooting node cloudcephosd2002-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 11:59 wm-bot: Finished rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 11:56 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 11:56 wm-bot: Rebooting the nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev - cookbook ran by dcaro@vulcanus
- 11:55 wm-bot: Finished rebooting the nodes ['cloudcephmon2004-dev', 'cloudcephmon2005-dev', 'cloudcephmon2006-dev'] - cookbook ran by dcaro@vulcanus
- 11:55 wm-bot: Finished rebooting node cloudcephmon2006-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 11:52 wm-bot: Rebooting node cloudcephmon2006-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 11:52 wm-bot: Finished rebooting node cloudcephmon2005-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 11:47 wm-bot: Rebooting node cloudcephmon2005-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 11:47 wm-bot: Finished rebooting node cloudcephmon2004-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 11:43 wm-bot: Rebooting node cloudcephmon2004-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 11:43 wm-bot: Rebooting the nodes cloudcephmon2004-dev,cloudcephmon2005-dev,cloudcephmon2006-dev - cookbook ran by dcaro@vulcanus
2022-04-26
- 10:36 taavi: [codfw1dev] updated designate pool to 2004/2005-dev according to the instructions on https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/DNS/Designate#Initial_designate/pdns_node_setup
2022-04-22
- 10:33 taavi: [codfw1dev] restart designate-sink on both new cloudservices hosts to fix rabbitmq connectivity
2022-04-21
- 05:38 andrewbogott: replaced cloudservices200[2,3] with cloudservices200[4,5]
2022-04-19
- 15:29 andrewbogott: stopping all VMs on cloudvirt1019, reimaging host
2022-04-18
- 15:23 andrewbogott: reimaging cloudvirt1020, leaving VMs in place
- 13:40 andrewbogott: shutting down many codfw1dev servers (including network infra!) for T305469
2022-04-14
- 20:14 andrewbogott: restarting nova-api and nova-conductor services in a superstitious attempt to reduce open DB connections
2022-04-13
- 22:01 andrewbogott: restarting galera on cloudcontrols (one by one) to clear open connections
2022-04-11
- 15:59 taavi: created cloudinfra.wmcloud.org zone
2022-04-09
- 19:55 andrewbogott: reimaging cloudbackup1001-dev to bullseye
- 19:37 taavi: add 'puppet-enc' service & endpoint to keystone T274666
- 19:25 andrewbogott: reimaging cloudbackup1002-dev to bullseye
2022-04-07
- 12:51 wm-bot: Set cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. (T305631) - cookbook ran by arturo@nostromo
2022-04-06
- 09:12 arturo: [codfw1dev] installing python3-eventlet 0.30.2-5~bpo11+1 on all required servers (cloudvirt, cloudnet, cloudcontrol) (T305157)
- 08:45 arturo: [codfw1dev] trying with python3-eventlet 0.30.2-5 installed by hand on cloudvirt2003-dev (T305157)
- 08:42 arturo: [codfw1dev] trying with python3-eventlet 0.30.2-5 installed by hand on cloudcontrol servers (T305157)
- 08:24 arturo: [codfw1dev] trying with python3-dnspython 2.2.0-2 installed by hand on cloudvirt2003-dev (T305157)
- 08:20 arturo: [codfw1dev] trying with python3-dnspython 2.2.0-2 installed by hand on cloudcontrol servers (T305157)
2022-03-30
- 11:20 arturo: apply uRPF strict filter to eqiad cloud-hosts vlan - T285461
2022-03-29
- 10:02 dcaro: restarting keystone (T304918)
2022-03-23
- 22:53 wm-bot: Drained 'cloudvirt1045.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 22:38 wm-bot: Drained 'cloudvirt1044.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 22:12 wm-bot: Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 22:12 wm-bot: Draining 'cloudvirt1045.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 22:08 wm-bot: Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 22:07 wm-bot: Set cloudvirt 'cloudvirt1044.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 22:06 wm-bot: Draining 'cloudvirt1044.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 22:06 wm-bot: Draining 'cloudvirt1043.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 21:54 wm-bot: Drained 'cloudvirt1042.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 21:19 wm-bot: Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 21:19 wm-bot: Draining 'cloudvirt1042.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 21:12 wm-bot: Drained 'cloudvirt1040.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 21:12 wm-bot: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 21:09 wm-bot: Draining 'cloudvirt1040.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 21:07 wm-bot: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 21:04 wm-bot: Draining 'cloudvirt1040.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 20:55 wm-bot: Set cloudvirt 'cloudvirt1041.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 20:54 wm-bot: Draining 'cloudvirt1041.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 20:30 wm-bot: Drained 'cloudvirt1039.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 20:15 wm-bot: Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 20:15 wm-bot: Set cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 20:14 wm-bot: Draining 'cloudvirt1040.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 20:14 wm-bot: Draining 'cloudvirt1039.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 18:44 wm-bot: Set cloudvirt 'cloudvirt1038.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:43 wm-bot: Draining 'cloudvirt1038.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 18:19 wm-bot: Set cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:18 wm-bot: Draining 'cloudvirt1037.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 18:13 wm-bot: Drained 'cloudvirt1036.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 18:02 wm-bot2: Testing wm-bot relay to #wikimedia-cloud-feed
- 17:55 wm-bot: Set cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 17:54 wm-bot: Draining 'cloudvirt1036.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 17:04 wm-bot: Set cloudvirt 'cloudvirt1035.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 17:03 wm-bot: Draining 'cloudvirt1035.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 17:03 wm-bot: Drained 'cloudvirt1034.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 16:51 wm-bot: Drained 'cloudvirt1033.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 16:37 wm-bot: Set cloudvirt 'cloudvirt1034.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 16:37 wm-bot: Set cloudvirt 'cloudvirt1033.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 16:36 wm-bot: Draining 'cloudvirt1034.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 16:36 wm-bot: Draining 'cloudvirt1033.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 15:01 wm-bot: Drained 'cloudvirt1032.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 15:00 wm-bot: Set cloudvirt 'cloudvirt1032.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 14:57 wm-bot: Draining 'cloudvirt1032.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 14:44 wm-bot: Drained 'cloudvirt1031.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 14:35 wm-bot: Set cloudvirt 'cloudvirt1032.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 14:34 wm-bot: Draining 'cloudvirt1032.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 14:32 wm-bot: Drained 'cloudvirt1030.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 14:20 wm-bot: Set cloudvirt 'cloudvirt1031.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 14:19 wm-bot: Draining 'cloudvirt1031.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 14:18 wm-bot: Set cloudvirt 'cloudvirt1030.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 14:17 wm-bot: Draining 'cloudvirt1030.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 13:54 taavi: restart nova-fullstack on cloudcontrol1003 to pick up bastion ip change
- 13:43 wm-bot: Drained 'cloudvirt1029.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 13:23 wm-bot: Set cloudvirt 'cloudvirt1029.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 13:22 wm-bot: Draining 'cloudvirt1029.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
2022-03-22
- 22:59 wm-bot: Set cloudvirt 'cloudvirt1027.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 22:58 wm-bot: Draining 'cloudvirt1027.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
2022-03-17
- 01:09 wm-bot: Drained 'cloudvirt1016.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 00:53 wm-bot: Set cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 00:52 wm-bot: Setting cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 00:52 wm-bot: Draining 'cloudvirt1016.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
2022-03-15
- 20:58 wm-bot: Drained 'cloudvirt1026.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 20:36 wm-bot: Set cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 20:36 wm-bot: Setting cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 20:36 wm-bot: Draining 'cloudvirt1026.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 13:14 wm-bot: Unset cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by arturo@nostromo
- 13:14 wm-bot: Unsetting cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by arturo@nostromo
- 10:32 wm-bot: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by arturo@nostromo
- 10:30 wm-bot: Setting cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by arturo@nostromo
2022-03-14
- 21:24 wm-bot: Drained 'cloudvirt1025.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 20:59 wm-bot: Set cloudvirt 'cloudvirt1025.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 20:58 wm-bot: Setting cloudvirt 'cloudvirt1025.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 20:58 wm-bot: Draining 'cloudvirt1025.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 20:15 wm-bot: Setting cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 20:15 wm-bot: Draining 'cloudvirt1024.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 20:02 wm-bot: Set cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 19:59 wm-bot: Setting cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 19:59 wm-bot: Draining 'cloudvirt1024.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 19:16 wm-bot: Set cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 19:15 wm-bot: Setting cloudvirt 'cloudvirt1024.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 19:15 wm-bot: Draining 'cloudvirt1024.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 19:13 wm-bot: Drained 'cloudvirt1023.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 18:56 wm-bot: Set cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:55 wm-bot: Setting cloudvirt 'cloudvirt1023.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:55 wm-bot: Draining 'cloudvirt1023.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 18:53 wm-bot: Drained 'cloudvirt1022.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 18:52 wm-bot: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:51 wm-bot: Setting cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:51 wm-bot: Draining 'cloudvirt1022.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 16:50 wm-bot: Drained 'cloudvirt1021.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 16:48 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 16:48 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 16:48 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 11:48 dcaro: rebased cookbooks on latest master, make sure you pull before sending new patches
2022-03-08
- 18:29 wm-bot: Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:29 wm-bot: Setting cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:29 wm-bot: Draining 'cloudvirt1022.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 18:23 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:21 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:21 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 18:18 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:17 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 18:17 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 17:28 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 17:27 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 17:27 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 17:18 wm-bot: Set cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 17:15 wm-bot: Setting cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 17:15 wm-bot: Draining 'cloudvirt1017.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 16:48 wm-bot: Set cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 16:47 wm-bot: Setting cloudvirt 'cloudvirt1017.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 16:47 wm-bot: Draining 'cloudvirt1017.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 16:36 wm-bot: Drained 'cloudvirt1016.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 16:08 wm-bot: Set cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 16:07 wm-bot: Setting cloudvirt 'cloudvirt1016.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster
- 16:07 wm-bot: Draining 'cloudvirt1016.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
- 13:11 arturo: [codfw1dev] rebooting cloudservices servers for T303179
- 13:07 arturo: [codfw1dev] rebooting cloudvirt servers for T303179
- 13:06 arturo: [codfw1dev] rebooting cloudnet servers for T303179
- 12:55 arturo: [codfw1dev] rebooting cloudcontrol servers for T303179
2022-03-03
- 08:49 taavi: deploying cloudmetrics grafana to grafana 8, T282863
2022-03-02
- 09:06 arturo: merging core router firewall change https://gerrit.wikimedia.org/r/c/operations/homer/public/+/701347
2022-02-28
- 15:30 dcaro: cleaning up leftover snapshots from failed backups of the maps volume (T302720)
2022-02-24
- 17:04 andrewbogott: upgrading eqiad1 and codfw1dev to mariadb 10.5.15+maria~bullseye via 'apt-get install libmariadb3:amd64 galera-4 mariadb-server'
- 15:42 dcaro: stopping and starting mariadb on cloudcontrol1003 (T302146)
- 10:37 arturo: [codfw1dev] briefly installed galera-4 (26.4.11+1bullseye) over (26.4.9-0+deb11u1) on cloudcontrol2001-dev and then downgraded again to verify package install (T302482)
2022-02-23
- 20:39 taavi: added domain-wide 'designateadmin' and 'observer' roles to project-proxy-dns-manager service account T295246
- 17:40 andrewbogott: restarting lots of openstack services to try to clear up the mess that is T236101
- 12:13 arturo: cleaning up cinder volume snapshots, aborrero@cloudcontrol1005:~$ for i in $(sudo wmcs-openstack volume snapshot list -f value -c ID) ; do sudo wmcs-openstack volume snapshot delete $i ; done (T302382)
- 10:14 arturo: cleaning up neutron agents for non-existent servers cloudvirt100[1-9].eqiad.wmnet,cloudvirt10[12-15].eqiad.wmnet
- 10:05 dcaro: Deleting stuck novafullstack servers, to let the service create new ones (T302369)
- 09:56 arturo: neutron agent-delete bad663b3-fd25-4393-a546-4b1b4bdec4db (Linux bridge agent | cloudvirtan1001)
- 09:56 arturo: neutron agent-delete 1071c198-ed57-4b5a-9439-30e66a31aa69 (Linux bridge agent | cloudvirtan1005)
- 09:55 arturo: neutron agent-delete 2eeef198-8af7-4e5d-bd73-e14a2a8d2404 (Linux bridge agent | cloudvirtan1004)
- 09:55 arturo: neutron agent-delete afe173eb-35ba-444a-9960-899629786d2f (Linux bridge agent | cloudvirtan1003)
- 09:54 arturo: neutron agent-delete afcb9b7f-c1a6-4ff4-9b10-92bfbe8d1a56 (Linux bridge agent | cloudvirtan1002)
- 09:39 dcaro: restarting neutron-api cloudcontrol1003 to see if the agent status update starts working (T302369)
- 09:38 dcaro: restarting neutron-dhcp-agent on cloudnet1003 (T302369)
2022-02-22
- 22:10 andrewbogott: raising project 'maps' quota by two tb -- T300160
- 09:24 arturo: restarting mariadb @ cloudcontrol1003 (T302146)
- 09:13 arturo: restarting mariadb @ cloudcontrol1004 (T302146)
2022-02-18
- 21:57 andrewbogott: leaving cloudcontrol1003 downtimed with disabled puppet for the weekend. Everything there should be stable and fine save rabbit which needs an upgrade.
- 21:30 andrewbogott: rebooting cloudcontrol1003 because rabbit is freaking out
- 17:25 andrewbogott: in-place upgrade of cloudcontrol1004 to bullseye -- T281276
- 12:34 arturo: manually install prometheus-openstack-exporter on cloudcontrol1005 (T302050)
2022-02-17
- 23:02 andrewbogott: in-place upgrade to Bullseye on cloudcontrol1005 T281276
2022-02-15
- 14:15 taavi: [codfw1dev] added domain-wide 'designateadmin' and 'observer' roles to codfw1dev-proxy-dns-manager service account T295246
2022-02-04
- 10:12 arturo: restart backup_vms service in cloudvirt1024 (T300956)
2022-02-03
- 08:21 taavi: cloudmetrics1004: manually added an empty line to /etc/prometheus/blackbox.yml to make /usr/local/bin/blackbox-exporter-assemble happy (clearing "performing a change every puppet run" alert)
2022-02-02
- 02:36 andrewbogott: restarting mariadb on cloudcontrol1004
2022-01-31
- 10:15 arturo: cloudcontrol1005:~$ sudo systemctl restart backup_glance_images.service (failed state, no logs, icinga alert)
2022-01-29
- 18:24 taavi: delete 2 puppet prefixes in a weird state T299750
2022-01-27
- 13:24 arturo: cloudmetrics1004:~ $ sudo systemctl restart wmcs_monitoring_graphite_rsync.service (T300138)
2022-01-26
- 19:09 andrewbogott: bootstrapping a fresh galera node on cloudcontrol1004
- 18:57 andrewbogott: restarting mariadb on cloudcontrol1004
2022-01-25
2022-01-19
- 16:38 andrewbogott: moving all scratch mounts to scratch.svc.cloudinfra-nfs.eqiad1.wikimedia.cloud
2022-01-05
- 03:11 andrewbogott: 'cp /etc/apt/sources.list /etc/apt/sources.list.prepuppet' on all VMs. Backing up state before puppetizing sources.list with https://gerrit.wikimedia.org/r/c/operations/puppet/+/751498
2022-01-04
- 12:44 dcaro: increasing the size_limit for labs ldap servers
2021-12-26
- 16:55 majavah: run attachLdapUser.php on wikitech for developer account "Karthiksripal"
2021-12-24
- 22:51 majavah: ran the wikireplica dns script on s5 T298303
2021-12-23
- 21:42 majavah: deployed horizon wmf-proxy-dashboard update to fix editing of existing proxies
2021-12-21
- 10:39 arturo: dropped egress NAT exceptions for WMF apt repos, T298042
2021-12-15
- 12:44 dcaro: Downtiming cloudvirt-wdqs1001 as it has no VMs running until disk space is fixed (T297454)
2021-12-14
- 10:26 dcaro: Moved the nova cache (/var/lib/nova/instances/_base) and the canary image local data (/var/lib/nova/instance/<canary_image_id>) to the root disk on cloudvirt-wdqs1001 to temporarily free some space (T297454)
2021-12-13
- 18:08 wm-bot: Drained 'cloudvirt1014.eqiad.wmnet'. - cookbook ran by michael@mouse
- 17:50 wm-bot: Set cloudvirt 'cloudvirt1014.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
- 17:49 wm-bot: Setting cloudvirt 'cloudvirt1014.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
- 17:49 wm-bot: Draining 'cloudvirt1014.eqiad.wmnet'. - cookbook ran by michael@mouse
- 17:44 wm-bot: Drained 'cloudvirt1013.eqiad.wmnet'. - cookbook ran by michael@mouse
- 17:30 wm-bot: Set cloudvirt 'cloudvirt1013.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
- 17:30 wm-bot: Setting cloudvirt 'cloudvirt1013.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
- 17:30 wm-bot: Draining 'cloudvirt1013.eqiad.wmnet'. - cookbook ran by michael@mouse
- 17:13 wm-bot: Drained 'cloudvirt1012.eqiad.wmnet'. - cookbook ran by michael@mouse
- 16:50 wm-bot: Set cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
- 16:47 wm-bot: Setting cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
- 16:47 wm-bot: Draining 'cloudvirt1012.eqiad.wmnet'. - cookbook ran by michael@mouse
- 16:44 wm-bot: Set cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
- 16:43 wm-bot: Setting cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
2021-12-03
- 18:56 andrewbogott: maintain-views and maintain-meta-p on clouddb1013-1020
- 10:49 majavah: deleting dbbackups-dashboard project T296992
2021-12-02
- 01:17 wm-bot: Drained 'cloudvirt1028.eqiad.wmnet'. (T296790) - cookbook ran by andrew@buster
- 00:56 wm-bot: Set cloudvirt 'cloudvirt1028.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 00:56 wm-bot: Setting cloudvirt 'cloudvirt1028.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 00:56 wm-bot: Draining 'cloudvirt1028.eqiad.wmnet'. (T296790) - cookbook ran by andrew@buster
- 00:50 wm-bot: Set cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 00:50 wm-bot: Setting cloudvirt 'cloudvirt1026.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 00:50 wm-bot: Draining 'cloudvirt1026.eqiad.wmnet'. (T296790) - cookbook ran by andrew@buster
- 00:28 wm-bot: Drained 'cloudvirt1021.eqiad.wmnet'. (T296790) - cookbook ran by andrew@buster
- 00:03 wm-bot: Set cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 00:02 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 00:02 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. (T296790) - cookbook ran by andrew@buster
2021-12-01
- 23:59 wm-bot: Setting cloudvirt 'cloudvirt1021.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster
- 23:59 wm-bot: Draining 'cloudvirt1021.eqiad.wmnet'. (T296790) - cookbook ran by andrew@buster
- 23:54 andrewbogott: *correction* adding spare cloudvirts 1044 and 1045 to the 'ceph' pool in order to make space for future juggling around T296790 and T296792
- 23:53 andrewbogott: adding spare cloudvirts 1044 and 1055 to the 'ceph' pool in order to make space for future juggling around T296790 and T296792
2021-11-28
- 17:48 andrewbogott: moved cloudvirt1018 out of the 'localstorage' aggregate and into 'maintenance' for T296592. It will need to be moved back after the raid is rebuilt.
2021-11-21
- 07:19 dcaro_away: restarting designate-sink with some extra logs in it (T296144)
2021-11-17
- 15:48 andrewbogott: upgrading mariadb packages on eqiad1 cloudcontrols
- 15:39 andrewbogott: sudo cumin "cloud*" 'apt-get update -y --allow-releaseinfo-change'
- 15:26 andrewbogott: updated mariadb packages on codfw1dev cloudcontrols to 1:10.3.31-0+deb10u1
2021-11-12
- 13:31 arturo: restarting glance-api services to make sure they work with new ceph auth creds (T293752)
2021-11-08
- 21:50 andrewbogott: returned clouddb pools back to normal after maintain_views run: https://gerrit.wikimedia.org/r/c/operations/puppet/+/737505 T216481
- 20:07 andrewbogott: depooling clouddb1013 for maintain_views attempt
- 10:54 arturo: [codfw1dev] create service account `srv-networktests` following https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Service_accounts for T294955
- 10:34 arturo: create service account `srv-networktests` following https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Service_accounts for T294955
2021-11-05
- 11:18 wm-bot: Added 1 new OSDs ['cloudcephosd1024.eqiad.wmnet'] (T295012) - cookbook ran by arturo@endurance
- 11:17 wm-bot: Added OSD cloudcephosd1024.eqiad.wmnet... (1/1) (T295012) - cookbook ran by arturo@endurance
- 11:15 wm-bot: Finished rebooting node cloudcephosd1024.eqiad.wmnet - cookbook ran by arturo@endurance
- 11:12 wm-bot: Rebooting node cloudcephosd1024.eqiad.wmnet - cookbook ran by arturo@endurance
- 11:12 wm-bot: Adding OSD cloudcephosd1024.eqiad.wmnet... (1/1) (T295012) - cookbook ran by arturo@endurance
- 11:12 wm-bot: Adding new OSDs ['cloudcephosd1024.eqiad.wmnet'] to the cluster (T295012) - cookbook ran by arturo@endurance
2021-11-04
- 16:39 wm-bot: Added 1 new OSDs ['cloudcephosd1023.eqiad.wmnet'] (T295012) - cookbook ran by arturo@endurance
- 16:39 wm-bot: Added OSD cloudcephosd1023.eqiad.wmnet... (1/1) (T295012) - cookbook ran by arturo@endurance
- 16:37 wm-bot: Finished rebooting node cloudcephosd1023.eqiad.wmnet - cookbook ran by arturo@endurance
- 16:34 wm-bot: Rebooting node cloudcephosd1023.eqiad.wmnet - cookbook ran by arturo@endurance
- 16:33 wm-bot: Adding OSD cloudcephosd1023.eqiad.wmnet... (1/1) (T295012) - cookbook ran by arturo@endurance
- 16:33 wm-bot: Adding new OSDs ['cloudcephosd1023.eqiad.wmnet'] to the cluster (T295012) - cookbook ran by arturo@endurance
- 16:17 wm-bot: Added 1 new OSDs ['cloudcephosd1022.eqiad.wmnet'] (T295012) - cookbook ran by arturo@endurance
- 16:17 wm-bot: Added OSD cloudcephosd1022.eqiad.wmnet... (1/1) (T295012) - cookbook ran by arturo@endurance
- 16:16 wm-bot: Finished rebooting node cloudcephosd1022.eqiad.wmnet - cookbook ran by arturo@endurance
- 16:13 wm-bot: Rebooting node cloudcephosd1022.eqiad.wmnet - cookbook ran by arturo@endurance
- 16:12 wm-bot: Adding OSD cloudcephosd1022.eqiad.wmnet... (1/1) (T295012) - cookbook ran by arturo@endurance
- 16:12 wm-bot: Adding new OSDs ['cloudcephosd1022.eqiad.wmnet'] to the cluster (T295012) - cookbook ran by arturo@endurance
- 16:00 wm-bot: Adding OSD cloudcephosd1022.eqiad.wmnet... (1/1) (T295012) - cookbook ran by arturo@endurance
- 16:00 wm-bot: Adding new OSDs ['cloudcephosd1022.eqiad.wmnet'] to the cluster (T295012) - cookbook ran by arturo@endurance
- 11:26 wm-bot: Added 1 new OSDs ['cloudcephosd1021.eqiad.wmnet'] (T295012) - cookbook ran by arturo@endurance
- 11:26 wm-bot: Added OSD cloudcephosd1021.eqiad.wmnet... (1/1) (T295012) - cookbook ran by arturo@endurance
- 11:23 wm-bot: Finished rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by arturo@endurance
- 11:20 wm-bot: Rebooting node cloudcephosd1021.eqiad.wmnet - cookbook ran by arturo@endurance
- 11:19 wm-bot: Adding OSD cloudcephosd1021.eqiad.wmnet... (1/1) (T295012) - cookbook ran by arturo@endurance
- 11:19 wm-bot: Adding new OSDs ['cloudcephosd1021.eqiad.wmnet'] to the cluster (T295012) - cookbook ran by arturo@endurance
- 11:16 wm-bot: Adding new OSDs ['cloudcephosd1021.eqiad.wmnet'] to the cluster (T295012) - cookbook ran by arturo@endurance
2021-11-03
- 17:22 arturo: [codfw1dev] installing keepalived 2.1.5 from buster-backports on cloudgw2001-dev/2002-dev (T294956)
- 11:45 arturo: [codfw1dev] downgrade kernel on cloudgw2001-dev/2002-dev (T294853, T291813)
2021-11-02
- 10:54 arturo: rebooting cloudnet1004/1003 for T291813
- 10:43 arturo: [codfw1dev] rebooting cloudgw200[12]-dev for T291813
2021-10-24
- 00:47 andrewbogott: deploying a change so that openstack clients use tls endpoints: https://gerrit.wikimedia.org/r/c/operations/puppet/+/732738
2021-10-21
- 10:19 arturo: drop firewall exception on core routers for wiki replicas legacy setup (T293897)
- 10:12 arturo: drop NAT exception for wiki replicas legacy setup (T293897)
2021-10-20
- 21:06 andrewbogott: creating cloudinfra-nfs project T293936
2021-10-18
- 19:21 andrewbogott: also ticked the 'admin' box on wikitech for majavah T292827
- 18:58 andrewbogott: granting majavah 'admin' role in the 'admin' project and also in the default domain. T292827
2021-10-14
- 12:28 arturo: [codfw1dev] add DB grants for cloudbackup2002.codfw.wmnet IP address to the cinder DB (T292546)
2021-10-13
- 10:46 arturo: updating python3-neutron across the fleet (T292936)
2021-10-12
- 09:06 dcaro: upgrading eqiad cloudnet hosts neutron packages (T292936)
- 08:57 dcaro: upgrading codfw cloudnet hosts neutron packages (T292936)
2021-10-05
- 09:39 arturo: [codfw1dev] cleaning up manila stuff from openstack (db, endpoints, tenant, VMs, and such) T291257
2021-09-30
- 14:50 andrewbogott: sudo cumin "cloud*" "ps -ef | grep nslcd && service nslcd restart" and sudo cumin "lab*" "ps -ef | grep nslcd && service nslcd restart" T292202
- 14:43 andrewbogott: ran sudo cumin --force --timeout 500 -o json "A:all" "ps -ef | grep nslcd && service nslcd restart" to get nslcd happy again T292202
2021-09-29
- 09:41 arturo: [codfw1dev] cleanup manila shares definitions for a clean start now that the manila-sharecontroller VM is apparently well configured (T291257)
2021-09-28
- 16:23 bstorm: downtime for clouddb1020 to reduce re-pages in case this goes badly T291963
- 16:21 bstorm: powering on clouddb1020 via remote console T291963
- 15:58 bstorm: depooled clouddb1020 for repair T291961
- 12:40 dcaro: Merged change on sssd for bullseye cloud hosts (T291585)
- 11:30 arturo: [codfw1dev] create floating IP 185.15.57.5 for manila-sharecontroller.cloudinfra-codfw1dev.codfw1dev.wmcloud.org (T291257)
2021-09-27
- 10:07 arturo: cloudcontrol1004 apparently healthy T291446
- 09:25 arturo: rebooting cloudcontrol1004 for T291446
2021-09-24
- 13:02 arturo: [codfw1dev] create VM manila-share-controller-01 on cloudinfra-codfw1dev
- 13:00 arturo: [codfw1dev] rebase labs/private.git on cloudinfra-puppetmaster-01, had merge conflict
2021-09-21
- 12:13 arturo: [codfw1dev] trying to create a manila service image (T291257)
- 11:45 arturo: [codfw1dev] created rabbitmq user (T291257)
- 11:32 arturo: [codfw1dev] populated manila DB & created service endpoints (T291257)
- 11:06 arturo: [codfw1dev] give manila user admin role @ manila project (T291257)
- 11:06 arturo: [codfw1dev] created manila project (T291257)
- 10:57 arturo: [codfw1dev] created manila user @ labtestwikitech (T291257)
- 10:49 arturo: [codfw1dev] create manila database on cloudcontrol-dev nodes (galera) T291257
2021-09-20
- 23:08 bstorm: ran `echo check > /sys/block/md0/md/sync_action` on cloudcontrol1004 to check raid
- 22:48 andrewbogott: stopped puppet & mariadb on cloudcontrol1004; it was flapping
- 22:44 andrewbogott: sudo touch /tmp/galera.disabled on cloudcontrol1004, the service seems troubled there
- 21:57 andrewbogott: moving cloudvirt1043 into the 'nfs' aggregate for T291405
2021-09-17
- 11:35 arturo: [codfw1dev] install manila on cloudcontrol2001-dev (T291257)
2021-09-16
- 15:56 bstorm: removing downtime for labstore1005 so we'll know if it has another issue T290318
2021-09-09
- 22:03 bstorm: restarted the prometheus-mysqld-exporter@s1 service as it was not working T290630
- 03:15 bstorm: resetting swap on clouddb1017 T290630
- 03:08 andrewbogott: stopping maintain-dbusers on labstore1004 for help diagnosing T290630
2021-09-03
- 15:34 bstorm: rebooting labstore1005 to disconnect the drives from labstore1004 T290318
- 15:24 bstorm: stopping puppet and disabling backup syncs to labstore1005 on cloudbackup2002 T290318
- 15:20 bstorm: stopping puppet and disabling backup syncs to labstore1005 on cloudbackup2001 T290318
2021-08-30
- 16:16 wm-bot: Added 1 new OSDs ['cloudcephosd1018.eqiad.wmnet'] - cookbook ran by andrew@buster
- 16:16 wm-bot: Added OSD cloudcephosd1018.eqiad.wmnet... (1/1) - cookbook ran by andrew@buster
- 16:13 wm-bot: Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1) - cookbook ran by andrew@buster
- 16:13 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster
- 16:10 wm-bot: Finished rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster
- 16:07 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster
- 16:07 wm-bot: Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1) - cookbook ran by andrew@buster
- 16:07 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster
2021-08-27
- 18:57 andrewbogott: raising toolsbeta ram/core/instances quotas so majavah can experiment with bullseye
2021-08-25
- 14:45 wm-bot: Finished rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster
- 14:42 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster
- 14:42 wm-bot: Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1) - cookbook ran by andrew@buster
- 14:42 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster
- 14:41 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster
2021-08-19
- 17:39 bstorm: restarting glance image backup to try and clear the page
2021-08-18
- 16:21 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster
- 16:21 wm-bot: Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1) - cookbook ran by andrew@buster
- 16:21 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster
- 16:17 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster
- 16:16 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster
- 16:15 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster
- 16:13 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster - cookbook ran by andrew@buster
- 14:47 andrewbogott: adding cloudvirt1038 to the ceph aggregate, removing from the maintenance aggregate T276922
2021-08-17
- 15:11 andrewbogott: rebooting cloudcephosd1008 to force raid rebuild -- T287838
2021-08-11
- 13:51 wm-bot: Finished rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:48 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 13:47 wm-bot: Adding OSD cloudcephosd1018.eqiad.wmnet... (1/1) (T285858) - cookbook ran by dcaro@vulcanus
- 13:47 wm-bot: Adding new OSDs ['cloudcephosd1018.eqiad.wmnet'] to the cluster (T285858) - cookbook ran by dcaro@vulcanus
2021-08-10
- 15:15 andrewbogott: restarting all designate services in eqiad1
- 15:04 andrewbogott: restarting designate-sink in eqiad1; it's complaining about rabbit but I don't want to restart rabbit yet
2021-08-05
- 09:37 dcaro: Taking one osd daemon down on codfw cluster (T288203)
2021-08-04
- 19:20 bd808: Running deleteBatch.php on cloudweb2001-dev to remove legacy Hiera: pages from labtestwiki
2021-08-03
- 17:40 bstorm: rerunning the glance backup script after failure
2021-07-31
- 00:10 andrewbogott: "systemctl reset-failed cloud-init.service" on all VMs for T287309
- 00:08 andrewbogott: "systemctl reset-failed cloud-final.service" on all VMs for T287309
2021-07-27
- 21:32 andrewbogott: putting cloudvirt1012 back into service T286748
- 20:52 andrewbogott: draining VMs off of cloudvirt1012 so we can replace the battery for T286748
- 15:15 andrewbogott: "rm /etc/apt/sources.list.d/openstack-mitaka-jessie.list" cloud-wide
2021-07-23
- 15:22 bstorm: update wikireplicas-dns for s7 fix for web replicas
2021-07-20
- 17:07 andrewbogott: reloading haproxy on dbproxy1018 for T286598
- 15:45 arturo: failback from labstore1006 to labstore1007 (dumps NFS) https://gerrit.wikimedia.org/r/c/operations/puppet/+/705417
- 00:10 bstorm: restarting nova-api on cloudcontrol1003 to try and recover whatever it's doing with designate_floating_ip_ptr_records_updater
2021-07-19
- 22:05 bstorm: set downtime scheduled for tomorrow from 1300 to 1600 UTC for cloudstore1008 and 1009 T286599
- 20:40 andrewbogott: reloading haproxy on dbproxy1018 for T286598
- 13:50 andrewbogott: upgrading mariadb to 10.3.29 on all cloudcontrols
2021-07-16
- 09:55 dcaro: checking HP raid issues on cloudvirt1012 (T286766)
2021-07-14
- 21:08 andrewbogott: restarting lots of openstack services while trying to resolve T286675
- 12:17 dcaro: doing ceph outage tests on codfw1 (fyi)
2021-07-13
- 10:57 dcaro: enabled autoscaling on codfw1 ceph cluster, setting a minimum of pgs on codfw1dev-compute to 128
2021-07-02
- 10:12 wm-bot: The cluster is not rebalance after adding the new OSDs ['cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] (T285858) - cookbook ran by dcaro@vulcanus
- 10:12 wm-bot: Added 2 new OSDs ['cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] (T285858) - cookbook ran by dcaro@vulcanus
- 10:12 wm-bot: Added OSD cloudcephosd1020.eqiad.wmnet... (2/2) (T285858) - cookbook ran by dcaro@vulcanus
- 10:10 wm-bot: Finished rebooting node cloudcephosd1020.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 10:07 wm-bot: Rebooting node cloudcephosd1020.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 10:07 wm-bot: Adding OSD cloudcephosd1020.eqiad.wmnet... (2/2) (T285858) - cookbook ran by dcaro@vulcanus
- 10:07 wm-bot: Added OSD cloudcephosd1019.eqiad.wmnet... (1/2) (T285858) - cookbook ran by dcaro@vulcanus
- 10:05 wm-bot: Finished rebooting node cloudcephosd1019.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 10:02 wm-bot: Rebooting node cloudcephosd1019.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 10:02 wm-bot: Adding OSD cloudcephosd1019.eqiad.wmnet... (1/2) (T285858) - cookbook ran by dcaro@vulcanus
- 10:01 wm-bot: Adding new OSDs ['cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster (T285858) - cookbook ran by dcaro@vulcanus
- 09:13 wm-bot: Adding OSD cloudcephosd1019.eqiad.wmnet... (1/2) (T285858) - cookbook ran by dcaro@vulcanus
- 09:13 wm-bot: Adding new OSDs ['cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster (T285858) - cookbook ran by dcaro@vulcanus
2021-07-01
- 16:27 bstorm: failed over cloudstore1009 to cloudstore1008 T224747
- 16:18 bstorm: downtimed cloudstore1008 and cloudstore1009 to fail over T224747
- 14:25 wm-bot: Adding OSD cloudcephosd1019.eqiad.wmnet... (2/3) (T285858) - cookbook ran by dcaro@vulcanus
- 14:25 wm-bot: Added OSD cloudcephosd1017.eqiad.wmnet... (1/3) (T285858) - cookbook ran by dcaro@vulcanus
- 14:24 wm-bot: Finished rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:21 wm-bot: Rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:20 wm-bot: Adding OSD cloudcephosd1017.eqiad.wmnet... (1/3) (T285858) - cookbook ran by dcaro@vulcanus
- 14:20 wm-bot: Adding new OSDs ['cloudcephosd1017.eqiad.wmnet', 'cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster (T285858) - cookbook ran by dcaro@vulcanus
- 14:18 wm-bot: Rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
- 14:17 wm-bot: Adding OSD cloudcephosd1017.eqiad.wmnet... (1/3) (T285858) - cookbook ran by dcaro@vulcanus
- 14:17 wm-bot: Adding new OSDs ['cloudcephosd1017.eqiad.wmnet', 'cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster (T285858) - cookbook ran by dcaro@vulcanus
- 11:16 wm-bot: Added new OSD node cloudcephosd1016.eqiad.wmnet (T285858) - cookbook ran by dcaro@vulcanus
- 11:13 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
- 10:58 dcaro: rebooting cloudcephosd1016 (T285858)
- 10:47 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
- 10:44 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
- 10:42 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
- 10:41 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
- 10:40 wm-bot: Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
2021-06-30
- 21:48 bstorm: downtimed space alerts for scratch on cloudstore1008 until after the migration
2021-06-25
- 15:28 andrewbogott: restarting openstack services on cloudcontrol1005
- 09:16 arturo: icinga downtime cloudcontrols for 2h
- 08:20 dcaro: restarting rabbitmq on cloudcontrol100{3,4}
2021-06-21
- 13:54 dcaro: puppet fix merged and deployed, servers are back to normal
- 13:20 dcaro: merged broken puppet patch, downtimed all cloudvirts for 2h while fixing (nothing big, just added a bad systemd timer)
2021-06-20
- 22:21 andrewbogott: clearing admin-monitoring VMs; puppet has been failing lately due to a full drive on the puppetmaster
2021-06-15
- 01:18 bstorm: running a modified version of the prometheus dir size cron in screen T284964
2021-06-14
- 10:13 dcaro: setting sssd to debug mode on tools-sgeexec-0917 (T284130)
2021-06-10
- 10:58 wm-bot: Finished rebooting the nodes ['cloudcephmon2002-dev', 'cloudcephmon2003-dev', 'cloudcephmon2004-dev'] (T281248) - cookbook ran by dcaro@vulcanus
- 10:58 wm-bot: Finished rebooting node cloudcephmon2004-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:55 wm-bot: Rebooting node cloudcephmon2004-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:55 wm-bot: Finished rebooting node cloudcephmon2003-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:52 wm-bot: Rebooting node cloudcephmon2003-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:52 wm-bot: Finished rebooting node cloudcephmon2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:49 wm-bot: Rebooting node cloudcephmon2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:49 wm-bot: Rebooting the nodes cloudcephmon2002-dev,cloudcephmon2003-dev,cloudcephmon2004-dev (T281248) - cookbook ran by dcaro@vulcanus
- 10:48 wm-bot: Finished rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] (T281248) - cookbook ran by dcaro@vulcanus
- 10:48 wm-bot: Finished rebooting node cloudcephosd2003-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:45 wm-bot: Rebooting node cloudcephosd2003-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:45 wm-bot: Finished rebooting node cloudcephosd2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:42 wm-bot: Rebooting node cloudcephosd2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:42 wm-bot: Finished rebooting node cloudcephosd2001-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:39 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 10:39 wm-bot: Rebooting the nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev (T281248) - cookbook ran by dcaro@vulcanus
- 09:39 wm-bot: Finished rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] (T281248) - cookbook ran by dcaro@vulcanus
- 09:38 wm-bot: Finished rebooting node cloudcephosd2003-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 09:35 wm-bot: Rebooting node cloudcephosd2003-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 09:35 wm-bot: Finished rebooting node cloudcephosd2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 09:32 wm-bot: Rebooting node cloudcephosd2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 09:32 wm-bot: Finished rebooting node cloudcephosd2001-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 09:29 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 09:29 wm-bot: Rebooting the nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev (T281248) - cookbook ran by dcaro@vulcanus
- 09:26 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 09:26 wm-bot: Rebooting the nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev (T281248) - cookbook ran by dcaro@vulcanus
- 09:24 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 09:24 wm-bot: Rebooting the nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev (T281248) - cookbook ran by dcaro@vulcanus
2021-06-09
- 17:33 arturo: removed icinga downtime for cloudmetrics1002 -- to see if hardware is healthy (T281881)
- 13:30 wm-bot: Finished rebooting the nodes ['cloudcephmon2002-dev', 'cloudcephmon2003-dev', 'cloudcephmon2004-dev'] (T281248) - cookbook ran by dcaro@vulcanus
- 13:30 wm-bot: Finished rebooting node cloudcephmon2004-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 13:27 wm-bot: Rebooting node cloudcephmon2004-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 13:27 wm-bot: Finished rebooting node cloudcephmon2003-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 13:24 wm-bot: Rebooting node cloudcephmon2003-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 13:24 wm-bot: Finished rebooting node cloudcephmon2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 13:21 wm-bot: Rebooting node cloudcephmon2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 13:21 wm-bot: Rebooting the nodes cloudcephmon2002-dev,cloudcephmon2003-dev,cloudcephmon2004-dev (T281248) - cookbook ran by dcaro@vulcanus
- 13:01 wm-bot: Rebooting node cloudcephmon2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 13:01 wm-bot: Rebooting the nodes cloudcephmon2002-dev,cloudcephmon2003-dev,cloudcephmon2004-dev (T281248) - cookbook ran by dcaro@vulcanus
- 12:53 wm-bot: Rebooting node cloudcephmon2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 12:53 wm-bot: Rebooting the nodes cloudcephmon2002-dev,cloudcephmon2003-dev,cloudcephmon2004-dev (T281248) - cookbook ran by dcaro@vulcanus
2021-06-08
- 23:19 bd808: Downtimed cloudmetrics1002 in icinga until 2021-06-30 23:59:01 (T281881)
- 21:08 bstorm: downtiming grafana-labs for maintenance
- 16:28 wm-bot: Finished rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] (T281248) - cookbook ran by dcaro@vulcanus
- 16:27 wm-bot: Finished rebooting node cloudcephosd2003-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 16:24 wm-bot: Rebooting node cloudcephosd2003-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 16:24 wm-bot: Finished rebooting node cloudcephosd2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 16:22 wm-bot: Rebooting node cloudcephosd2002-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 16:21 wm-bot: Finished rebooting node cloudcephosd2001-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 16:18 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet (T281248) - cookbook ran by dcaro@vulcanus
- 16:18 wm-bot: Rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] (T281248) - cookbook ran by dcaro@vulcanus
- 16:17 wm-bot: Rebooting the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] (T281248) - cookbook ran by dcaro@vulcanus
- 15:03 wm-bot: Finished rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 14:59 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 14:59 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 14:57 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 14:57 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 14:29 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 14:23 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
- 14:18 wm-bot: Rebooting node cloudcephosd2001-dev.codfw.wmnet - cookbook ran by dcaro@vulcanus
2021-06-07
- 14:27 andrewbogott: moving cloudvirt1040 from 'maintenance' aggregate to 'ceph' aggregate T281399
2021-06-01
- 13:12 dcaro: Changed the ceph osd_memory_target on eqiad pool to 6Gi (we were reaching the limit, swapping at some points)
- 09:57 arturo: fix PTR record for 185.15.56.1 (T284025)
- 09:56 arturo: fix PTR record for 185.15.56.1 (T248025)
2021-05-27
- 14:58 wm-bot: Testing - cookbook ran by dcaro@vulcanus
2021-05-26
- 19:10 andrewbogott: reimaging cloudvirt1018 to support local VM storage
- 18:07 andrewbogott: draining cloudvirt1018, converting it to a local-storage host like cloudvirt1019 and 1020 -- T283296
- 14:36 dcaro: Enabled syslog logging for osd.55 on eqiad ceph cluster for testing (T281247)
- 14:36 dcaro: Enabled syslog logging on codfw ceph cluster (mon/osd/mgr) (T281247)
- 11:26 arturo: [codfw1dev] purge old kernel packages in cloudvirt200[12]-dev
- 11:03 arturo: created public flavor `g3.cores16.ram36.disk20` (even though it was requested as private in T283293, but may be useful for others)
2021-05-25
- 16:14 bd808: Closed #wikimedia-cloud-admin on f***node
- 16:11 bd808: Closed #wikimedia-cloud-feed on f***node
- 15:19 dcaro: rebooted cloudvirt1020, starting VMs (T275893)
- 15:13 dcaro: rebooting cloudvirt1020 (T275893)
- 14:42 dcaro: taking cloudvirt1020 out for maintenance (openstack wise) so no new VMs are scheduled on it (T275893)
2021-05-24
- 22:32 andrewbogott: changing the default ttl for eqiad1.wikimedia.cloud. from 3600 to 60; this should help us avoid madness when re-using hostnames.
- 11:20 arturo: created `g3.cores2.ram80.disk40.private` for the wmf-research-tools project, to allow resizing a 40G disk instance
2021-05-22
- 02:14 bstorm: downtiming SMART alerts on dumps server labstore1007 for the weekend because it has been flapping T281045
2021-05-13
- 21:25 bstorm: converted the maps and scratch volumes on cloudstore1008 (standby) to drbd T224747
- 15:45 bstorm: re-running wikireplicas-dns after refactor of config to make sure it doesn't change anything
2021-05-12
- 14:23 arturo: [codfw1dev] cleanup old unused agents (bgp, ovs)
- 11:37 arturo: [codfw1dev] replacing cloudnet2003-dev with cloudnet2004-dev (T281381)
2021-05-11
- 18:00 andrewbogott: adding 'trove' service project in advance of deploying trove in eqiad1
- 10:22 arturo: rebooted cloudgw1002 (active) thus causing a failover to cloudgw1001
2021-05-09
- 10:53 arturo: icinga-downtime cloudmetrics1002 for 3 months (T275605)
2021-05-07
- 13:51 andrewbogott: add inherited 'admin' right to novaadmin user throughout eqiad1. I was trying to narrow down the rights here but lack of admin breaks some workflows, e.g. T281894 and T282235
2021-05-06
- 15:31 arturo: about to migrate CloudVPS network to the cloudgw architecture T270704
- 11:14 dcaro: restarting cinder-volume on the eqiad control nodes to refresh the ceph libraries (T282109)
2021-05-05
- 16:07 dcaro: disallowing insecure global ids on the eqiad ceph cluster (T280641)
- 15:15 wm-bot: Safe reboot of 'cloudvirt1046.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 15:11 wm-bot: Safe rebooting 'cloudvirt1046.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 15:11 wm-bot: Safe reboot of 'cloudvirt1045.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 15:07 wm-bot: Safe rebooting 'cloudvirt1045.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 15:07 wm-bot: Safe reboot of 'cloudvirt1044.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 15:03 wm-bot: Safe rebooting 'cloudvirt1044.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 15:03 wm-bot: Safe reboot of 'cloudvirt1043.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 14:59 wm-bot: Safe rebooting 'cloudvirt1043.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 14:59 wm-bot: Safe reboot of 'cloudvirt1042.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 14:40 wm-bot: Safe rebooting 'cloudvirt1042.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 14:39 wm-bot: Safe reboot of 'cloudvirt1041.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 14:14 wm-bot: Safe rebooting 'cloudvirt1041.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 14:14 wm-bot: Safe reboot of 'cloudvirt1039.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 14:10 wm-bot: Safe rebooting 'cloudvirt1039.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 12:35 wm-bot: Safe rebooting 'cloudvirt1039.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 11:56 wm-bot: Safe rebooting 'cloudvirt1038.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 11:56 wm-bot: Safe reboot of 'cloudvirt1037.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 11:31 wm-bot: Safe rebooting 'cloudvirt1037.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 11:31 wm-bot: Safe reboot of 'cloudvirt1036.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 11:08 wm-bot: Safe rebooting 'cloudvirt1036.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 11:08 wm-bot: Safe reboot of 'cloudvirt1035.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 10:39 wm-bot: Safe rebooting 'cloudvirt1035.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 10:39 wm-bot: Safe reboot of 'cloudvirt1034.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 10:13 wm-bot: Safe rebooting 'cloudvirt1034.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 10:13 wm-bot: Safe reboot of 'cloudvirt1033.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 09:47 wm-bot: Safe rebooting 'cloudvirt1033.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 09:47 wm-bot: Safe reboot of 'cloudvirt1032.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 09:21 wm-bot: Safe rebooting 'cloudvirt1032.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 09:21 wm-bot: Safe reboot of 'cloudvirt1031.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 08:45 wm-bot: Safe rebooting 'cloudvirt1031.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 08:45 wm-bot: Safe reboot of 'cloudvirt1030.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 08:19 wm-bot: Safe rebooting 'cloudvirt1030.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 08:19 wm-bot: Safe reboot of 'cloudvirt1029.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 08:02 wm-bot: Safe rebooting 'cloudvirt1029.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
2021-05-04
- 16:05 wm-bot: Safe reboot of 'cloudvirt1028.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 15:45 wm-bot: Safe rebooting 'cloudvirt1028.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 15:44 wm-bot: Safe reboot of 'cloudvirt1027.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 15:22 wm-bot: Safe rebooting 'cloudvirt1027.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 15:19 wm-bot: Safe reboot of 'cloudvirt1026.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 15:15 wm-bot: Safe rebooting 'cloudvirt1026.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 13:19 dcaro: rebooting cloudmetrics1002, got stuck again (T275605)
- 10:04 wm-bot: Safe rebooting 'cloudvirt1026.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 09:10 wm-bot: Safe rebooting 'cloudvirt1026.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 09:10 wm-bot: Safe reboot of 'cloudvirt1025.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 08:34 wm-bot: Safe rebooting 'cloudvirt1025.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 08:20 wm-bot: Safe reboot of 'cloudvirt1024.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 08:03 wm-bot: Safe rebooting 'cloudvirt1024.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
2021-05-03
- 23:53 bstorm: running `maintain-dbusers harvest-replicas` on labstore1004 T281287
- 23:51 bstorm: running `maintain-dbusers harvest-replicas` on labstore1004
- 16:34 wm-bot: Safe reboot of 'cloudvirt1023.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 16:29 wm-bot: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 15:41 wm-bot: Safe rebooting 'cloudvirt1023.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 15:41 wm-bot: Safe reboot of 'cloudvirt1022.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
- 15:13 wm-bot: Safe rebooting 'cloudvirt1022.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 10:31 wm-bot: Safe rebooting 'cloudvirt1021.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus
- 10:23 wm-bot: (from a cookbook)
- 09:12 dcaro: draining and rebooting cloudvirt1021 (T280641)
- 08:26 dcaro: draining and rebooting cloudvirt1018 (T280641)
2021-04-30
- 11:16 dcaro: draining and rebooting cloudvirt1017, last one today (T280641)
- 10:37 dcaro: draining cloudvirt1016 for reboot (T280641)
- 09:48 dcaro: draining cloudvirt1013 for reboot (T280641)
2021-04-29
- 15:11 dcaro: hard rebooting cloudmetrics1002, got hung again (T275605)
- 07:53 dcaro: Upgrading ceph libraries on cloudcontrol1005 to octopus (T274566)
- 07:51 dcaro: Upgrading ceph libraries on cloudcontrol1003 to octopus (T274566)
- 07:50 dcaro: Upgrading ceph libraries on cloudcontrol1004 to octopus (T274566)
2021-04-28
- 21:11 andrewbogott: cleaning up more references to deleted hypervisors with delete from services where topic='compute' and version != 53;
- 20:48 andrewbogott: cleaning up references to deleted hypervisors with mysql:root@localhost [nova_eqiad1]> delete from compute_nodes where hypervisor_version != '5002000';
- 19:40 andrewbogott: putting cloudvirt1040 into the maintenance aggregate pending more info about T281399
- 18:11 andrewbogott: adding cloudvirt1040, 1041 and 1042 to the 'ceph' host aggregate -- T275081
- 11:06 dcaro: All ceph server side upgraded to Octopus! \o/ (T280641)
- 10:57 dcaro: Got a PG getting stuck on 'remapping' after the OSD came up, had to unset the norebalance and then set it again to get it unstuck (T280641)
- 10:34 dcaro: Slow/blocked ops from cloudcephmon03, "osd_failure(failed timeout osd.32..." (cloudcephosd1005), unset the cluster noout/norebalance and it went away in a few secs, setting it again and continuing... (T280641)
- 09:03 dcaro: Waiting for slow heartbeats from osd.58(cloudcephosd1002) to recover... (T280641)
- 08:59 dcaro: During the upgrade, started getting warning 'slow osd heartbeats in the back', meaning that pings between osds are really slow (up to 190s) all from osd.58, currently on cloudcephosd1002 (T280641)
- 08:58 dcaro: During the upgrade, started getting warning 'slow osd heartbeats in the back', meaning that pings between osds are really slow (up to 190s) all from osd.58 (T280641)
- 08:58 dcaro: During the upgrade, started getting warning 'slow osd heartbeats in the back', meaning that pings between osds are really slow (up to 190s) (T280641)
- 08:21 dcaro: Upgrading all the ceph osds on eqiad (T280641)
- 08:21 dcaro: The clock skew seems intermittent, there's another task to follow it T275860 (T280641)
- 08:18 dcaro: All eqiad ceph mons and mgrs upgraded (T280641)
- 08:18 dcaro: During the upgrade, ceph detected a clock skew on cloudcephmon1002, cloudcephmon1001, they are back (T280641)
- 08:15 dcaro: During the upgrade, ceph detected a clock skew on cloudcephmon1002, it went away, I'm guessing systemd-timesyncd fixed it (T280641)
- 08:14 dcaro: During the upgrade, ceph detected a clock skew on cloudcephmon1002, looking (T280641)
- 07:58 dcaro: Upgrading ceph services on eqiad, starting with mons/managers (T280641)
2021-04-27
- 14:10 dcaro: codfw.openstack upgraded ceph libraries to 15.2.11 (T280641)
- 13:07 dcaro: codfw.openstack cloudvirt2002-dev done, taking cloudvirt2003-dev out to upgrade ceph libraries (T280641)
- 13:00 dcaro: codfw.openstack cloudvirt2001-dev back online, taking cloudvirt2002-dev out to upgrade ceph libraries (T280641)
- 10:51 dcaro: ceph.eqiad: cinder pool got its pg_num increased to 1024, re-shuffle started (T273783)
- 10:48 dcaro: ceph.eqiad: Tweaked the target_size_ratio of all the pools, enabling autoscaler (it will increase cinder pool only) (T273783)
- 09:14 dcaro: manually force stopping the server puppetmaster-01 to unblock migration (in codfw1)
- 09:14 dcaro: manually force stopping the server puppetmaster-01 to unblock migration
- 08:59 dcaro: manually force stopping the server exploding-head on codfw, to try cold migration
- 08:47 dcaro: restarting nova-compute on cloudvirt2001-dev after upgrading ceph libraries to 15.2.11
2021-04-26
- 20:56 andrewbogott: deleting spurious 'codfw1dev' and 'codw1dev-4' regions in the dallas deployment; regions without endpoints break a bunch of things
- 09:45 dcaro: draining cloudvirt2001-dev with the new cookbooks (T280641)
2021-04-23
- 13:49 dcaro: testing the drain_cloudvirt cookbook on codfw1 openstack cluster, draining cloudvirt2001 (T280641)
- 11:12 dcaro: testing the drain_cloudvirt cookbook on codfw1 openstack cluster (T280641)
- 09:32 dcaro: finished upgrade of ceph cluster on codfw1 using exclusively cookbooks (T280641)
- 09:17 dcaro: testing the upgrade_osds cookbook on codfw1 ceph cluster (T280641)
- 08:17 dcaro: testing the upgrade_mons cookbook on codfw1 ceph cluster (T280641)
2021-04-21
- 17:59 dcaro: all monitors upgraded on codfw1 with one cookbook `cookbook --verbose -c ~/.config/spicerack/cookbook.yaml wmcs.ceph.upgrade_mons --monitor-node-fqdn cloudcephmon2002-dev.codfw.wmnet` (T280641)
- 17:47 dcaro: upgrading monitors and mgr nodes on codfw ceph cluster (T280641)
- 13:26 dcaro: testing ceph upgrade cookbook on cloudcephmon2002-dev (T280641)
2021-04-20
- 20:21 andrewbogott: reboot cloudservices1003
- 20:13 andrewbogott: reboot cloudservices1004
2021-04-19
- 08:40 dcaro: enabling puppet on labstore1004 after mysql restart (T279657)
- 08:09 dcaro: downtiming labstore1004 and stopping puppet for mysql restart (T279657)
2021-04-14
- 10:48 dcaro: Upgrade of codfw ceph to octopus 15.2.20 done, will run some performance tests now (T274566)
- 10:41 dcaro: Upgrade of codfw ceph to octopus 15.2.20, mgrs upgraded, osds next (T274566)
- 10:37 dcaro: Upgrade of codfw ceph to octopus 15.2.20, mons upgraded, mgrs next (T274566)
- 10:15 dcaro: starting the upgrade of codfw ceph to octopus 15.2.20 (T274566)
- 10:07 dcaro: Merged the ceph 15 (Octopus) repo deployment to codfw, only the repo, not the packages (T274566)
2021-04-13
- 16:42 dcaro: Ceph balancer got the cluster to eval 0.014916, that is 88-77% usage for compute pool, and 28-19% usage for the cinder one \o/ (T274573)
- 15:08 dcaro: Activating continuous upmap balancer, keeping a close eye (T274573)
- 15:03 dcaro: Executing a second pass, there's still movements to improve the eval of 0.030075 (T274573)
- 15:02 dcaro: First pass finished, improved eval to 0.030075 (T274573)
- 14:49 dcaro: Running the first_pass balancing plan on ceph eqiad, current eval 0.030622 (T274573)
- 14:43 dcaro: enabling ceph upmap pg balancer on eqiad (T274573)
- 14:36 andrewbogott: upgrading codfw1dev to version Victoria, T261137
- 13:11 andrewbogott: upgrading eqiad1 designate to version Victoria, T261137
- 10:44 dcaro: enabled ceph upmap balancer on codfw (T274573)
2021-04-07
- 21:33 andrewbogott: upgrading codfw1dev designate to Victoria
2021-04-04
- 17:36 andrewbogott: upgrading eqiad1 designate to Ussuri
2021-04-02
- 14:12 andrewbogott: upgrading codfw1dev to OpenStack version Ussuri
2021-04-01
- 12:15 dcaro: Restoring the 4.9 kernel on cloudcephosd2003-dev and upgrading (T274565)
- 10:29 dcaro: Done restoring the 4.9 kernel on cloudcephosd2001-dev and upgrading, requires logging into console to boot from the older kernel before removing the newer one (T274565)
- 10:10 dcaro: Restoring the 4.9 kernel on cloudcephosd2001-dev and upgrading (T274565)
2021-03-31
- 08:47 dcaro: upgrading cinder on codfw cloudcontrol2* nodes (T278845)
2021-03-30
- 09:53 arturo: rebooting cloudnet1003 to cleanup conntrack table, it wouldn't cleanup by hand ...
2021-03-28
- 15:42 andrewbogott: updated debian-10.0-buster base image
2021-03-27
- 09:54 arturo: cleanup conntrack table in qrouter netns in cloudnet1003 (backup)
2021-03-25
- 19:03 andrewbogott: deleting all unused (per wmcs-imageusage) Jessie base images from Glance
- 17:15 andrewbogott: refreshing puppet compiler facts for tools project
- 10:31 dcaro: kernel upgrade on osds on codfw done, running performance tests (T274565)
- 10:24 dcaro: upgrading kernel on cloudcephosd2003-dev and reboot (T274565)
- 10:18 dcaro: upgrading kernel on cloudcephosd2002-dev and reboot (T274565)
- 10:08 dcaro: upgrading kernel on cloudcephmon2003-dev and reboot (T274565)
2021-03-24
- 09:19 dcaro: restarted wmcs-backup on cloudvirt1024 as it failed due to an image being removed while running (T276892)
2021-03-23
- 11:33 arturo: root@cloudcontrol1005:~# wmcs-novastats-dnsleaks --delete
2021-03-22
- 10:10 arturo: cleanup conntrack table in standby node: aborrero@cloudnet1003:~ $ sudo ip netns exec qrouter-d93771ba-2711-4f88-804a-8df6fd03978a conntrack -F
2021-03-19
- 17:18 bstorm: running `ALTER TABLE account MODIFY COLUMN type ENUM('user','tool','paws');` against the labsdbaccounts database on m5 T276284
- 14:29 andrewbogott: switching admin-monitoring project to use an upstream debian image; I want to see how this affects performance
- 00:30 bstorm: downtimed labstore1004 to check some things in debug mode
2021-03-17
- 17:28 bstorm: restarted the backup-glance-images job to clear errors in systemd T271782
- 17:16 andrewbogott: set default cinder quota for projects to 80Gb with "update quota_classes set hard_limit=80 where resource='gigabytes';" on database 'cinder'
- 16:58 andrewbogott: disabling all flavors with >20Gb root storage with "update flavors set disabled=1 where root_gb>20;" in nova_eqiad1_api
2021-03-10
- 16:51 arturo: rebooting cloudvirt1030 for T275753
- 13:14 dcaro: starting manually the canary VM for cloudvirt1029 (nova start 349830f6-3b39-4a8c-ada4-a7439f65cffe) (T275753)
- 12:51 arturo: draining cloudvirt1030 for T275753
- 12:47 arturo: rebooting cloudvirt1029 for T275753
- 11:56 arturo: [codfw1dev] restart rabbitmq-server in all 3 cloudcontrol servers for T276964
- 11:53 arturo: [codfw1dev] restart nova-conductor in all 3 cloudcontrol servers for T276964
- 11:31 arturo: draining cloudvirt1029 for T275753
- 11:29 arturo: rebooting cloudvirt1013 for T275753
- 11:05 arturo: draining cloudvirt1013 for T275753
- 11:00 arturo: rebooting cloudvirt1028 for T275753
- 10:33 arturo: draining cloudvirt1028 for T275753
- 10:29 arturo: rebooting cloudvirt1023 for T275753
- 09:37 arturo: draining cloudvirt1023 for T275753
- 09:07 arturo: [codfw1dev] reimaging cloudvirt2003-dev (T276964)
2021-03-09
- 16:27 arturo: rebooting cloudvirt1027 (T275753)
- 13:39 arturo: draining cloudvirt1027 for T275753
- 13:35 arturo: icinga-downtime cloudvirt1038 for 30 days for T276922
- 13:21 arturo: add cloudvirt1039 to the ceph host aggregate (no longer a spare, we have cloudvirt1038 with HW failures)
- 12:52 arturo: cloudvirt1038 hard powerdown / powerup for T276922
- 12:33 arturo: rebooting cloudvirt1038 (T275753)
- 10:58 arturo: draining cloudvirt1038 (T275753)
- 10:54 arturo: rebooting cloudvirt1037 (T275753)
- 09:59 arturo: draining cloudvirt1037 (T275753)
- 09:12 dcaro: restarted the wmcs-backup service on cloudvirt1024 to retry the backups (failed because a VM was removed in-between, T276892)
2021-03-05
- 21:40 andrewbogott: replacing 'observer' role with 'reader' role in eqiad1 T276018
- 21:21 andrewbogott: replacing 'observer' role with 'reader' role in eqiad1
- 16:23 arturo: rebooting cloudvirt1036 for T275753
- 12:30 arturo: draining cloudvirt1036 for T275753
- 12:25 arturo: rebooting cloudvirt1035 for T275753
- 10:49 arturo: rebooting cloudvirt1035 for T275753
- 10:47 arturo: rebooting cloudvirt1034 for T275753
- 10:26 arturo: draining cloudvirt1034 for T275753
- 10:25 arturo: rebooting cloudvirt1033 for T275753
- 09:18 arturo: draining cloudvirt1033 for T275753
2021-03-04
- 18:36 andrewbogott: rebooting cloudmetrics1002; the console is hanging
- 16:59 arturo: rebooting cloudvirt1032 for T275753
- 16:34 arturo: draining cloudvirt1032 for T275753
- 16:33 arturo: rebooting cloudvirt1031 for T275753
- 16:11 arturo: draining cloudvirt1031 for T275753
- 16:09 arturo: rebooting cloudvirt1026 for T275753
- 15:57 arturo: draining cloudvirt1026 for T275753
- 15:55 arturo: rebooting cloudvirt1025 for T275753
- 15:41 arturo: draining cloudvirt1025 for T275753
- 15:12 arturo: rebooting cloudvirt1024 for T275753
- 11:29 arturo: draining cloudvirt1024 for T275753
- 11:24 dcaro: rebooted cloudvirt1022, re-adding to ceph and removing from maintenance host aggregate for T275753
- 11:01 dcaro: rebooting cloudvirt1022 for T275753
- 09:12 dcaro: draining cloudvirt1022 for T275753
2021-03-03
- 17:16 andrewbogott: restarting rabbitmq-server on cloudcontrol1003,1004,1005; trying to explain amqp errors in scheduler logs
- 16:03 dcaro: draining cloudvirt1022 for T275753
- 16:00 arturo: move cloudvirt1013 into the 'toobusy' host aggregate, it has 221% cpu subscription and 82% MEM subscription
- 15:34 arturo: rebooting cloudvirt1021 for T275753
- 14:31 arturo: draining cloudvirt1021 for T275753
- 13:59 arturo: rebooting cloudvirt1018 for T275753
- 13:28 arturo: draining cloudvirt1018 for T275753
- 12:49 arturo: rebooting cloudvirt1017 for T275753
- 12:22 arturo: draining cloudvirt1017 for T275753
- 12:20 arturo: rebooting cloudvirt1016 for T275753
- 12:01 arturo: draining cloudvirt1016 for T275753
- 11:59 arturo: cloudvirt1014 now in the ceph host aggregate
- 11:58 arturo: rebooting cloudvirt1014 for T275753
- 11:50 arturo: moved cloudvirt1023 away from the maintenance host aggregate, leave it in the ceph aggregate (was in the 2)
- 11:47 arturo: moved cloudvirt1014 to the 'maintenance' host aggregate, drain it for T275753
- 10:01 arturo: icinga-downtime cloudnet1003 for 14 days bc potential alerting storm due to firmware issues (T271058)
- 10:01 arturo: rebooting again cloudnet1003 (no network failover) (T271058)
- 09:59 arturo: update firmware-bnx2x from 20190114-2 to 20200918-1~bpo10+1 on cloudnet1003 (T271058)
- 09:30 arturo: installing linux kernel 5.10.13-1~bpo10+1 in cloudnet1003 and rebooting it (network failover) (T271058)
2021-03-02
- 17:16 andrewbogott: rebooting cloudvirt1039 to see if I can trigger T276208
- 16:10 arturo: [codfw1dev] restart nova-compute on cloudvirt2002-dev
- 11:59 arturo: moved cloudvirt1012 to 'maintenance' host aggregate. Drain it with `wmcs-drain-hypervisor` to reboot it for T275753
- 11:59 arturo: cloudvirt1023 is affected by T276208 and cannot be rebooted. Put it back into the ceph host aggregate
- 10:43 arturo: moved cloudvirt1013 cloudvirt1032 cloudvirt1037 back into the 'ceph' host aggregate
- 10:13 arturo: moved cloudvirt1023 to 'maintenance' host aggregate. Drain it with `wmcs-drain-hypervisor` to reboot it for T275753
2021-03-01
- 20:12 andrewbogott: removing novaadmin from all projects save 'admin' for T274385
- 19:51 andrewbogott: removing novaobserver from all projects save 'observer' for T274385
- 19:50 andrewbogott: adding inherited domain-wide roles to novaadmin and novaobserver as per T274385
2021-02-28
- 04:54 andrewbogott: restarted redis-server on tools-redis-1003 and tools-redis-1004 in an attempt to reduce replag, no real change detected
2021-02-27
- 00:33 andrewbogott: sudo cumin --timeout 500 "A:all and not O{project:clouddb-services}" 'lsb_release -c | grep -i buster && uname -r | grep -v 4.19.0-14-amd64 && reboot'
- 00:28 andrewbogott: sudo cumin --timeout 500 "A:all and not O{project:clouddb-services}" 'lsb_release -c | grep -i buster && uname -r | grep -v 4.19.0-14-amd64 && echo reboot'
- 00:09 andrewbogott: sudo cumin "A:all and not O{project:clouddb-services}" 'lsb_release -c | grep -i stretch && uname -r | grep -v 4.19.0-0.bpo.14-amd64 && reboot'
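The three cumin runs above share one guard: reboot only when the host is on the named Debian release and is not already running the target kernel (the 00:28 run substitutes `echo reboot` as a dry run). A minimal Python sketch of that decision follows. It assumes plain substring matching in place of `grep`, and the function and parameter names are hypothetical; none of this is part of cumin.

```python
def should_reboot(lsb_codename_output: str, running_kernel: str,
                  release: str, target_kernel: str) -> bool:
    """Mirror `lsb_release -c | grep -i <release> && uname -r | grep -v <kernel>`."""
    # grep -i: case-insensitive match on the release codename
    on_release = release.lower() in lsb_codename_output.lower()
    # grep -v: succeeds only when the running kernel is NOT the target one
    # (assumption: substring match; grep would treat '.' as a regex wildcard)
    on_old_kernel = target_kernel not in running_kernel
    return on_release and on_old_kernel


if __name__ == "__main__":
    # A buster host still on an older kernel would be rebooted
    print(should_reboot("Codename:\tbuster", "4.19.0-13-amd64",
                        "buster", "4.19.0-14-amd64"))
    # A buster host already on the target kernel would be skipped
    print(should_reboot("Codename:\tbuster", "4.19.0-14-amd64",
                        "buster", "4.19.0-14-amd64"))
```

Because the two conditions are chained with `&&`, a host that fails either check never reaches the reboot step.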
2021-02-26
- 14:58 dcaro: [eqiad] rebooting cloudcephosd1015 (last osd \o/) for kernel upgrade (T275753)
- 14:51 dcaro: [eqiad] rebooting cloudcephosd1014 for kernel upgrade (T275753)
- 14:44 dcaro: [eqiad] rebooting cloudcephosd1013 for kernel upgrade (T275753)
- 14:38 dcaro: [eqiad] rebooting cloudcephosd1012 for kernel upgrade (T275753)
- 14:31 dcaro: [eqiad] rebooting cloudcephosd1011 for kernel upgrade (T275753)
- 14:25 dcaro: [eqiad] rebooting cloudcephosd1010 for kernel upgrade (T275753)
- 14:17 dcaro: [eqiad] rebooting cloudcephosd1009 for kernel upgrade (T275753)
- 13:54 dcaro: [eqiad] downtimed alert1001 Ceph OSDs down alert until 18:00 GMT+1 as that is not under the host being rebooted (T275753)
- 13:51 dcaro: [eqiad] rebooting cloudcephosd1008 for kernel upgrade (T275753)
- 13:45 dcaro: [eqiad] rebooting cloudcephosd1007 for kernel upgrade (T275753)
- 13:38 dcaro: [eqiad] rebooting cloudcephosd1006 for kernel upgrade (T275753)
- 12:07 dcaro: [eqiad] rebooting cloudcephosd1005 for kernel upgrade (T275753)
- 12:00 arturo: rebooting cloudcontrol1003 for kernel upgrade (T275753)
- 11:42 arturo: rebooting cloudcontrol1004 for kernel upgrade (T275753)
- 11:41 dcaro: [eqiad] rebooting cloudcephosd1004 for kernel upgrade (T275753)
- 11:32 dcaro: [eqiad] rebooting cloudcephosd1003 for kernel upgrade (T275753)
- 11:30 arturo: rebooting cloudcontrol1005 for kernel upgrade (T2
- 11:26 dcaro: [eqiad] rebooting cloudcephosd1002 for kernel upgrade (T275753)
- 11:16 dcaro: [eqiad] rebooting cloudcephosd1001 for kernel upgrade (T275753)
- 11:11 dcaro: [eqiad] rebooting cloudcephmon1003 for kernel upgrade (T275753)
- 11:05 dcaro: [eqiad] rebooting cloudcephmon1002 for kernel upgrade (T275753)
- 10:59 dcaro: [eqiad] rebooting cloudcephmon1001 for kernel upgrade (T275753)
- 10:45 arturo: rebooting cloudvirt1039 into a new kernel (T275753) --- spare
- 10:43 dcaro: [codfw1dev] rebooting cloudcephmon2003-dev for kernel upgrade (T275753)
- 10:38 dcaro: [codfw1dev] rebooting cloudcephmon2002-dev for kernel upgrade (T275753)
- 10:29 dcaro: [codfw1dev] rebooting cloudcephmon2001-dev for kernel upgrade (T275753)
- 10:24 arturo: [codfw1dev] purge old kernel packages on cloudvirt2003-dev to force boot into a new kernel (T275753)
- 10:11 arturo: [codfw1dev] manually creating /boot/grub/ on cloudvirt2003-dev to allow update-grub2 to run (so it can reboot into a new kernel) (T275753)
- 10:11 dcaro: [codfw1dev] rebooting cloudcephosd2003-dev for kernel upgrade (T275753)
- 10:05 dcaro: [codfw1dev] rebooting cloudcephosd2002-dev for kernel upgrade (T275753)
- 10:01 arturo: [codfw1dev] rebooting cloudvirt200X-dev for kernel upgrade (T275753)
- 09:59 arturo: [codfw1dev] rebooting cloudweb2001-dev for kernel upgrade (T275753)
- 09:53 arturo: [codfw1dev] rebooting cloudservices2003-dev for kernel upgrade (T275753)
- 09:51 arturo: [codfw1dev] rebooting cloudservices2002-dev for kernel upgrade (T275753)
- 09:45 arturo: [codfw1dev] rebooting cloudcontrol2004-dev for kernel upgrade (T275753)
- 09:44 arturo: [codfw1dev] rebooting cloudbackup[2001-2002].codfw.wmnet for kernel upgrade (T275753)
- 09:43 dcaro: [codfw1dev] rebooting cloudcephosd2001-dev for kernel upgrade (T275753)
- 09:41 arturo: [codfw1dev] rebooting cloudcontrol2003-dev for kernel upgrade (T275753)
- 09:33 arturo: [codfw1dev] rebooting cloudcontrol2001-dev for kernel upgrade (T275753)
2021-02-25
- 14:56 arturo: deployed wmcs-netns-events daemon to all cloudnet servers (T275483)
2021-02-24
- 11:07 arturo: force-reboot cloudmetrics1002, add icinga downtime for 2 hours. Investigating some server issue
- 00:17 bstorm: set --property hw_scsi_model=virtio-scsi and --property hw_disk_bus=scsi on the main stretch image in glance on eqiad1 T275430
2021-02-23
- 22:43 bstorm: set --property hw_scsi_model=virtio-scsi and --property hw_disk_bus=scsi on the main buster image in glance on eqiad1 T275430
- 20:36 andrewbogott: adding r/o access to the eqiad1-glance-images ceph pool for the client.eqiad1-compute for T275430
- 10:49 arturo: rebooting cloudnet1004 into new kernel from buster-bpo (T271058)
- 10:49 arturo: installing linux-image-amd64 from buster-bpo 5.10.13-1~bpo10+1 in cloudnet1004 (T271058)
2021-02-22
- 17:15 bstorm: restarting nova-compute on cloudvirt1016 and cloudvirt1036 in case it helps T275411
- 15:02 dcaro: Re-uploaded the debian buster 10.0 image from rbd to glance, that worked, re-spawning all the broken instances (T275378)
- 11:12 dcaro: Refreshing all the canary instances (T275354)
2021-02-18
- 14:50 arturo: rebooting cloudnet1004 for T271058
- 10:25 dcaro: Rebooting cloudmetrics1001 to apply new kernel (T275116)
- 10:16 dcaro: Rebooting cloudmetrics1002 to apply new kernel (T275116)
- 10:14 dcaro: Upgrading grafana on cloudmetrics1002 (T275116)
- 10:12 dcaro: Upgrading grafana on cloudmetrics1001 (T275116)
2021-02-17
- 15:58 arturo: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/664845 to cloudnet servers (T268335)
2021-02-15
- 16:25 arturo: [codfw1dev] rebooting all cloudgw200x-dev / cloudnet200x-dev servers (T272963)
- 15:45 arturo: [codfw1dev] drop subnet definition for cloud-instances-transport1-b-codfw (T272963)
- 15:45 arturo: [codfw1dev] connect virtual router cloudinstances2b-gw to vlan cloud-gw-transport-codfw (185.15.57.10) (T272963)
2021-02-11
- 12:01 arturo: [codfw1dev] drop instance `tools-codfw1dev-bastion-1` in `tools-codfw1dev` (was buster, cannot use it yet)
- 11:59 arturo: [codfw1dev] create instance `tools-codfw1dev-bastion-2` (stretch) in `tools-codfw1dev` to test stuff related to T272397
- 11:45 arturo: [codfw1dev] create instance `tools-codfw1dev-bastion-1` in `tools-codfw1dev` to test stuff related to T272397
- 11:42 arturo: [codfw1dev] drop `tools` project, create `tools-codfw1dev`
- 11:38 arturo: [codfw1dev] drop `coudinfra` project (we are using `cloudinfra-codfw1dev` there)
- 05:37 bstorm: downtimed cloudnet1004 for another week T271058
2021-02-09
- 15:23 arturo: icinga-downtime for 2h everything *labs *cloud for openstack upgrades
- 11:14 dcaro: Merged the osd scheduler change for all osds, applying on all cloudcephosd* (T273791)
2021-02-08
- 18:50 bstorm: enabled puppet on cloudvirt1023 for now T274144
- 18:44 bstorm: restarted the backup_vms.service on cloudvirt1027 T274144
- 17:51 bstorm: deleted project pki T273175
2021-02-05
- 10:59 arturo: icinga-downtime labstore1004 tools share space check for 1 week (T272247)
- 10:21 dcaro: This was affecting maps and several others, maps and project-proxy have been fixed (T273956)
- 09:19 dcaro: Some certs around the infra are expired (T273956)
2021-02-04
- 10:12 dcaro: Increasing the memory limit of osds in eqiad from 8589934592(8G) to 12884901888(12G) (T273851)
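The byte values in the entry above are binary gigabytes (GiB). A quick check of the arithmetic, as a sketch; the helper name is hypothetical:

```python
GIB = 2 ** 30  # bytes in one GiB


def gib_to_bytes(gib: int) -> int:
    """Convert a whole number of GiB to bytes."""
    return gib * GIB


if __name__ == "__main__":
    print(gib_to_bytes(8))   # old limit from the log entry
    print(gib_to_bytes(12))  # new limit from the log entry
```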
2021-02-03
- 09:59 dcaro: Doing a full vm backup on cloudvirt1024 with the new script (T260692)
- 01:50 bstorm: icinga-downtime cloudnet1004 for a week T271058
2021-02-02
- 17:14 dcaro: Changed osd memory limit from 4G to 8G (T273649)
- 11:00 arturo: icinga-downtime cloudvirt-wdqs1001 for 1 week (T273579)
- 03:12 andrewbogott: running /usr/local/sbin/wmcs-purge-backups and /usr/local/sbin/wmcs-backup-instances on cloudvirt1024 to see why the backup job paged
2021-01-29
- 15:36 andrewbogott: disabling puppet and some services on eqiad1 cloudcontrol nodes; replacing nova-placement-api with placement-api
2021-01-28
- 19:44 andrewbogott: shutting down cloudcontrol2001-dev because it's in a partially upgraded state; will revive when it's time for Train
2021-01-27
- 00:50 bstorm: icinga-downtime cloudnet1004 for a week T271058
2021-01-22
- 16:44 andrewbogott: upgrading designate on cloudvirt1003/1004 to OpenStack 'train'
- 11:29 dcaro: Removed the cloudcontrol1003 puppet cert while doing some tests, regenerating...
2021-01-21
- 11:35 arturo: merging core router firewall changes https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657439 (T209082)
- 11:30 arturo: merging core router firewall changes https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657358 (T272486, T209082)
2021-01-20
- 10:49 arturo: merging core router firewall change https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657302 (T209082)
- 10:05 dcaro: Everything looks ok, created a new vm with a volume in ceph without issues, and no warnings/errors on ceph status, closing (T272303)
- 09:55 dcaro: Eqiad ceph cluster upgraded, doing sanity checks (T272303)
- 09:46 dcaro: 75% of the eqiad cluster upgraded... continuing (T272303)
- 09:37 dcaro: 25% of the eqiad cluster upgraded... continuing (T272303)
- 09:24 dcaro: Mgr daemons upgraded and running, upgrading osd daemons on servers cloudcephosd1*, this may take a bit longer (T272303)
- 09:22 dcaro: Mon daemons upgraded and running, upgrading mgr daemons on servers cloudcephmon1* (T272303)
- 09:16 dcaro: Starting eqiad ceph upgrade, upgrading the mon servers cloudcephmon1* (T272303)
- 09:01 dcaro: Will start the ceph upgrade in 15 min, no downtime nor performance impact is expected (T272303)
2021-01-19
- 10:17 arturo: icinga-downtime cloudnet1004 for 1 week (T271058)
2021-01-18
- 16:00 dcaro: Codfw1 ceph cluster upgraded, will wait until tomorrow to see if there's any instability, but everything looks fine (T272303)
- 15:38 dcaro: Upgraded mgr services on codfw ceph cluster, starting with osd ones (T272303)
- 15:35 dcaro: Upgraded mon services on codfw ceph cluster, starting with mgr ones (T272303)
- 15:21 dcaro: Starting upgrade of ceph mon nodes on codfw (T272303)
- 15:06 dcaro: re-enabling puppet on cloudcephosd2* hosts
- 13:53 dcaro: disabling puppet on cloudcephosd2* to resume perf tests
- 10:50 dcaro: re-enabling puppet on cloudcephosd2* (codfw)
- 10:07 dcaro: disabling puppet on cloudcephosd2* (codfw) to do some performance tests
- 09:00 dcaro: Enabling custom application 'cinder' on pool codfw1dev-cinder to get rid of health warnings
2021-01-17
- 16:53 arturo: icinga downtime labstore1004 /srv/tools space check for 3 days (T272247)
2021-01-15
- 13:41 arturo: icinga downtime labstore1004 maintain-dbuser alert until 2021-01-19 (T272125)
- 09:47 arturo: labstore1004 maintain-dbusers affected by T272127 and T272125
- 09:22 arturo: restart maintain-dbusers.service in labstore1004
- 08:19 dcaro: Merging the patch to disable write caches on ceph osds (T271527)
2021-01-13
- 17:03 arturo: move cloudvirt1013 cloudvirt1032 cloudvirt1037 to the 'toobusy' host aggregate to prevent further CPU oversubscribing
- 12:40 arturo: try increasing systemd watchdog timeout for conntrackd in cloudnet1004 (T268335)
- 11:45 dcaro: https://gerrit.wikimedia.org/r/c/operations/puppet/+/654419 merged and deployed (and tested) (T268877)
- 11:40 dcaro: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/654419 that might affect the encapi service (puppet on cloud environment), no downtime expected though (T268877)
- 10:56 arturo: trying to cleanup dpkg package mess in cloudnet2002-dev
- 10:02 arturo: prevent floating IP allocation from neutron transport subnet: root@cloudcontrol1005:~# neutron subnet-update --allocation-pool start=185.15.56.244,end=185.15.56.244 cloud-instances-transport1-b-eqiad1 (T271867)
2021-01-12
- 10:33 arturo: reboot cloudnet1004
- 10:32 arturo: update firmware-bnx2x from 20190114-2 to 20200918-1~bpo10+1 on cloudnet1004 (T271058)
2021-01-11
- 10:22 arturo: doubling size of conntrack table in cloudnet servers https://gerrit.wikimedia.org/r/c/operations/puppet/+/655407 (T271058)
- 10:07 arturo: manually cleanup conntrack table in cloudnet1004 (T271058)
- 09:19 dcaro: cleaned up ~1800 snapshots, only 109 remaining, one for each host x image combination (plus some ephemeral ones while doing backups), closing the task (T270478)
- 08:39 dcaro: cleaning up dangling snapshots now that we have the new suffixed ones (T270478)
2021-01-10
- 16:02 andrewbogott: restarting rabbitmq-server on all eqiad1 cloudcontrols
- 15:54 andrewbogott: restarting neutron-metadata-agent on cloudnet1004 due to many syslog complaints
2021-01-08
- 11:25 arturo: rebooting both cloudnet2002-dev/cloudnet2003-dev to make sure interfaces are set up correctly (T271517)
- 11:22 arturo: connecting cloudnet2002-dev cloudnet2003-dev back to vlan 2120 (T271517)
- 11:06 arturo: root@cloudcontrol2001-dev:~# openstack router set --external-gateway wan-transport-codfw --fixed-ip subnet=cloud-instances-transport1-b-codfw,ip-address=208.80.153.190 cloudinstances2b-gw (T271517)
- 11:02 arturo: root@cloudcontrol2001-dev:~# openstack router set --enable-snat cloudinstances2b-gw --external-gateway wan-transport-codfw (T271517)
- 11:01 arturo: enabling neutron hacks in codfw1dev (cloudnet2002-dev, cloudnet2003-dev) (T271517)
- 10:55 arturo: aborrero@labtestvirt2003:~ $ sudo ifdown eno2.2107 (T271517)
- 10:55 arturo: aborrero@labtestvirt2003:~ $ sudo ifdown eno2.2120 (T271517)
- 10:53 arturo: root@cloudcontrol2001-dev:~# openstack subnet create --network wan-transport-codfw --gateway 208.80.153.185 --ip-version 4 --network wan-transport-codfw --no-dhcp --subnet-range 208.80.153.184/29 cloud-instances-transport1-b-codfw (T271517)
- 10:40 dcaro: Finished tests, bringing osd online (osd.48) for eqiad ceph cluster (T271417)
- 09:59 dcaro: Started performance tests on sdc (osd.48) for eqiad ceph cluster (T271417)
- 09:41 dcaro: Taking osd.48 from eqiad ceph cluster out to do performance tests (T271417)
2021-01-07
- 15:19 dcaro: Finished speed tests on cloudcephosd2001-dev, reprovisioning the osd.0 sdc (T271417)
- 14:39 dcaro: Starting speed tests on cloudcephosd2001-dev sdc (T271417)
- 12:54 dcaro: Taking osd.0 down on codfw ceph cluster to try the disk performance testing process (T271417)
- 11:35 arturo: merging dmz_cidr change (T209082, T267779)
2021-01-05
- 10:40 dcaro: removing dumps-[1..*] backups from cloudvirt1024 as they are not needed (T271094)
2021-01-03
- 07:06 dcaro: Got a network hiccup on cloudnet1004, keeping track here T271058
2020-12-28
- 12:32 arturo: stop doing backups for the dumps project https://gerrit.wikimedia.org/r/c/operations/puppet/+/652182 (T260692)
- 12:32 arturo: stop doing backups for the dumps project https://gerrit.wikimedia.org/r/c/operations/puppet/+/652182 (T260682)
- 12:23 arturo: icinga downtime cloudvirt1026 disk space check until january 5 (T260692)
- 06:15 andrewbogott: restarting designate-central on cloudservices1003/1004. I'm pretty sure they're distressed because of DB lag but it's worth a try
2020-12-23
- 15:38 andrewbogott: restarting rabbitmq on cloudcontrol1004; suspected leaks
- 15:33 andrewbogott: restarting each cloudcontrol galera node in turn to see if that quiets down the syncing warnings
- 12:08 arturo: move memory out of the swap in cloudcontrol1004 by disabling/enabling it (1Gb swap was being used)
2020-12-22
- 15:30 dcaro: cleaning up 6778 dangling snapshots for glance images in eqiad (T270478)
- 13:51 dcaro: merged patch to move wikidumpparse backups to cloudvirt1025 to free space on cloudvirt1026
2020-12-19
- 16:18 dcaro: gzipped a bunch of logs on cloudvirt1004 due to / being out of space
- 00:14 bstorm: truncated /var/log/debug.1 on cloudcontrol1003 which appears to be the exact same content as the user.log files anyway
- 00:10 bstorm: truncated /var/log/daemon.log.1 and the haproxy log
- 00:02 bstorm: truncated /var/log/messages.1 on cloudcontrol1003
2020-12-18
- 23:53 bstorm: truncated haproxy.log.1 on cloudcontrol1003
- 20:46 andrewbogott: setting pg and pgp number to 4096 for eqiad1-compute as joachim thinks 8192 might be too much T270305
- 17:09 dcaro: finished cleaning up the dangling snapshots from cloudvirt1026 (T270478)
- 17:08 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1026) (T270478)
- 17:06 dcaro: finished cleaning up the dangling snapshots from cloudvirt1025 (T270478)
- 17:05 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1025) (T270478)
- 17:00 dcaro: finished cleaning up the dangling snapshots from cloudvirt1021 (T270478)
- 16:58 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1021) (T270478)
- 16:56 dcaro: finished cleaning up the dangling snapshots from cloudvirt1022 (T270478)
- 16:55 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1022) (T270478)
- 16:54 dcaro: finished cleaning up the dangling snapshots from cloudvirt1023 (T270478)
- 16:51 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1023) (T270478)
- 16:47 dcaro: finished cleaning up the dangling snapshots from cloudvirt1024, freed ~12% of the capacity (T270478)
- 16:21 dcaro: removing dangling rbd snapshots (for backups on cloudvirt1024) (T270478)
- 16:13 andrewbogott: setting autoscale to 'off' for both ceph pools (eqiad1-compute and eqiad1-glance-images) because we like how things are set and the autoscaler does not
- 10:33 dcaro: purging rbd snapshots for image fc6fb78b-4515-4dcc-8254-591b9fe01762 (T270478)
2020-12-17
- 22:17 andrewbogott: correction to above, set the pg and pgp to 1024 for eqiad1-glance-images
- 22:16 andrewbogott: setting pgp number to 8192 for eqiad1-compute (a 4x increase) and 2048 for eqiad1-glance-images (also a 4x increase) T270305 (same as pg)
- 22:14 andrewbogott: setting pg number to 8192 for eqiad1-compute (a 4x increase) and 2048 for eqiad1-glance-images (also a 4x increase) T270305
- 22:10 andrewbogott: setting autoscale to 'warn' for both ceph pools (eqiad1-compute and eqiad1-glance-images)
2020-12-16
- 09:31 dcaro: removing invalid backups from cloudvirt1024 (196 in total) (T269419)
2020-12-14
- 17:42 dcaro: The removal freed ~12GB (still 100% usage :S) (T269419)
- 17:36 dcaro: removing invalid backups that have a valid copy (T269419)
- 15:43 dcaro: Merging the tagging for vm backups (T267195)
- 09:45 arturo: icinga downtime cloudvirt1024 for 6 days (T269419)
2020-12-13
- 09:11 _dcaro: running backup purge script on cloudvirt1024 (T269419)
2020-12-10
- 23:36 bstorm: cleaned up the logs for haproxy on cloudcontrol1003 by deleting all the gzipped ones and truncating the .1 file
- 11:56 dcaro: Freed some space on cloudvirt1024 by running the purge script (T269419)
- 09:17 dcaro: removing leaked dns record discordwiki.eqiad.wmflabs (clinic duty)
2020-12-08
- 18:01 dcaro: Host cloudvirt1030 up and running (T216195)
- 15:59 dcaro: Re-imaging host cloudvirt1030 (T216195)
- 14:18 dcaro: Host online cloudvirt1029 (T216195)
- 14:13 dcaro: Host re-imaged, doing tests cloudvirt1029 (T216195)
- 12:14 dcaro: Re-imaging cloudvirt1029 (T216195)
2020-12-07
- 18:33 andrewbogott: putting cloudvirt1023 back into service T269467
- 15:55 andrewbogott: reimaging cloudvirt1028 for T216195
- 14:49 dcaro: Re-imaging cloudvirt1027 (T216195)
2020-12-05
- 00:35 andrewbogott: moving cloudvirt1023 back into maintenance because T269467 continues to puzzle
2020-12-04
- 22:33 andrewbogott: moving cloudvirt1023 back into the ceph aggregate; it doesn't need upgrades after all T269467
- 22:24 andrewbogott: moving cloudvirt1023 out of the ceph aggregate and into maintenance for T269467
- 21:06 andrewbogott: putting cloudvirt1025 and 1026 back into service because I'm pretty sure they're fixed. T269313
- 12:12 arturo: manually running `wmcs-purge-backups` again on cloudvirt1024 (T269419)
- 11:25 arturo: icinga downtime cloudvirt1024 for 6 days, to avoid paging noises (T269419)
- 11:25 arturo: last log line referencing cloudvirt1024 is a mistake (T269313)
- 11:24 arturo: icinga downtime cloudvirt1024 for 6 days, to avoid paging noises (T269313)
- 10:28 arturo: manually running `wmcs-purge-backups` on cloudvirt1024 (T269419)
- 10:23 arturo: setting expiration to 2020-12-03 to the oldest backy snapshot of every VM in cloudvirt1024 (T269419)
- 09:54 arturo: icinga downtime cloudvirt1025 for 6 days (T269313)
2020-12-03
- 23:21 andrewbogott: removing all osds on cloudcephosd1004 for rebuild, T268746
- 21:45 andrewbogott: removing all osds on cloudcephosd1005 for rebuild, T268746
- 19:51 andrewbogott: removing all osds on cloudcephosd1006 for rebuild, T268746
- 17:01 arturo: icinga downtime cloudvirt1025 for 48h to debug network issue T269313
- 16:56 arturo: rebooting cloudvirt1025 to debug network issue T269313
- 16:38 dcaro: Reimaging cloudvirt1026 (T216195)
- 13:24 andrewbogott: removing all osds on cloudcephosd1008 for rebuild, T268746
- 02:55 andrewbogott: removing all osds on cloudcephosd1009 for rebuild, T268746
2020-12-02
- 20:04 andrewbogott: removing all osds on cloudcephosd1010 for rebuild, T268746
- 17:25 arturo: [15:51] failovering neutron virtual router in eqiad1 (T268335)
- 15:36 arturo: conntrackd is now up and running in cloudnet1003/1004 nodes (T268335)
- 15:33 arturo: [codfw1dev] conntrackd is now up and running in cloudnet200x-dev nodes (T268335)
- 15:08 andrewbogott: removing all osds on cloudcephosd1012 for rebuild, T268746
- 12:41 arturo: disable puppet in all cloudnet servers to merge conntrackd change T268335
- 11:12 dcaro: Reset the properties for the flavor g2.cores8.ram16.disk1120 to correct quotes (T269172)
- 09:57 arturo: moved cloudvirts 1030, 1029, 1028, 1027, 1026, 1025 away from the 'standard' host aggregate to 'maintenance' (T269172)
2020-12-01
- 20:06 andrewbogott: removing all osds on cloudcephosd1014 for rebuild, T268746
- 12:04 arturo: restarting neutron l3 agents to pick up config change
- 11:48 arturo: merging change to dmz_cidr, detailed list of private addresses https://gerrit.wikimedia.org/r/c/operations/puppet/+/641977
2020-11-30
- 18:12 andrewbogott: removing all osds from cloudcephosd1015 in order to investigate T268746
2020-11-29
- 17:18 andrewbogott: cleaning up some logfiles in tools-sgecron-01 — drive is full
2020-11-26
- 22:58 andrewbogott: deleting /var/log/haproxy logs older than 7 days in cloudcontrol100x. We need log rotation here it seems.
- 15:53 dcaro: Created private flavor g2.cores8.ram16.disk1120 for wikidumpparse (T268190)
2020-11-25
- 19:35 bstorm: repairing ceph pg `instructing pg 6.91 on osd.117 to repair`
- 09:31 _dcaro: The OSD seems to be up and running actually, though there's that misleading log, will leave it see if the cluster comes fully healthy (T268722)
- 08:54 _dcaro: Unsetting noup/nodown to allow re-shuffling of the pgs that osd.44 had, will try to rebuild it (T268722)
- 08:45 _dcaro: Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened T268722)
- 08:45 _dcaro: Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush set-device-class ssd osd.44)
- 08:19 _dcaro: Restarting service osd.44 resulted in osd.44 being unable to start due to some config inconsistency (can not reset class to hdd)
- 08:16 _dcaro: After enabling auto pg scaling on ceph eqiad cluster, osd.44 (cloudcephosd1005) got stuck, trying to restart the osd service
2020-11-22
- 17:40 andrewbogott: apt-get upgrade on cloudservices1003/1004
- 17:32 andrewbogott: upgrading Designate on cloudservices1003/1004 to Stein
2020-11-20
- 12:44 arturo: [codfw1dev] install conntrackd in cloudnet2003-dev/cloudnet2002-dev to research l3 agent HA reliability
- 09:26 arturo: icinga downtime labstore1006 RAID checks for 10 days (T268281)
2020-11-17
- 19:21 andrewbogott: draining cloudvirt1012 to experiment with libvirt/cpu things
2020-11-15
- 11:21 arturo: icinga downtime cloudbackup2002 for 48h (T267865)
2020-11-10
- 16:38 arturo: icinga downtime toolschecker for 2h because of toolsdb maintenance (T266587)
- 11:24 arturo: [codfw1dev] enable puppet in puppetmaster01.cloudinfra-codfw1dev (disabled for unspecified reasons)
2020-11-09
- 12:42 arturo: restarted neutron l3 agent in cloudnet1003 bc it still had the old default route (T265288)
- 12:41 arturo: `root@cloudcontrol1005:~# neutron subnet-delete dcbb0f98-5e9d-4a93-8dfc-4e3ec3c44dcc` (T265288)
- 12:41 arturo: `root@cloudcontrol1005:~# neutron router-gateway-set --fixed-ip subnet_id=7c6bcc12-212f-44c2-9954-5c55002ee371,ip_address=185.15.56.244 cloudinstances2b-gw wan-transport-eqiad` (T265288)
- 12:19 arturo: subnet 185.15.56.240/29 has id 7c6bcc12-212f-44c2-9954-5c55002ee371 in neutron (T265288)
- 12:19 arturo: `root@cloudcontrol1005:~# neutron subnet-create --gateway 185.15.56.241 --name cloud-instances-transport1-b-eqiad1 --ip-version 4 --disable-dhcp wan-transport-eqiad 185.15.56.240/29` (T265288)
- 12:15 arturo: icinga-downtime toolschecker for 2h (T265288)
2020-11-02
- 13:36 arturo: (typo: dcaro)
- 13:35 arturo: added dcar as projectadmin & user (T266068)
2020-10-29
- 16:57 bstorm: silenced deployment-prep project alerts for 60 days since the downtime expired
- 08:12 arturo: force-powercycling cloudcephosd1006
2020-10-25
- 16:20 andrewbogott: adding cloudvirt1038 to the 'ceph' aggregate and removing from the 'spare' aggregate. We need this space while waiting on network upgrades for empty cloudvirts (T216195)
2020-10-23
- 11:30 arturo: [codfw1dev] openstack --os-project-id cloudinfra-codfw1dev recordset create --type PTR --record nat.cloudgw.codfw1dev.wikimediacloud.org. --description "created by hand" 0-29.57.15.185.in-addr.arpa. 1.0-29.57.15.185.in-addr.arpa. (T261724)
- 10:09 arturo: [codfw1dev] doing DNS changes for the cloudgw PoC, including designate and https://gerrit.wikimedia.org/r/c/operations/dns/+/635965 (T261724)
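The 11:30 entry above creates a PTR record inside a classless reverse zone named `0-29.57.15.185.in-addr.arpa.`. The sketch below shows how such a record name can be derived from an address and its enclosing network. It assumes the `<first octet>-<prefix length>` zone label seen in that entry, and it assumes the record is for 185.15.57.1, inferred from the leading `1.` label and the 185.15.57.0/29 floating range mentioned elsewhere in this log. The function name is hypothetical and this is not how designate itself computes names.

```python
import ipaddress


def classless_ptr_name(host: str, network: str) -> str:
    """Build '<last octet>.<first>-<prefixlen>.<c>.<b>.<a>.in-addr.arpa.'"""
    net = ipaddress.ip_network(network)
    addr = ipaddress.ip_address(host)
    if net.version != 4 or net.prefixlen <= 24:
        raise ValueError("expected an IPv4 network smaller than a /24")
    if addr not in net:
        raise ValueError(f"{host} is not inside {network}")
    a, b, c, d = str(net.network_address).split(".")
    # Zone label convention taken from the log entry: '<first octet>-<prefixlen>'
    zone = f"{d}-{net.prefixlen}.{c}.{b}.{a}.in-addr.arpa."
    last_octet = str(addr).split(".")[3]
    return f"{last_octet}.{zone}"


if __name__ == "__main__":
    print(classless_ptr_name("185.15.57.1", "185.15.57.0/29"))
```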
2020-10-22
- 10:46 arturo: [codfw1dev] rebooting cloudinfra-internal-puppetmaster-01.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud to try fixing some DNS weirdness
- 09:43 arturo: enabling puppet in cloudcontrol1003 (message said "please re-enable after 2020-10-22 06:00UTC")
2020-10-21
- 14:36 andrewbogott: running apt-get update && apt-get install -y facter on all cloud-vps instances
- 10:31 arturo: [codfw1dev] reimaging labtestvirt2003 (cloudgw) to test puppet code (T261724)
- 08:56 arturo: [codfw1dev] reimaging labtestvirt2003 (cloudgw) to test puppet code (T261724)
2020-10-20
- 15:47 arturo: changing DNS recursor ACLs (https://gerrit.wikimedia.org/r/c/operations/puppet/+/635314) this can be reverted any time if it causes problems (T261724)
- 14:49 arturo: [codfw1dev] reimaging labtestvirt2003 (cloudgw) to test puppet code (T261724)
2020-10-19
- 01:41 andrewbogott: deleting all Precise base images
- 01:36 andrewbogott: deleting all unused Jessie base images
2020-10-18
- 23:26 andrewbogott: deleting all Trusty base images
- 21:50 andrewbogott: migrating all currently used ceph images to rbd
2020-10-16
- 09:29 arturo: [codfw1dev] still some DNS weirdness, investigating
- 09:25 arturo: [codfw1dev] hard-rebooting bastion-codfw1dev-02, seems in bad shape, doesn't even wake up in the virsh console
- 09:18 arturo: [codfw1dev] live-hacked cloudservices2002-dev /etc/powerdns/recursor.conf file to include cloud-codfw1dev-floating CIDR (185.15.57.0/29) while https://gerrit.wikimedia.org/r/c/operations/puppet/+/634050 is in review, so VMs with a floating IP can query the DNS recursor (T261724)
- 09:01 arturo: [codfw1dev] basic network connectivity seems stable after cleaning up everything related to address scopes (T261724)
2020-10-15
- 15:17 arturo: [codfw1dev] try cleaning up anything related to address scopes in the neutron database (T261724)
- 13:56 arturo: [codfw1dev] drop neutron l3 agent hacks in cloudnet2002/2003-dev (T261724)
2020-10-13
- 17:54 andrewbogott: rebuilding cloudvirt1021 for backy support
- 15:22 andrewbogott: draining cloudvirt1021 so I can rebuild it with backy support
- 14:19 andrewbogott: rebuilding cloudvirt1022 with backy support
- 14:03 andrewbogott: draining cloudvirt1022 so I can rebuild it with backy support
- 11:19 arturo: [codfw1dev] rebooting labtestvirt2003
2020-10-09
- 10:15 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack router set --disable-snat cloudinstances2b-gw --external-gateway wan-transport-codfw (T261724)
- 09:22 arturo: [codfw1dev] rebooting cloudnet boxes for bridge and vlan changes (T261724)
- 09:12 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack subnet delete 31214392-9ca5-4256-bff5-1e19a35661de (cloud-instances-transport1-b-codfw - 208.80.153.184/29) (T261724)
- 09:10 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack router set --external-gateway wan-transport-codfw --fixed-ip subnet=cloud-gw-transport-codfw,ip-address=185.15.57.10 cloudinstances2b-gw (T261724)
- 08:49 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack subnet create --network wan-transport-codfw --gateway 185.15.57.9 --no-dhcp --subnet-range 185.15.57.8/30 cloud-gw-transport-codfw (T261724)
- 08:47 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack subnet delete a5ab5362-4ffb-4059-9ff7-391e22dcf3bc (T261724)
2020-10-08
- 16:17 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# openstack subnet create --network wan-transport-codfw --gateway 185.15.57.8 --no-dhcp --subnet-range 185.15.57.8/31 cloud-gw-transport-codfw` (with a hack -- see task) (T263622)
- 16:03 arturo: [codfw1dev] briefly live-hacked python3-neutron source code in all 3 cloudcontrol2xxx-dev servers to workaround /31 network definition issue (T263622)
- 10:28 arturo: [codfw1dev] reimaging labtestvirt2003 (cloudgw) T261724
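The 16:17 entry above defines the transport subnet as a /31 with the gateway on its first address, which needed a temporary Neutron hack; the 2020-10-09 entries then recreate it as a /30 with gateway 185.15.57.9 and the router on 185.15.57.10. As a sketch, Python's standard `ipaddress` module can list what each prefix contains. The helper name is hypothetical and nothing here talks to Neutron.

```python
import ipaddress


def subnet_addresses(cidr: str) -> list:
    """Return every address covered by the given CIDR, in order."""
    return [str(ip) for ip in ipaddress.ip_network(cidr)]


if __name__ == "__main__":
    for cidr in ("185.15.57.8/31", "185.15.57.8/30"):
        print(cidr, subnet_addresses(cidr))
```

In the /30, the first and last addresses are the network and broadcast addresses, which leaves 185.15.57.9 and 185.15.57.10 for the gateway and the router, matching the values in the log. The /31 has only two addresses in total, so the gateway has to sit on one of them.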
2020-10-06
- 21:30 andrewbogott: moved cloudvirt1013 out of the 'ceph' aggregate and into the 'maintenance' aggregate for T243414
- 21:29 andrewbogott: draining cloudvirt1013 for upgrade to 10G networking
- 14:45 arturo: icinga downtime every cloud* lab* host for 60 minutes for keystone maintenance
2020-10-05
- 17:40 bd808: `service uwsgi-labspuppetbackend restart` on cloud-puppetmaster-03 (T264649)
2020-10-02
- 11:05 arturo: [codfw1dev] restarting rabbitmq-server in all 3 control nodes, the l3 agent was misbehaving
- 09:16 arturo: [codfw1dev] trying the labtestvirt2003 (cloudgw) reimage again (T261724)
2020-10-01
- 16:06 arturo: rebooting cloudvirt1024 to validate changes to /etc/network/interfaces file
- 15:36 arturo: [codfw1dev] reimaging labtestvirt2003
2020-09-30
- 16:47 andrewbogott: rebooting cloudvirt1032, 1033, 1034 for T262979
- 13:28 arturo: enable puppet, reboot and pool back cloudvirt1031
- 13:27 arturo: extend icinga downtimes for another 120 mins
- 13:15 arturo: `aborrero@cloudcontrol1003:~$ sudo nova-manage placement sync_aggregates` after reading a hint in nova-api.log
- 13:02 arturo: rebooting cloudvirt1016 and moving it to the ceph host aggregate
- 12:55 arturo: rebooting cloudvirt1014 and moving it to the ceph host aggregate
- 12:51 arturo: rebooting cloudvirt1013 and moving it to the ceph host aggregate
- 12:39 arturo: root@cloudcontrol1005:~# openstack aggregate add host maintenance cloudvirt1031
- 12:36 arturo: rebooted cloudnet1003 (active) a couple of minutes ago
- 12:36 arturo: move cloudvirt1012 and cloudvirt1039 to the ceph aggregate
- 11:49 arturo: rebooting cloudvirt1039
- 11:46 arturo: rebooting cloudvirt1012
- 11:40 arturo: rebooting cloudnet1004 (standby) to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167 (T262979)
- 11:38 arturo: [codfw1dev] rebooting cloudnet2002-dev to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167
- 11:36 arturo: [codfw1dev] rebooting cloudnet2003-dev to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167
- 11:33 arturo: disabling puppet and downtiming every virt/net server in the fleet in preparation for merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167 (T262979)
- 09:32 arturo: rebooting cloudvirt1012 to investigate linuxbridge agent issues
2020-09-29
- 15:40 arturo: downgrade linux kernel from linux-image-4.19.0-11-amd64 to linux-image-4.19.0-10-amd64 on cloudvirt1012
- 14:47 arturo: rebooting cloudvirt1012, chasing config weirdness in the linuxbridge agent
- 14:05 andrewbogott: reimaging 1014 over and over in an attempt to get partman right
- 13:51 arturo: rebooting cloudvirt1012
2020-09-28
- 14:55 arturo: [jbond42] upgraded facter to v3 across the VM fleet
- 13:54 andrewbogott: moving cloudvirt1035 from aggregate 'spare' to 'ceph'. We're going to need all the capacity we can get while converting older cloudvirts to ceph
2020-09-24
- 15:47 arturo: stopping/restarting rabbitmq-server in all cloudcontrol servers
- 15:45 arturo: restarting rabbitmq-server in cloudcontrol1003
- 15:15 arturo: restarting floating_ip_ptr_records_updater.service in all 3 cloudcontrol servers to reset state after a DNS failure
2020-09-18
- 10:16 arturo: cloudvirt1039 libvirtd service issues were fixed with a reboot
- 09:56 arturo: rebooting cloudvirt1039 (spare) to try to fix some weird libvirtd failure
- 09:50 arturo: enabling puppet in cloudvirts and effectively merging patches from T262979
- 08:59 arturo: disable puppet in all buster cloudvirts (cloudvirt[1024,1031-1039].eqiad.wmnet) to merge a patch for T263205 and T262979
- 08:50 arturo: installing iptables from buster-bpo in cloudvirt1036 (T263205 and T262979)
2020-09-15
- 20:32 andrewbogott: rebooting cloudvirt1038 to see if it resolves T262979
- 13:58 andrewbogott: draining cloudvirt1002 with wmcs-ceph-migrate
2020-09-14
- 14:21 andrewbogott: draining cloudvirt1001, migrating all VMs with wmcs-ceph-migrate
- 10:41 arturo: [codfw1dev] trying to get the bonding working for labtestvirt2003 (T261724)
- 09:47 arturo: installed qemu security update in eqiad1 cloudvirts (T262386)
- 09:43 arturo: [codfw1dev] installed qemu security update in codfw1dev cloudvirts (T262386)
2020-09-09
- 18:13 andrewbogott: restarting ceph-mon@cloudcephmon1003 in hopes that the slow ops reported are phantoms
- 18:01 andrewbogott: restarting ceph-mgr@cloudcephmon1003 in hopes that the slow ops reported are phantoms (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EOWNO3MDYRUZKAK6RMQBQ5WBPQNLHOPV/)
- 17:40 andrewbogott: giving ceph pg autoscale another chance: ceph osd pool set eqiad1-compute pg_autoscale_mode on
- 00:05 bd808: Running wmcs-novastats-dnsleaks (T262359)
2020-09-08
- 21:48 bd808: Renamed FQDN prefixes to wikimedia.cloud scheme in cloudinfra-db01's labspuppet db (T260614)
- 14:29 andrewbogott: restarting nova-compute on all cloudvirts (everyone is upset from the reset switch failure)
- 14:18 arturo: restarting nova-fullstack service in cloudcontrol1003
- 14:17 andrewbogott: stopping apache2 on labweb1001 to make sure the Horizon outage is total
2020-09-03
- 09:31 arturo: icinga downtime cloud* servers for 30 mins (T261866)
2020-09-02
- 08:46 arturo: [codfw1dev] reimaging spare server labtestvirt2003 as debian buster (T261724)
2020-09-01
- 18:18 andrewbogott: adding drives on cloudcephosd100[3-5] to ceph osd pool
- 13:40 andrewbogott: adding drives on cloudcephosd101[0-2] to ceph osd pool
- 13:35 andrewbogott: adding drives on cloudcephosd100[1-3] to ceph osd pool
- 11:27 arturo: [codfw1dev] rebooting again cloudnet2002-dev after some network tests, to reset initial state (T261724)
- 11:09 arturo: [codfw1dev] rebooting cloudnet2002-dev after some network tests, to reset initial state (T261724)
- 10:49 arturo: disable puppet in cloudnet servers to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/623569/
2020-08-31
- 23:26 bd808: Removed stale lockfile at cloud-puppetmaster-03.cloudinfra.eqiad.wmflabs:/var/lib/puppet/volatile/GeoIP/.geoipupdate.lock
- 11:20 arturo: [codfw1dev] livehacking https://gerrit.wikimedia.org/r/c/operations/puppet/+/615161 in the puppetmasters for tests before merging
2020-08-28
- 20:12 bd808: Running `wmcs-novastats-dnsleaks --delete` from cloudcontrol1003
2020-08-26
- 17:12 bstorm: Running 'ionice -c 3 nice -19 find /srv/tools -type f -size +100M -printf "%k KB %p\n" > tools_large_files_20200826.txt' on labstore1004 T261336
2020-08-21
- 21:34 andrewbogott: restarting nova-compute on cloudvirt1033; it seems stuck
2020-08-19
- 14:21 andrewbogott: rebooting cloudweb2001-dev, labweb1001, labweb1002 to address mediawiki-induced memleak
2020-08-06
- 21:02 andrewbogott: removing cloudvirt1004/1006 from nova's list of hypervisors; rebuilding them to use as backup test hosts
- 20:06 bstorm: manually stopped the RAID check on cloudcontrol1003 T259760
2020-08-04
- 18:54 bstorm: restarting mariadb on cloudcontrol1004 to set up parallel replication
2020-08-03
- 17:02 bstorm: increased db connection limit to 800 across galera cluster because we were clearly hovering at limit
2020-07-31
- 19:28 bd808: wmcs-novastats-dnsleaks --delete (lots of leaked fullstack-monitoring records to clean up)
2020-07-27
- 22:17 andrewbogott: ceph osd pool set compute pg_num 2048
- 22:14 andrewbogott: ceph osd pool set compute pg_autoscale_mode off
2020-07-24
- 19:15 andrewbogott: ceph mgr module enable pg_autoscaler
- 19:15 andrewbogott: ceph osd pool set compute pg_autoscale_mode on
2020-07-22
- 08:55 jbond42: [codfw1dev] upgrading hiera to version5
- 08:48 arturo: [codfw1dev] add jbond as user in the bastion-codfw1dev and cloudinfra-codfw1dev projects
- 08:45 arturo: [codfw1dev] enabled account creation in labtestwiki briefly for jbond42 to create an account
2020-07-16
- 10:48 arturo: merging change to neutron dmz_cidr https://gerrit.wikimedia.org/r/c/operations/puppet/+/613123 (T257534)
2020-07-15
- 23:15 bd808: Removed Merlijn van Deen from toollabs-trusted Gerrit group (T255697)
- 11:48 arturo: [codfw1dev] created DNS records (A and PTR) for bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org <-> 185.15.57.2
- 11:41 arturo: [codfw1dev] add myself as projectadmin to the `bastioninfra-codfw1dev` project
- 11:39 arturo: [codfw1dev] created DNS zone `bastioninfra-codfw1dev.codfw1dev.wmcloud.org.` in the cloudinfra-codfw1dev project and then transfer ownership to the bastioninfra-codfw1dev project
2020-07-14
- 15:19 arturo: briefly set root@cloudnet1003:~ # sysctl net.ipv4.conf.all.accept_local=1 (in neutron qrouter netns) (T257534)
- 10:43 arturo: icinga downtime cloudnet* hosts for 30 mins to introduce new check https://gerrit.wikimedia.org/r/c/operations/puppet/+/612390 (T257552)
- 04:01 andrewbogott: added a wildcard *.wmflabs.org domain pointing at the domain proxy in project-proxy
- 04:00 andrewbogott: shortened the ttl on .wmflabs.org. to 300
2020-07-13
- 16:17 arturo: icinga downtime cloudcontrol[1003-1005].wikimedia.org for 1h for galera database movements
2020-07-12
- 17:39 andrewbogott: switched eqiad1 keystone from m5 to cloudcontrol galera
2020-07-10
- 20:26 andrewbogott: disabling nova api to move database to galera
2020-07-09
- 11:23 arturo: [codfw1dev] rebooting cloudnet2003-dev again for testing sysctl/puppet behavior (T257552)
- 11:11 arturo: [codfw1dev] rebooting cloudnet2003-dev for testing sysctl/puppet behavior (T257552)
- 09:16 arturo: manually increasing sysctl value of net.nf_conntrack_max in cloudnet servers (T257552)
2020-07-06
- 15:16 arturo: installing 'aptitude' in all cloudvirts
2020-07-03
- 12:51 arturo: [codfw1dev] galera cluster should be up and running, openstack happy (T256283)
- 11:44 arturo: [codfw1dev] restoring glance database backup from bacula into cloudcontrol2001-dev (T256283)
- 11:39 arturo: [codfw1dev] stopped mysql database in the galera cluster T256283
- 11:36 arturo: [codfw1dev] dropped glance database in the galera cluster T256283
2020-07-02
- 15:41 arturo: `sudo wmcs-openstack --os-compute-api-version 2.55 flavor create --private --vcpus 8 --disk 300 --ram 16384 --property aggregate_instance_extra_specs:ceph=true --description "for packaging envoy" bigdisk-ceph` (T256983)
2020-06-29
- 14:24 arturo: starting rabbitmq-server in all 3 cloudcontrol servers
- 14:23 arturo: stopping rabbitmq-server in all 3 cloudcontrol servers
2020-06-18
- 20:38 andrewbogott: rebooting cloudservices2003-dev due to a mysterious 'host down' alert on a secondary ip
2020-06-16
- 15:38 arturo: created by hand neutron port 9c0a9a13-e409-49de-9ba3-bc8ec4801dbf `paws-haproxy-vip` (T295217)
2020-06-12
- 13:23 arturo: DNS zone `paws.wmcloud.org` transferred to the PAWS project (T195217)
- 13:20 arturo: created DNS zone `paws.wmcloud.org` (T195217)
2020-06-11
- 19:19 bstorm_: proceeding with failback to labstore1004 now that DRBD devices are consistent T224582
- 17:22 bstorm_: delaying failback labstore1004 for drive syncs T224582
- 17:17 bstorm_: failing NFS back to labstore1004 to complete the upgrade process T224582
- 16:15 bstorm_: failing over NFS for labstore1004 to labstore1005 T224582
2020-06-10
- 16:09 andrewbogott: deleting all old cloud-ns0.wikimedia.org and cloud-ns1.wikimedia.org ns records in designate database T254496
2020-06-09
- 15:25 arturo: icinga downtime everything cloud* lab* for 2h more (T253780)
- 14:09 andrewbogott: stopping puppet, all designate services and all pdns services on cloudservices1004 for T253780
- 14:01 arturo: icinga downtime everything cloud* lab* for 2h (T253780)
2020-06-05
- 15:08 andrewbogott: trying to re-enable puppet without losing cumin contact, as per https://phabricator.wikimedia.org/T254589
2020-06-04
- 14:24 andrewbogott: disabling puppet on all instances for /labs/private recovery
- 14:23 arturo: disabling puppet on all instances for /labs/private recovery
2020-05-28
- 23:02 bd808: `/usr/local/sbin/maintain-dbusers --debug harvest-replicas` (T253930)
- 13:36 andrewbogott: rebuilding cloudservices2002-dev with Buster
- 00:33 andrewbogott: shutting down cloudservices2002-dev to see if we can live without it. This is in anticipation of rebuilding it entirely for T253780
2020-05-27
- 23:29 andrewbogott: disabling the backup job on cloudbackup2001 (just like last week) so the backup doesn't start while Brooke is rebuilding labstore1004 tomorrow.
- 06:03 bd808: `systemctl start mariadb` on clouddb1001 following reboot (take 2)
- 05:58 bd808: `systemctl start mariadb` on clouddb1001 following reboot
- 05:53 bd808: Hard reboot of clouddb1001 via Horizon. Console unresponsive.
2020-05-25
- 16:35 arturo: [codfw1dev] created zone `0-29.57.15.185.in-addr.arpa.` (T247972)
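
The zone name above looks like a classless `in-addr.arpa` delegation in the style of RFC 2317, presumably covering 185.15.57.0/29. Under that assumption, the PTR names inside such a zone can be sketched in Python as follows; the function name `classless_ptr_names` and the `0-29` label argument are illustrative and not taken from any Designate tooling.

```python
import ipaddress


def classless_ptr_names(cidr, label):
    """Map each address in cidr to its PTR name inside a classless sub-zone.

    label is the sub-zone label inserted below the parent /24 reverse zone
    (assumed here to be "0-29", following the zone name in the log entry).
    """
    names = {}
    for addr in ipaddress.ip_network(cidr):
        # reverse_pointer gives e.g. "2.57.15.185.in-addr.arpa"
        host, parent_zone = addr.reverse_pointer.split(".", 1)
        names[str(addr)] = f"{host}.{label}.{parent_zone}."
    return names


ptrs = classless_ptr_names("185.15.57.0/29", "0-29")
print(ptrs["185.15.57.2"])
```

With this layout the parent zone `57.15.185.in-addr.arpa.` would normally carry CNAME records pointing each address at its name in the sub-zone; whether codfw1dev was set up exactly this way is not stated in the log.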
2020-05-21
- 19:23 andrewbogott: disabling puppet on cloudbackup2001 to prevent the backup job from starting during maintenance
- 19:16 andrewbogott: systemctl disable block_sync-tools-project.service on cloudbackup2001.codfw.wmnet to avoid stepping on current upgrade
- 15:48 andrewbogott: re-imaging cloudnet1003 with Buster
2020-05-19
- 22:59 bd808: `apt-get install mariadb-client` on cloudcontrol1003
- 21:12 bd808: Migrating wcdo.wcdo.eqiad.wmflabs to cloudvirt1023 (T251065)
2020-05-18
- 21:37 andrewbogott: rebuilding cloudnet2003-dev with Buster
2020-05-15
- 22:10 bd808: Added reedy as projectadmin in cloudinfra project (T249774)
- 22:05 bd808: Added reedy as projectadmin in admin project (T249774)
- 18:44 bstorm_: rebooting cloudvirt-wdqs1003 T252831
- 15:47 bd808: Manually running wmcs-novastats-dnsleaks from cloudcontrol1003 (T252889)
2020-05-14
- 23:28 bstorm_: downtimed cloudvirt1004/6 and cloudvirt-wdqs1003 until tomorrow around this time T252831
- 22:21 bstorm_: upgrading qemu-system-x86 on cloudvirt1006 to backports version T252831
- 22:15 bstorm_: changing /etc/libvirt/qemu.conf and restarting libvirtd on cloudvirt1006 T252831
- 21:12 andrewbogott: rebuilding cloudvirt-wdqs1003 as part of T252831
- 15:47 andrewbogott: moving cloudvirt1004 and cloudvirt1006 to the 'ceph' aggregate for T252784
- 15:02 andrewbogott: moving all of cloudvirt100[1-9] into the 'toobusy' host aggregate. These are slower, have spinning disks, and are due for replacement.
2020-05-12
- 20:33 andrewbogott: moving cloudvirt1023 to the 'standard' pool and out of the 'spare' pool
- 19:10 jeh: disable neutron-openvswitch-agent service on cloudvirt2001-dev.codfw T248881
- 19:09 jeh: Shutdown the unused eno2 network interface on cloudvirt2001-dev.codfw to clear up monitoring errors T248425
- 18:20 andrewbogott: moving cloudvirt1024 out of the 'maintenance' aggregate and into 'spare'
- 16:45 andrewbogott: restarting neutron-l3-agent on cloudnet1004 so it knows about all three cloudcontrols. Leaving cloudnet1003 since restarting it there will cause network interruptions
- 14:06 arturo: icinga downtime everything for 2h for Debian Buster migration in some cloud components
2020-05-09
- 16:53 andrewbogott: rebuilding cloudcontrol2001-dev and 2003-dev with buster for T252121
2020-05-08
- 19:02 bstorm_: moving tools-k8s-haproxy-2 from cloudvirt1021 to cloudvirt1017 to improve spread
2020-05-05
- 13:58 andrewbogott: rebuilding cloudcontrol2004-dev to test new puppet changes
2020-05-04
- 09:04 arturo: [codfw1dev] manually modify iptables ruleset to only allow SSH from WMF bastions on cloudservices2003-dev and cloudcontrol2004-dev (T251604)
2020-04-21
- 22:12 andrewbogott: moving cloudvirt1004 out of the 'standard' aggregate and into the 'maintenance' aggregate
- 16:01 jeh: restart cloudceph mon and osd services for openssl upgrades
2020-04-15
- 18:44 jeh: create indexes and views for grwikimedia T245912
2020-04-13
- 15:07 jeh: restart memcached on labwebs to increase cache size T145703
2020-04-09
- 19:57 andrewbogott: upgrading eqiad1 designate to rocky
- 16:52 andrewbogott: cleaned up a bunch of leaked .eqiad.wmflabs dns records
2020-04-08
- 19:20 andrewbogott: rotated password and api token for pdns servers on cloudservices1003 and cloudservices1004
- 14:54 arturo: `root@cloudcontrol1003:~# cp /etc/inputrc .inputrc` to solve some bash shortcut weirdness
2020-04-07
- 20:57 andrewbogott: service sssd stop; rm -rf /var/lib/sss/db*; service sssd start on tools-sgebastion-08
2020-04-06
- 22:39 andrewbogott: deleting bogus groups cn=b'project-bastion',ou=groups,dc=wikimedia,dc=org and cn=b'project-tools',ou=groups,dc=wikimedia,dc=org from ldap
- 17:42 arturo: [codfw1dev] transferred DNS zone 57.15.185.in-addr.arpa. to the cloudinfra-codfw1dev project (T247972)
- 17:39 arturo: [codfw1dev] `openstack zone create --email root@wmflabs.org --type PRIMARY --ttl 3600 --description "floating IPs subnet" 57.15.185.in-addr.arpa.` (T247972)
- 16:23 arturo: restarting apache2 in cloudcontrol1003/1004 to pick up latest wmfkeystonehooks changes T249494
2020-04-02
- 20:59 jeh: codfw1dev clear VM error states and start bastions, puppet master and database
2020-04-01
- 16:27 arturo: [codfw1dev] enable puppet across the fleet clean vxlan changes (T248881)
2020-03-31
- 12:35 arturo: [codfw1dev] restarting VMs: designaterockytest14, bastion-codfw1dev-0[1,2] (T248881)
- 12:34 arturo: [codfw1dev] installing neutron-openvswitch-agent on cloudvirt2001-dev (T248881)
- 12:25 arturo: [codfw1dev] installing neutron-openvswitch-agent on cloudnet200[2,3]-dev (T248881)
- 11:45 arturo: [codfw1dev] rebooting cloudvirt2003-dev to pick up latest kernel update. Otherwise modprobe is confused trying to load modules and openvswitch won't start (T248881)
- 10:40 arturo: [codfw1dev] installing neutron-openvswitch-agent on cloudvirt2003-dev (T248881)
- 10:09 arturo: [codfw1dev] reboot cloudnet2003-dev into linux 4.9 (was using 4.14 from a testing operation in 2020-03-10)
2020-03-30
- 23:42 bstorm_: deleted "Kubernetes Cluster" and "Kubernetes Performance" dashboards T246689
- 16:44 arturo: [codfw1dev] installing package neutron-openvswitch-agent in cloudvirt2002-dev (T248881)
- 16:42 andrewbogott: restarting l3 agents on cloudnets in codfw1dev after applying https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/584188/
2020-03-27
- 21:28 bd808: Created huggle.wmcloud.org Designate zone and allocated it to the huggle project
- 19:51 jeh: start haproxy on cloudcontrol2003-dev.wikimedia.org
2020-03-26
- 15:01 arturo: icinga downtime cloudvirt* cloudcontrol* cloudnet* lab* cloudstore*
- 15:01 andrewbogott: beginning openstack upgrade window for T242766
- 12:32 arturo: [codfw1dev] downgraded systemd, libsystemd0, udev and friends to the non-backports versions (T247013)
2020-03-25
- 19:29 andrewbogott: dumping a bunch of VMs on cloudvirt1015 to see if it still crashes
- 17:56 jeh: add labweb1002 back into the pool - completed horizon testing T240852
- 17:09 jeh: depool labweb1002 for horizon testing T240852
2020-03-24
- 19:41 jeh: switch cloudvirt1016 from maintenance to standard host aggregate T243327
- 15:31 andrewbogott: restarting nova-conductor and nova-api on cloudcontrol1003 and cloudcontrol1004
2020-03-23
- 21:41 jeh: restart neutron-l3-agent on cloudnet100[3,4] to pick up policy.yaml changes
- 13:28 jeh: disable puppet on labweb100[1,2] to enable horizon event traces T240852
- 10:26 arturo: restarting apache in both labweb1001/labweb1002 upon reports of returning 500s
2020-03-21
- 14:23 andrewbogott: restarting apache2 on labweb1001 and 1002
2020-03-18
- 19:17 andrewbogott: deleted a bunch of records from the pdns database on cloudservices1003/1004 which had a record name but the content (where an IP address should be) was NULL, e.g. m.wikidata.beta.wmflabs.org.
- 10:55 arturo: [codfw1dev] deleting BGP agent, undoing changes we did for T245606
2020-03-14
- 17:40 jeh: restart maintain-dbusers on labstore1004 T247654
2020-03-13
- 12:39 arturo: [codfw1dev] reintroduce address scopes for another round of testing T244851
- 12:17 arturo: [codfw1dev] enabling puppet in cloudnet200x-dev servers after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/579259 (T247505)
2020-03-12
- 22:29 bstorm_: running puppet across all dumps mounts to make sure active links are shifted to labstore1006
2020-03-11
- 18:38 jeh: set icinga downtime until 2020-03-23 on CODFW cloud[control,net,virt] hosts during openstack upgrades
- 12:50 arturo: [codfw1dev] several tests creating/deleting address scopes (T244727 T247135 T246887 T245606)
- 12:46 arturo: [codfw1dev] disable routing_source_ip in l3 agents for testing proposal detailed at https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Network_refresh#Eliminate_routing_source_ip_address (T244727)
2020-03-10
- 17:02 arturo: [codfw1dev] deleting address scopes, bad interaction with our custom NAT setup T247135
- 13:55 arturo: [codfw1dev] rebooting cloudnet2003-dev into linux kernel 4.14 for testing stuff related to T247135
2020-03-09
- 18:09 arturo: enabling puppet in cloudvirt1006, all services have been restored
- 17:59 arturo: deleted the neutron bridge on cloudvirt1006, for testing stuff related to the queens upgrade
- 17:58 arturo: stopped neutron-linuxbridge-agent and nova-compute in cloudvirt1006 for testing stuff related to the queens upgrade
2020-03-06
- 14:54 andrewbogott: draining all instances off of cloudvirt1006 for T246908
2020-03-05
- 14:24 arturo: [codfw1dev] we just enabled BGP session between cloudnet2xxx-dev and cr1-codfw (T245606)
- 13:07 arturo: [codfw1dev] move the extra IP address for BGP in cloudnet200x-dev servers from eno2.2120 to the br-external bridge device (T245606)
- 13:06 arturo: [codfw1dev] upgrade neutron-dynamic-routing packages in cloudnet200X-dev and cloudcontrol200X-dev servers to 11.0.0-2~bpo9+1 (T245606)
2020-03-04
- 22:22 andrewbogott: upgrading designate on cloudservices1003/1004 to Queens
- 22:09 andrewbogott: moving cloudvirt1006 into the maintenance aggregate for T246908
- 21:37 bd808: Running wmcs-wikireplica-dns to add service names for ngwikimedia.*.db.svc.eqiad.wmflabs (T240772)
- 21:14 bd808: Running `sudo maintain-meta_p --all-databases --purge` on labsdb1009 (T246056)
- 21:11 bd808: Running `sudo maintain-meta_p --all-databases --purge` on labsdb1010 (T246056)
- 21:08 bd808: Running `sudo maintain-meta_p --all-databases --purge` on labsdb1011 (T246056)
- 21:05 bd808: Running `sudo maintain-meta_p --all-databases --purge` on labsdb1002 (T246056)
2020-03-02
- 16:54 arturo: [codfw1dev] deleted python3-os-ken debian package in cloudnet2003-dev which was installed by hand and had dependency issues
2020-02-29
- 16:32 bstorm_: downtimed the smart alert on cloudvirt1009 until Monday since apparently predictive failures flap T244986
2020-02-26
- 22:03 jeh: powering down cloudvirt1014 for hardware maintenance
2020-02-25
- 16:08 andrewbogott: changing neutron's rabbitmq password because oslo is having trouble parsing some of the characters in the password
- 15:26 andrewbogott: updated the cell_mapping record in the nova_api database to add the second rabbitmq server to the transport_url field
- 15:26 andrewbogott: updated the cell_mapping record in the nova_api database to set the db uri to 'mysql+pymysql' -- this in response to a deprecation notice
2020-02-24
- 12:16 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker cr2-codfw` (T245606)
- 12:16 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker cr1-codfw` (T245606)
- 12:09 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-create --peer-ip 208.80.153.187 --remote-as 65002 cr2-codfw` (T245606)
- 12:09 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-create --peer-ip 208.80.153.186 --remote-as 65002 cr1-codfw` (T245606)
- 12:06 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-delete 17b8c2a3-f0ce-4d50-a265-18ccac703c61` (T245606)
- 10:59 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker bgppeer` (T245606)
- 10:56 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-create --peer-ip 208.80.153.185 --remote-as 65002 bgppeer` (T245606)
2020-02-21
- 12:48 arturo: [codfw1dev] running `root@cloudcontrol2001-dev:~# neutron bgp-speaker-network-add bgpspeaker wan-transport-codfw` (T245606)
- 12:46 arturo: [codfw1dev] created bgpspeaker for AS64711 (T245606)
- 12:42 arturo: [codfw1dev] run `sudo neutron-db-manage upgrade head` to upgrade the db schema for neutron bgp tables
- 11:51 arturo: [codfw1dev] create a neutron subnet pool for each subnet object we have and manually update DB to inter-associate them (T245606)
- 11:49 arturo: [codfw1dev] rename neutron address scope `no-nat` to `bgp` (T245606)
- 11:37 arturo: [codfw1dev] cleanup unused neutron subnet pools from previous address scope tests (T244851)
2020-02-20
- 19:22 andrewbogott: updating designate pool config for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/572213/
- 15:33 andrewbogott: migrating all VMs on cloudvirt1014 to cloudvirt1022
- 13:35 arturo: [codfw1dev] disable puppet in cloudcontrol servers to hack neutron.conf for tests related to T245606
- 13:33 arturo: [codfw1dev] disable puppet in cloudnet servers to hack neutron.conf for tests related to T245606
2020-02-18
- 22:19 andrewbogott: transferred the tools.wmcloud.org. to the tools project
- 22:16 andrewbogott: moved wmcloud.org dns domain to the cloud-infra project
- 21:02 andrewbogott: adding .eqiad1.wikimedia.cloud records to all existing eqiad1 VMs, updating all eqiad1 internal pointer records to reference the new eqiad1.wikimedia.cloud fqdns.
- 09:44 arturo: deleted DNS zone wmcloud.org and try re-creating it
2020-02-14
- 10:35 arturo: running `root@cloudcontrol2001-dev:~# designate server-create --name ns1.openstack.codfw1dev.wikimediacloud.org.` (T243766)
- 10:32 arturo: running `root@cloudcontrol1004:~# designate server-create --name ns1.openstack.eqiad1.wikimediacloud.org.` (T243766)
- 10:32 arturo: running `root@cloudcontrol1004:~# designate server-create --name ns0.openstack.eqiad1.wikimediacloud.org.` (T243766)
2020-02-12
- 13:38 arturo: [codfw1dev] add reference to subnetpool to the instance subnet `MariaDB [neutron]> update subnets set subnetpool_id='d129650d-d4be-4fe1-b13e-6edb5565cb4a' where id = '7adfcebe-b3d0-4315-92fe-e8365cc80668';` (T244851)
2020-02-11
- 13:46 arturo: [codfw1dev] creating some neutron objects to investigate T244851 (subnets, subnet pools, address scopes, ...)
- 12:40 arturo: [codfw1dev] delete unknown address scope 'wmcs-v4-scope': `root@cloudcontrol2001-dev:~# openstack address scope delete 078cfd71-117b-4aac-9197-6ebbbb7dd3de` (T244851)
- 12:40 arturo: [codfw1dev] delete unknown subnet pool 'cloudinstancesb-v4-pool0': `root@cloudcontrol2001-dev:~# openstack subnet pool delete d23a9b88-5c3d-4a53-ab88-053233a75365` (T244851)
2020-02-07
- 18:11 jeh: shutdown cloudvirt1016 for hardware maintenance T241882
2020-02-06
- 14:44 jeh: update apt packages on cloudvirt1015 T220853
- 14:28 jeh: run hardware tests on cloudvirt1015 T220853
2020-01-28
- 17:24 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# designate server-create --name ns0.openstack.codfw1dev.wikimediacloud.org. (T243766)
- 10:18 arturo: [codfw1dev] created DNS record `bastion-codfw1dev-01.codfw1dev.wmcloud.org A 185.15.57.2` (T242976, T229441)
- 10:13 arturo: [codfw1dev] the zone `codfw1dev.wmcloud.org` belongs now to the `cloudinfra-codfw1dev` project (T242976)
- 10:11 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# openstack zone create --description "main DNS domain for public addresses" --email "root@wmflabs.org" --type PRIMARY --ttl 3600 codfw1dev.wmcloud.org.` (T242976 and T243766)
- 09:53 arturo: restart apache2 in labweb1001/1002 because of horizon errors
- 09:47 arturo: created DNS zone wmcloud.org in eqiad1, transfer it to the cloudinfra project (T242976) right now only use is to delegate codfw1dev.wmcloud.org subdomain to designate in the other deployment
2020-01-27
- 12:45 arturo: [codfw1dev] manually move the new domain to the `cloudinfra-codfw1dev` project clouddb2001-dev: `[designate]> update zones set tenant_id='cloudinfra-codfw1dev' where id = '4c75410017904858a5839de93c9e8b3d';` T243556
- 12:44 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# openstack zone create --description "main DNS domain for VMs" --email "root@wmflabs.org" --type PRIMARY --ttl 3600 codfw1dev.wikimedia.cloud.` T243556
2020-01-24
- 15:10 jeh: remove icinga downtime for cloudvirt1013 T241313
- 12:52 arturo: repooling cloudvirt1013 after HW got fixed (T241313)
2020-01-21
- 17:43 bstorm_: remounting /mnt/nfs/dumps-labstore1007.wikimedia.org/ on all dumps-mounting projects
- 10:24 arturo: running `sudo systemctl restart apache2.service` in both labweb servers to try mitigating T240852
2020-01-15
- 16:59 bd808: Changed the config for cloud-announce mailing list so that list admins do not get bounce unsubscribe notices
2020-01-14
- 14:03 arturo: icinga downtime all cloudvirts for another 2h for fixing some icinga checks
- 12:04 arturo: icinga downtime toolschecker for 2 hours for openstack upgrades T241347
- 12:02 arturo: icinga downtime cloud* labs* hosts for 2 hours for openstack upgrades T241347
- 04:26 andrewbogott: upgrading designate on cloudservices1003/1004
2020-01-13
- 13:34 arturo: [codfw1dev] prevent neutron from allocating floating IPs from the wrong subnet by doing `neutron subnet-update --allocation-pool start=208.80.153.190,end=208.80.153.190 cloud-instances-transport1-b-codfw` (T242594)
2020-01-10
- 13:27 arturo: cloudvirt1009: virsh undefine i-000069b6. This is tools-elastic-01 which is running on cloudvirt1008 (so, leaked on cloudvirt1009)
2020-01-09
- 11:12 arturo: running `MariaDB [nova_eqiad1]> update quota_usages set in_use='0' where project_id='etytree';` (T242332)
- 11:11 arturo: running `MariaDB [nova_eqiad1]> select * from quota_usages where project_id = 'etytree';` (T242332)
- 10:32 arturo: ran `root@cloudcontrol1004:~# nova-manage project quota_usage_refresh --project etytree`
2020-01-08
- 10:53 arturo: icinga downtime all cloudvirts for 30 minutes to re-create all canary VMs
2020-01-07
- 11:12 arturo: icinga-downtime everything cloud* for 30 minutes to merge nova scheduler changes
- 10:02 arturo: icinga downtime cloudvirt1009 for 30 minutes to re-create canary VM (T242078)
2020-01-06
- 13:45 andrewbogott: restarting nova-api and nova-conductor on cloudcontrol1003 and 1004
2020-01-04
- 16:34 arturo: icinga downtime cloudvirt1024 for 2 months because of hardware errors (T241884)
2019-12-31
- 11:46 andrewbogott: I couldn't!
- 11:40 andrewbogott: restarting cloudservices2002-dev to see if I can reproduce an issue I saw earlier
2019-12-25
- 10:13 arturo: icinga downtime for 30 minutes the whole cloud* lab* fleet to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/560575 (will restart some openstack components)
2019-12-24
- 15:13 arturo: icinga downtime all the lab* fleet for nova password change for 1h
- 14:39 arturo: icinga downtime all the cloud* fleet for nova password change for 1h
2019-12-23
- 11:13 arturo: enable puppet in cloudcontrol1003/1004
- 10:40 arturo: disable puppet in cloudcontrol1003/1004 while doing changes related to python-ldap
2019-12-22
- 23:48 andrewbogott: restarting nova-conductor and nova-api on cloudcontrol1003 and 1004
- 09:45 arturo: cloudvirt1013 is back (did it alone) T241313
- 09:37 arturo: cloudvirt1013 is down for good. Apparently powered off. I can't even reach it via iLO
2019-12-20
- 12:43 arturo: icinga downtime cloudmetrics1001 for 128 hours
2019-12-18
- 12:55 arturo: [codfw1dev] created a new subnet neutron object to hold the new CIDR for floating IPs (cloud-codfw1dev-floating - 185.15.57.0/29) T239347
2019-12-17
- 07:21 andrewbogott: deploying horizon/train to labweb1001/1002
2019-12-12
- 06:11 arturo: schedule 4h downtime for labstores
- 05:57 arturo: schedule 4h downtime for cloudvirts and other openstack components due to upgrade ops
2019-12-02
- 06:28 andrewbogott: running nova-manage db sync on eqiad1
- 06:27 andrewbogott: running nova-manage cell_v2 map_cell0 on eqiad1
2019-11-21
- 16:07 jeh: created replica indexes and views for szywiki T237373
- 15:48 jeh: creating replica indexes and views for shywiktionary T238115
- 15:48 jeh: creating replica indexes and views for gcrwiki T238114
- 15:46 jeh: creating replica indexes and views for minwiktionary T238522
- 15:36 jeh: creating replica indexes and views for gewikimedia T236404
2019-11-18
- 19:27 andrewbogott: repooling labsdb1011
- 18:54 andrewbogott: running maintain-views --all-databases --replace-all —clean on labsdb1011 T238480
- 18:44 andrewbogott: depooling labsdb1011 and killing remaining user queries T238480
- 18:42 andrewbogott: repooled labsdb1009 and 1010 T238480
- 18:19 andrewbogott: running maintain-views --all-databases --replace-all —clean on labsdb1010 T238480
- 18:18 andrewbogott: depooling labsdb1010, killing remaining user queries
- 17:46 andrewbogott: running maintain-views --all-databases --replace-all —clean on labsdb1009 T238480
- 17:38 andrewbogott: depooling labsdb1009, killing remaining user queries
- 16:54 andrewbogott: running maintain-views --all-databases --replace-all --clean on labsdb1012 T237509
2019-11-15
- 20:04 andrewbogott: repool labsdb1011 (T237509)
- 19:29 andrewbogott: running maintain-views --all-databases --replace-all --clean on labsdb1011
- 19:25 andrewbogott: depooling labsdb1011, killing remaining queries
- 19:25 andrewbogott: repooling labsdb1010
- 18:59 andrewbogott: running maintain-views --all-databases --replace-all --clean on labsdb1012
- 18:57 andrewbogott: running maintain-views --all-databases --replace-all --clean on labsdb1010
- 18:54 andrewbogott: depooling labsdb1010, killing remaining user queries
- 18:54 andrewbogott: depooled labsdb1009, ran maintain-views --clean --all-databases --replace-all, repooled
2019-11-11
- 13:10 arturo: cloudweb2001-dev: disable puppet and redirect stderr in the loadExitNodes.php cron script to prevent cronspam while we investigate the cause of the issue (T237971)
2019-11-05
- 11:59 arturo: icinga downtime for 1h cloudcontrol1004, cloudnet1003, cloudvirt1017/1020/1022 for PDU operations in the rack T227542
2019-11-04
- 21:55 andrewbogott: deleting a ton of wikitech hiera pages that were either no-ops or refer to nonexistent VMs or prefixes
2019-10-31
- 11:01 arturo: icinga-downtimed cloudvirt1030 and cloudservices1003 for 1h due to PDU upgrade operations T227543
2019-10-30
- 22:43 jeh: reboot cloud-bootstrapvz-stretch to resolve bad bootstrapvz build
2019-10-29
- 10:52 arturo: icinga downtime cloudvirt1001/1002/1024/1018/1012/1009/1015/1008 for 1h T227538
2019-10-25
- 10:45 arturo: icinga downtime toolschecker for 1 to upgrade clouddb1002 mariadb (toolsdb secondary) (T236384, T236420)
2019-10-24
- 12:30 arturo: starting cloudvirt1019, PDU operations ended (T227540)
- 11:58 arturo: icinga downtime for 2h (T227540) cloudvirt1019
- 11:15 arturo: poweroff cloudvirt1019 during the PDU operations (T227540)
- 11:10 arturo: icinga downtime for 2h (T227540) toolschecker
- 10:58 arturo: icinga downtime for 1h (T227540) cloudvirt100[3-7], cloudvirt1019, cloudvirt1016, cloudvirt1021, cloudvirt1013, cloudnet1004
2019-10-23
- 09:23 arturo: cloudvirt1026 reboot ended OK
- 09:12 arturo: rebooting cloudvirt1026 for kernel upgrade
- 09:09 arturo: cloudvirt1025 reboot ended OK
- 09:00 arturo: rebooting cloudvirt1025 for kernel upgrade
- 08:51 arturo: icinga downtime cloudvirt1025/1026 for reboots
2019-10-18
- 16:01 arturo: created the `eqiad1.wikimedia.cloud` DNS zone (T235846)
- 14:27 andrewbogott: deleted a bunch of leaked VMs from earlier today from the admin-monitoring project. Fullstack leaks due to an API outage, maybe?
- 10:44 arturo: double max_message_size from 40KB to 80KB in the cloud-admin mailing list. A simple email with a couple of quotes can go over the 40KB limit.
2019-10-16
- 21:59 jeh: resync wiki replica tool and user accounts T235697
- 09:40 arturo: reboot of cloudvirt1030 went fine
- 09:28 arturo: reboot of cloudvirt1029 went fine
- 09:28 arturo: rebooting cloudvirt1030 for kernel updates
- 09:12 arturo: rebooting cloudvirt1029 for kernel updates
- 09:11 arturo: reboot of cloudvirt1028 went fine
- 09:00 arturo: rebooting cloudvirt1028 for kernel updates
- 08:56 arturo: icinga downtime cloudvirt[1028-1030].eqiad.wmnet for 1h for reboots
2019-10-15
- 13:30 jeh: creating indexes and views for banwiki T234770
2019-10-10
- 18:55 bd808: Created indexes and views for nqowiki (T230543)
- 11:59 arturo: network switch hardware is down affecting cloudvirt1025/1026 (T227536) VMs are supposed to be online but unreachable
2019-10-09
- 10:44 arturo: cloudvirt1013 rebooted well
- 10:32 arturo: cloudvirt1013 is rebooting
- 10:32 arturo: cloudvirt1012 rebooted just fine (very slow, 35 VMs)
- 10:21 arturo: cloudvirt1012 is rebooting
- 10:19 arturo: cloudvirt1009 rebooted just fine (very slow though)
- 10:07 arturo: cloudvirt1009 is rebooting
- 10:06 arturo: cloudvirt1008 rebooted just fine (very slow though)
- 09:58 arturo: cloudvirt1008 is rebooting
- 09:52 arturo: icinga downtime toolschecker, paws, etc for 2h, because cloudvirt reboots
2019-10-07
- 14:07 arturo: horizon is disabled for maintenance (T212302)
- 14:00 arturo: starting scheduled maintenance: upgrading eqiad1 from openstack mitaka to newton
2019-10-02
- 15:23 arturo: codfw1dev renaming net/subnet objects to a more modern naming scheme T233665
- 12:49 arturo: codfw1dev delete all floating ip allocations in the deployment for mangling the network config for testing T233665
- 12:47 arturo: codfw1dev deleting all VMs in the deployment for mangling the network config for testing T233665
- 11:08 arturo: codfw1dev rebooting cloudnet2002-dev and cloudnet2003-dev for testing T233665
- 10:31 arturo: codfw1dev: add cloudinstances2b-gw router to the l3 agent in cloudnet2003-dev
- 09:59 arturo: codfw1dev: cleanup leftover "HA port tenant admin" in neutron (ports from missing servers)
- 09:46 arturo: codfw1dev: cleanup leftover neutron agents
2019-09-30
- 10:21 arturo: we installed ferm in every VM by mistake. Deleting it and forcing a puppet agent run to try to go back to a clean state.
- 09:38 arturo: downtime toolschecker for 24h
- 09:33 arturo: force update ferm cloud-wide (in all VMs) for T153468
2019-08-18
- 10:39 arturo: rebooting cloudvirt1023 for new interface names configuration
- 10:34 arturo: downtimed cloudvirt1023 for 2 days
2019-08-05
- 17:17 bd808: Set downtime on gridengine and kubernetes webservice checks in icinga until 2019-09-02 (flaky tests)
2019-07-29
- 20:14 bd808: Restarted maintain-kubeusers on tools-k8s-master-01 (T194859)
2019-07-25
- 12:32 arturo: eqiad1/glance: debian-9.9-stretch image deprecates debian-9.8-stretch (T228983)
- 09:59 arturo: (codfw1dev) drop missing glance images (T228972)
- 09:32 arturo: (codfw1dev) deleting a bunch of VMs that were running in now missing hypervisors
- 09:31 arturo: (codfw1dev) deleting a bunch of VMs in ERROR and SHUTDOWN state
- 09:27 arturo: last log entry refers to the codfw1dev deployment
- 09:27 arturo: cleanup `nova service-list` from old hypervisors (labtest*)
- 09:23 arturo: refreshed nova DB grants in clouddb2001-dev for the codfw1dev deployment
- 08:47 arturo: cleanup the cloud-announce pending emails (spam)
2019-07-23
- 19:43 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 and 1004
2019-07-22
- 23:44 bd808: Restarted maintain-kubeusers on tools-k8s-master-01 (T228529)
2019-07-11
- 22:07 bd808: Ran `sudo systemctl stop designate_floating_ip_ptr_records_updater.service` on cloudcontrol1003
- 22:01 bd808: `sudo apt-get install python2.7-dbg` on cloudcontrol1003 to debug hung python process
- 21:48 bd808: Ran `sudo systemctl stop designate_floating_ip_ptr_records_updater.service` on cloudcontrol1004
2019-06-25
- 16:05 bstorm_: updated python3.4 to update4 wherever it was installed on Jessie VMs to prevent issues with broken update3.
- 14:56 bstorm_: Updated python 3.4 on the labs-puppetmaster server
2019-06-03
- 15:55 arturo: T221769 rebooting cloudservices1003 after bootstrapping is apparently completed
2019-05-28
- 21:42 bstorm_: unmounting labstore1003-scratch on all cloud clients
- 18:14 bstorm_: T209527 switched mounts from labstore1003 to cloudstore1008 for scratch
2019-05-20
- 17:25 arturo: T223923 dropped compat-network config from /etc/network/interfaces in eqiad1/codfw1dev neutron nodes
- 17:22 arturo: T223923 dropped br-compat bridges and vlan interfaces (1102 and 2102) in eqiad1/codfw1dev neutron nodes
- 17:07 arturo: T223923 dropped compat-network configuration from the neutron database in eqiad1
- 16:55 arturo: T223923 dropped compat-network configuration from the neutron database in codfw1dev
2019-05-15
- 17:00 andrewbogott: touching /root/firstboot_done on all VMs that cumin can reach. This will prevent firstboot.sh from running a second time if/when any of these are rebooted. T223370
2019-04-26
- 15:51 arturo: andrew updated dns servers for the cloud-instances2-b-eqiad subnet in neutron: 208.80.154.143 and 208.80.154.24
2019-04-25
- 11:14 arturo: T221760 increased size of conntrack table
2019-04-24
- 12:54 arturo: T220051 puppet broken in every VM in Cloud VPS, fixing right now
2019-04-22
- 11:14 arturo: create by hand /var/cache/labsaliaser/labs-ip-aliases.json in cloudservices2002-dev (T218575)
2019-04-16
- 22:55 bd808: cloudcontrol2003-dev: added `exit 0` to /etc/cron.hourly/keystone to stop cron spam on partially configured cluster
- 12:08 arturo: rebooting cloudvirt200[123]-dev because of deep changes in config
- 11:27 arturo: T219626 add DB grants for neutron and glance to clouddb2001-dev (codfw1dev)
- 10:37 arturo: T219626 replace 208.80.153.75 with 208.80.153.59 in the clouddb2001-dev database (codfw1dev deployment)
- 10:30 arturo: T219626 replace labtestcontrol2003 with cloudcontrol2001-dev in the clouddb2001-dev database (codfw1dev deployment)
2019-04-15
- 13:08 arturo: T219626 add DB grants for keystone/nova/nova_api to clouddb2001-dev (codfw1dev)
2019-04-13
- 18:25 bd808: Restarted nova-compute service on cloudvirt1015 (T220853)
2019-04-11
- 12:00 arturo: T151704 deploying oidentd to cloudnet1xxx servers
2019-04-02
- 19:52 andrewbogott: installed new base Stretch image. Updated packages, and runs apt-get dist-upgrade on first boot.
2019-03-29
- 14:34 andrewbogott: moving tools-static.wmflabs.org to point to tools-static-13 in eqiad1-r
- 00:00 bstorm_: T193264 Added osm.db.svc.eqiad.wmflabs to cloud DNS
2019-03-25
- 00:40 bd808: Restarted maintain-dbusers on labstore1004. Process hung up on failed LDAP connection.
2019-03-21
- 19:32 andrewbogott: restarting keystone on cloudcontrol1003
2019-03-15
- 16:00 gtirloni: increased nscd cache size (T217280)
2019-03-14
- 19:04 gtirloni: bstorm started nfsd on labstore1006 (T218341)
- 16:42 gtirloni: published new debian-9.8 image (T218314)
2019-03-04
- 19:37 bstorm_: umounted /mnt/nfs/dumps-labstore1006.wikimedia.org across all VPS projects for T217473
2019-02-26
- 12:46 gtirloni: shutdown toolsbeta-sgegrid-master (cronspam)
2019-02-25
- 10:32 gtirloni: restarted nfsd on labstore1004
2019-02-21
- 09:09 gtirloni: restarted uwsgi-labspuppetbackend.service on labpuppetmaster1001
- 07:42 gtirloni: created project cloudstore
- 07:36 gtirloni: deleted wmcs-nfs project
2019-02-20
- 21:58 andrewbogott: silencing shinken and disabling puppet on shinken-02 for now
2019-02-19
- 12:00 gtirloni: added nagios@icinga2001.wikimedia.org to cloud-admin-feed@ allowed senders
2019-02-18
- 20:21 gtirloni: downtimed cloudvirt1020
- 20:12 gtirloni: ran `labs-ip-alias-dump.py` on cloudservices/labservices servers
2019-02-15
- 13:10 arturo: T216239 labvirt1019 has been drained
- 12:22 arturo: T216239 draining labvirt1009 with a command like this: `root@cloudcontrol1004:~# wmcs-cold-migrate --region eqiad --nova-db nova 2c0cf363-c7c3-42ad-94bd-e586f2492321 labvirt1001`
- 12:02 arturo: more nova service cleanups in the database (labvirts that were reallocated to eqiad1)
- 11:34 arturo: T216190 cleanup from nova database `nova service-delete 35`
- 03:50 andrewbogott: updated VPS base images for Jessie and Stretch, now featuring Stretch 9.7
2019-02-11
- 18:13 gtirloni: cleaned old metrics data in labmon1001 T215417
- 15:28 gtirloni: running `maintain-views --all-databases --replace-all` on labsdb1011
- 14:18 gtirloni: running `maintain-views --all-databases --replace-all` on labsdb1010
2019-02-08
- 14:56 gtirloni: running `maintain-views --all-databases --replace-all` on labsdb1009
2019-02-06
- 11:47 gtirloni: downtimed labmon100{1,2} T215399
- 00:17 bstorm_: T214106 deleted bstorm-test2 project to clean up
2019-02-05
- 10:48 arturo: labmon1001 is now part of the 'eqiad1-r' region
2019-02-01
- 09:54 arturo: moving canary1015-01 VM instance from cloudvirt1024 back to cloudvirt1015
2019-01-31
- 12:44 arturo: T215012 depooling cloudvirt1015 and migrating all VMs to cloudvirt1024
2019-01-25
2019-01-24
- 11:50 arturo: T213925 modify subnet cloud-instances-transport1-b-eqiad1 to avoid floating IP allocations from here
- 11:07 arturo: T214299 failover cloudnet1003 to cloudnet1004
- 10:03 arturo: T214299 reimage cloudnet1004 to debian stretch
- 09:51 arturo: T214299 failover cloudnet1004 to cloudnet1003
2019-01-22
- 19:19 arturo: T214299 stretch cloudnet1003 is apparently all set
- 18:40 arturo: T214299 manually delete the cloudnet1003 agents from neutron (must be added again after reimage, with new uuids)
- 18:37 arturo: T214299 reimaging cloudnet1003 as debian stretch
- 17:35 jbond42: starting roll out of apt package updates to
- 14:41 gtirloni: T214369 deployed new jessie and stretch VM images
2019-01-21
- 18:29 gtirloni: installed libguestfs-tools on cloudvirt1021
2019-01-16
- 14:21 andrewbogott: stopping old VPS proxies in eqiad — T213540
2019-01-15
- 14:20 andrewbogott: changing tools.wmflabs.org to point to tools-proxy-03 in eqiad1
2019-01-13
- 20:00 andrewbogott: VPS proxies are now running in eqiad1 on proxy-01. Old VMs will wait a bit for deletion. T213540
- 19:12 andrewbogott: moving the VPS proxy API backend to proxy-01.project-proxy.eqiad.wmflabs, as per T213540
- 17:11 andrewbogott: moving all VPS dynamic proxies to proxy-eqiad1.wmflabs.org aka proxy-01.project-proxy.eqiad.wmflabs, as per T213540
2019-01-09
- 22:21 bd808: neutron quota-update --tenant-id tools --port 256
2019-01-08
- 18:59 bd808: Definitely did NOT delete uid=novaadmin,ou=people,dc=wikimedia,dc=org
- 18:59 bd808: Deleted LDAP user uid=neutron,ou=people,dc=wikimedia,dc=org
- 18:58 bd808: Deleted LDAP user uid=novaadmin,ou=people,dc=wikimedia,dc=org
2019-01-06
- 22:03 bd808: Set floatingip quota of 60 for tools project in eqiad1-r region (T212360)
2018-12-20
- 17:10 arturo: T207663 renumbered transport network in eqiad1
2018-12-05
- 17:59 arturo: T207663 changed labtestn transport network addressing from private to public
2018-12-03
- 13:25 arturo: T202886 create again PTR records after dnsleak.py fix
2018-11-30
- 14:08 arturo: running dns leaks cleanup `root@cloudcontrol1003:~# /root/novastats/dnsleaks.py --delete`
2018-11-28
- 17:33 gtirloni: deleted contintcloud project (T209644)
2018-11-27
- 13:32 gtirloni: enabled DRBD stats collection on labstore100[4-5] T208446
2018-11-22
- 07:12 gtirloni: deployed new debian-9.6-stretch image
2018-11-21
- 10:48 arturo: re-created compat-net as not shared in labtestn to test stuff related to T209954
2018-11-16
- 12:43 gtirloni: armed keyholder on labpuppetmaster1001/1002 after reboots
- 12:08 gtirloni: rebooted labpuppetmaster1001 (T207377)
- 11:57 gtirloni: rebooted labpuppetmaster1002 (T207377)
2018-11-14
- 17:19 gtirloni: added cloudvirt1016 to scheduler pool (T209426)
- 15:41 gtirloni: reimaging labvirt1016 as cloudvirt1016
- 15:14 gtirloni: reset-failed systemd unit nova-scheduler on cloudcontrol1004
- 13:52 gtirloni: rebooted labservices1002 after package upgrades (T207377)
- 13:23 gtirloni: rebooted labstore2004 after package upgrades (T207377)
- 13:20 gtirloni: rebooted labstore2003 after package upgrades (T207377)
- 13:20 gtirloni: rebooted labstore2001/labstore2003 after package upgrades (T207377)
- 12:08 gtirloni: rebooted labnet1002 after package upgrades
- 12:01 gtirloni: rebooted labmon1002 after package upgrades
- 11:41 gtirloni: rebooted labcontrol1002 after package upgrades
- 11:15 gtirloni: rebooted cloudcontrol1004 after package upgrades
2018-11-09
- 18:17 gtirloni: restarted neutron-linuxbridge-agent on cloudvirt1018/1023
2018-11-08
- 11:00 gtirloni: Added novaproxy-02 to $CACHES
- 10:50 gtirloni: Added cloudvirt1017 to eqiad1 region
2018-11-07
- 13:49 arturo: T208733 moving labvirt1017 from main deployment to eqiad1 and renaming it to cloudvirt1017
2018-10-22
- 16:24 arturo: T206261 another update to dmz_cidr in eqiad1
- 10:26 arturo: change again in dmz_cidr in eqiad1: VMs will connect between them without NAT even when using floating IPs (T206261)
2018-10-19
- 12:02 arturo: revert change in dmz_cidr in eqiad1 for now (T206261)
- 11:16 arturo: change in dmz_cidr in eqiad1: VMs will connect between them without NAT even when using floating IPs (T206261)
- 10:14 arturo: we have new virt servers in the eqiad1 deployment as of last week and this week: cloudvirt1018, cloudvirt1023, cloudvirt1024
2018-09-26
- 10:40 arturo: T205524 all sorts of restarts in all neutron daemons
- 10:20 arturo: T205524 stop/start all neutron agents in cloudnet1003.eqiad.wmnet
- 10:13 arturo: T205524 restart all agents in cloudnet1004.eqiad.wmnet
- 10:10 arturo: restart neutron-server in cloudcontrol1003, investigating T205524
2018-09-24
- 10:57 arturo: try to increase floating ip allocation pool in eqiad1. Of 185.15.56.0/25 we are using only 185.15.56.10-185.15.56.31, I don't know why. Let's use 185.15.56.2-185.15.56.126
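To make the effect of that pool change concrete, here is a small sketch that counts the addresses in the old and new ranges quoted in the entry above, using Python's standard `ipaddress` module. The helper name `pool_size` is made up for illustration; the ranges and the /25 come straight from the log.

```python
import ipaddress


def pool_size(start: str, end: str) -> int:
    """Return the number of addresses in an inclusive start-end range."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    return last - first + 1


network = ipaddress.ip_network("185.15.56.0/25")

# Range in use before the change, as quoted in the log entry.
old_pool = pool_size("185.15.56.10", "185.15.56.31")

# Range proposed in the log entry.
new_pool = pool_size("185.15.56.2", "185.15.56.126")

print(network.num_addresses)  # 128
print(old_pool)               # 22
print(new_pool)               # 125
```

Under these numbers the old pool covered only 22 of the 128 addresses in the /25, while the proposed range covers 125.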
2018-09-21
- 17:18 bd808: Running `sudo maintain-meta_p --all-databases --purge` across labsdb10(09|10|11) for T201890
2018-09-17
- 22:08 bd808: Granted gtirloni project roles of admin, projectadmin, and user
2018-09-12
- 11:20 arturo: T202636 distributing default routes using classless-static-route for all VMs in main/labtest (dnsmasq/nova-network)
2018-09-11
- 16:52 arturo: again, restarted nova-network after killing all dnsmasq procs in labnet1001 for T202636
- 16:08 arturo: restarted nova-network after killing all dnsmasq procs in labnet1001 for T202636
- 10:53 arturo: T202636 creating all the compat-network configuration in neutron
- 10:36 arturo: T202636 creating br-compat bridge in eqiad1 for the compat network
- 10:33 arturo: T202636 manually reserve 10.68.23.253 (in nova-network)
2018-09-10
- 22:46 andrewbogott: deleting all VMs on labvirt1019 and 1020 as prep for T204003
2018-08-30
- 15:46 andrewbogott: restarting rabbitmq-server on cloudcontrol1003
- 13:07 arturo: T202636 internal network routing now exists in labtest/labtestn for VMs to communicate with each other
2018-08-28
- 11:04 arturo: T202549 eqiad1 databases are all now running in m5-master. Mysql has been cleaned from cloudcontrol100[3,4]
2018-08-23
- 16:17 arturo: T188589 bstorm_ merged patch to reduce nova DB connection usage
- 13:15 arturo: T202115 `root@cloudcontrol1003:~# neutron subnet-update --allocation-pool start=10.64.22.4,end=10.64.22.4 e4fb2771-a361-4add-ac4e-280cc300c59f`
- 13:10 arturo: T202115 (was `{"start": "10.64.22.2", "end": "10.64.22.254"}` )
- 13:08 arturo: T202115 `root@cloudcontrol1003:~# neutron subnet-update --allocation-pool start=10.64.22.254,end=10.64.22.254 e4fb2771-a361-4add-ac4e-280cc300c59f`
2018-08-22
- 15:28 arturo: cleanup local glance,keystone databases in cloudcontrol1003.wikimedia.org (already in m5-master)
- 15:27 arturo: cleanup local keystone database in cloudcontrol1003.wikimedia.org (already in m5-master)
2018-08-21
- 15:39 andrewbogott: initial test message
- 10:31 arturo: eqiad1 remove leftover port for HA on labnet1004
- 10:15 arturo: test
2018-05-07
- 18:07 bstorm_: stopped the toolhistory job because it is totally broken and fills /tmp.
2018-02-09
- 00:55 bd808: Added Arturo Borrero Gonzalez and Bstorm as project members
- 00:54 bd808: Removed Yuvipanda at user request (T186289)