Portal:Cloud VPS/Admin/Procedures and operations
This page describes some standard admin procedures and operations for our Cloud VPS deployments, specifically for the jessie/mitaka/neutron combinations.
Manual routing failover
In the old nova-network days, a very long procedure was required to manually fail over from a dead/under-maintenance network node (typically cloudnetXXXX|labnetXXXX).
Nowadays it is much simpler. This procedure assumes you want to move the active service from one node to the other:
Examples of neutron operations:
root@cloudcontrol1003:~# neutron router-list
+--------------------------------------+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+------+
| id | name | external_gateway_info | distributed | ha |
+--------------------------------------+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+------+
| d93771ba-2711-4f88-804a-8df6fd03978a | cloudinstances2b-gw | {"network_id": "5c9ee953-3a19-4e84-be0f-069b5da75123", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "e4fb2771-a361-4add-ac4e-280cc300c59f", "ip_address": "10.64.22.4"}]} | False | True |
+--------------------------------------+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+------+
root@cloudcontrol1003:~# neutron l3-agent-list-hosting-router d93771ba-2711-4f88-804a-8df6fd03978a
+--------------------------------------+--------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------+----------------+-------+----------+
| 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b | cloudnet1003 | True | :-) | standby |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | cloudnet1004 | True | :-) | active |
+--------------------------------------+--------------+----------------+-------+----------+
user@cloudnet1004:~ $ sudo systemctl stop neutron-metadata-agent.service neutron-dhcp-agent.service neutron-l3-agent.service neutron-linuxbridge-agent.service
root@cloudcontrol1003:~# neutron l3-agent-list-hosting-router d93771ba-2711-4f88-804a-8df6fd03978a
+--------------------------------------+--------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------+----------------+-------+----------+
| 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b | cloudnet1003 | True | :-) | active |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | cloudnet1004 | True | xxx | standby |
+--------------------------------------+--------------+----------------+-------+----------+
Alternatively, you can use other neutron commands to manage agents:
Examples of other neutron commands:
root@cloudcontrol1003:~# neutron agent-update 970df1d1-505d-47a4-8d35-1b13c0dfe098 --admin-state-down
Updated agent: 970df1d1-505d-47a4-8d35-1b13c0dfe098
root@cloudcontrol1003:~# neutron l3-agent-list-hosting-router d93771ba-2711-4f88-804a-8df6fd03978a
+--------------------------------------+--------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------+----------------+-------+----------+
| 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b | cloudnet1003 | True | :-) | active |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | cloudnet1004 | False | :-) | standby |
+--------------------------------------+--------------+----------------+-------+----------+
root@cloudcontrol1003:~# neutron agent-update 970df1d1-505d-47a4-8d35-1b13c0dfe098 --admin-state-up
Updated agent: 970df1d1-505d-47a4-8d35-1b13c0dfe098
root@cloudcontrol1003:~# neutron l3-agent-list-hosting-router d93771ba-2711-4f88-804a-8df6fd03978a
+--------------------------------------+--------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------+----------------+-------+----------+
| 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b | cloudnet1003 | True | :-) | active |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | cloudnet1004 | True | :-) | standby |
+--------------------------------------+--------------+----------------+-------+----------+
root@cloudcontrol1003:~# neutron l3-agent-router-remove 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b d93771ba-2711-4f88-804a-8df6fd03978a
Removed router d93771ba-2711-4f88-804a-8df6fd03978a from L3 agent
root@cloudcontrol1003:~# neutron l3-agent-list-hosting-router d93771ba-2711-4f88-804a-8df6fd03978a
+--------------------------------------+--------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------+----------------+-------+----------+
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | cloudnet1004 | True | :-) | standby |
+--------------------------------------+--------------+----------------+-------+----------+
root@cloudcontrol1003:~# neutron l3-agent-router-add 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b d93771ba-2711-4f88-804a-8df6fd03978a
Added router d93771ba-2711-4f88-804a-8df6fd03978a to L3 agent
root@cloudcontrol1003:~# neutron l3-agent-list-hosting-router d93771ba-2711-4f88-804a-8df6fd03978a
+--------------------------------------+--------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------+----------------+-------+----------+
| 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b | cloudnet1003 | True | :-) | active |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | cloudnet1004 | True | :-) | standby |
+--------------------------------------+--------------+----------------+-------+----------+
At the time of this writing, it is not known which method causes less network downtime.
Remove hypervisor
Follow this procedure to remove a virtualization server (typically cloudvirtXXXX|labvirtXXXX).
- Remove or shut down the node (see the decommissioning sketch after this list). openstack hypervisor list will still show it.
- nova service-list will show it as down once it's taken away:
| 9 | nova-compute | labtestvirt2003 | nova | disabled | down | 2017-12-18T20:52:59.000000 | AUTO: Connection to libvirt lost: 0 |
- nova service-delete 9 will remove it, where the number is the id from nova service-list.
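If the node is being decommissioned rather than recovered from a failure, it may help to disable the compute service first so the scheduler stops placing new VMs on it. A minimal sketch; the host name, reason text, and <id> are placeholders:
# stop scheduling new VMs on the host
openstack compute service set --disable --disable-reason 'decommission' cloudvirtXXXX nova-compute
# find the service id, then delete the record once the node is down
nova service-list
nova service-delete <id>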
VM/Hypervisor pinning
In case you want to run a specific VM on a specific hypervisor, run this command at instance creation time:
OS_TENANT_NAME=myproject openstack server create --flavor 3 --image b7274e93-30f4-4567-88aa-46223c59107e --availability-zone host:cloudvirtXXXX myinstance
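To verify where the instance landed, you can inspect the server record as an admin (the OS-EXT-SRV-ATTR:host field is admin-only); this assumes the same project and instance name as above:
OS_TENANT_NAME=myproject openstack server show myinstance -c OS-EXT-SRV-ATTR:host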
Canary VM instance in every hypervisor
Each hypervisor should have a canary VM instance running.
The command to create it should be something like:
openstack server create \
  --os-project-id testlabs \
  --image debian-10.0-buster \
  --flavor 2 \
  --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 \
  --property description='canary VM' \
  --availability-zone host:cloudvirt1022 \
  canary1022-01
NOTE: you could also use wmcs-canary-vm-refresh.sh, a custom helper script made by Arturo to refresh canary VMs in every hypervisor.
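If the helper script is not at hand, a simple loop over hypervisors works too. A minimal sketch, not the actual wmcs-canary-vm-refresh.sh; the hypervisor numbers are placeholders:
# create one canary VM per listed hypervisor
for n in 1022 1023 1024; do
  openstack server create \
    --os-project-id testlabs \
    --image debian-10.0-buster \
    --flavor 2 \
    --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 \
    --property description='canary VM' \
    --availability-zone host:cloudvirt${n} \
    canary${n}-01
done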
Updating openstack database password
OpenStack uses many databases, and updating a password requires several steps.
nova
We usually have the same password for the different nova databases nova_eqiad1 and nova_api_eqiad1.
- in the puppet private repo (on puppetmaster1001.eqiad.wmnet), update the profile::openstack::eqiad1::nova::db_pass hiera key in hieradata/eqiad/profile/openstack/eqiad1/nova.yaml.
- in the puppet private repo (on puppetmaster1001.eqiad.wmnet), update class passwords::openstack::nova in modules/passwords/manifests/init.pp.
- in the database (m5-master.eqiad.wmnet), update grants, something like:
GRANT ALL PRIVILEGES ON nova_api_eqiad1.* TO 'nova'@'208.80.153.x' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_api_eqiad1.* TO 'nova'@'%' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_eqiad1.* TO 'nova'@'208.80.153.x' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_eqiad1.* TO 'nova'@'%' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_cell0_eqiad1.* TO 'nova'@'208.80.153.x' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_cell0_eqiad1.* TO 'nova'@'%' IDENTIFIED BY '<%= @db_pass %>';
- repeat the grants for every cloudcontrol server IP and IPv6 address (a grants check is sketched after this list).
- update the cell mapping database connection string (yes, inside the database itself) in m5-master.eqiad.wmnet; a verification sketch follows after this list:
$ mysql nova_api_eqiad1
[nova_api_eqiad1]> update cell_mappings set database_connection='mysql://nova:<password>@m5-master.eqiad.wmnet/nova_eqiad1' where id=4;
[nova_api_eqiad1]> update cell_mappings set database_connection='mysql://nova:<password>@m5-master.eqiad.wmnet/nova_cell0_eqiad1' where id=1;
- run puppet everywhere (on cloudcontrol servers, etc.) so the new password is added to the config files.
- if puppet does not restart the affected services, restart them by hand (systemctl restart nova-api, etc.); a restart sketch follows below.
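After changing the grants, it is worth double-checking what the database actually stored. A minimal sketch, run on m5-master.eqiad.wmnet; this checks the wildcard user, so repeat it for each per-host entry:
SHOW GRANTS FOR 'nova'@'%';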
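To confirm the cell mappings point at the right connection strings, you can list the cells from a cloudcontrol host; the --verbose flag makes nova-manage print the database connection URLs:
root@cloudcontrol1003:~# nova-manage cell_v2 list_cells --verbose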
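A minimal restart sketch; the exact unit names are an assumption and may vary per host role (compute nodes run nova-compute instead):
# restart the nova API-side services after the password change
systemctl restart nova-api nova-conductor nova-scheduler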
neutron
TODO: add information.
glance
TODO: add information.
designate
TODO: add information.
keystone
TODO: add information.