Portal:Cloud VPS/Admin/Procedures and operations

This page describes some standard admin '''procedures and operations''' for our Cloud VPS deployments.

= Manual routing failover =

In the old nova-network days, a very long procedure was required to manually fail over from a dead/under-maintenance network node (typically cloudnetXXXX).

Nowadays it is much simpler. This procedure assumes you want to move the active service from one node to the other:
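
The exact commands depend on the agent and router at hand, so the following is only a sketch, assuming the legacy <code>neutron</code> CLI is available and using hypothetical placeholders for the agent IDs and router name. It reschedules the router from the L3 agent on the node going away to the agent on the surviving node:

<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ # list agents and find which L3 agent currently hosts the router
user@cloudcontrol1005:~$ sudo neutron agent-list
user@cloudcontrol1005:~$ sudo neutron l3-agent-list-hosting-router <router>
user@cloudcontrol1005:~$ # detach the router from the agent on the node under maintenance
user@cloudcontrol1005:~$ sudo neutron l3-agent-router-remove <old-agent-id> <router>
user@cloudcontrol1005:~$ # attach it to the agent on the surviving node
user@cloudcontrol1005:~$ sudo neutron l3-agent-router-add <new-agent-id> <router>
</syntaxhighlight>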

Alternatively, you can play with other neutron commands to manage agents.

At the time of this writing, it is not known which method produces less impact in terms of network downtime.

= Remove hypervisor =

Follow this procedure to remove a virtualization server (typically cloudvirtXXXX|labvirtXXXX).

* Remove or shut down the node.
* <code>openstack hypervisor list</code> will still show it.
* <code>nova service-list</code> will show it as down once it's taken away:

<syntaxhighlight lang="shell-session">
| 9 | nova-compute | labtestvirt2003 | nova | disabled | down | 2017-12-18T20:52:59.000000 | AUTO: Connection to libvirt lost: 0 |
</syntaxhighlight>

* <code>nova service-delete 9</code> will remove it, where the number is the id from <code>nova service-list</code>.
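
Put together, the removal session could look something like this (a sketch under the assumptions above; the service id <code>9</code> is just the value from the example output):

<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ sudo openstack hypervisor list
user@cloudcontrol1005:~$ # the removed node still shows up above; check its nova-compute service
user@cloudcontrol1005:~$ sudo nova service-list
user@cloudcontrol1005:~$ # once it is reported as down, delete it by id
user@cloudcontrol1005:~$ sudo nova service-delete 9
</syntaxhighlight>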

= VM/Hypervisor pinning =

If you want to run a specific VM on a specific hypervisor, pass the '''--availability-zone''' option at instance creation time, as in the following example:

<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ sudo wmcs-openstack server create --os-project-id testlabs --image debian-10.0-buster --flavor g2.cores1.ram2.disk20 --network lan-flat-cloudinstances2b --property description='test VM' --availability-zone host:cloudvirt1022 mytestvm
</syntaxhighlight>
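
To double-check where the instance landed, you can inspect the admin-only hypervisor attribute; a minimal example, assuming the same <code>wmcs-openstack</code> wrapper as above:

<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ sudo wmcs-openstack server show mytestvm -c OS-EXT-SRV-ATTR:hypervisor_hostname
</syntaxhighlight>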

= Canary VM instance in every hypervisor =

Each hypervisor should have a canary VM instance running.

The command to create it should be something like:

<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ sudo wmcs-openstack server create --os-project-id cloudvirt-canary --image debian-10.0-buster --flavor cloudvirt-canary-ceph --network lan-flat-cloudinstances2b --property description='canary VM' --availability-zone host:cloudvirt1022 canary1022-01
</syntaxhighlight>
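
To see which canary instances are running on a given hypervisor, something like this should work (the <code>--host</code> filter needs admin credentials, which the <code>wmcs-openstack</code> wrapper provides):

<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ sudo wmcs-openstack server list --project cloudvirt-canary --host cloudvirt1022
</syntaxhighlight>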

'''NOTE:''' you could also use a script like this: [[User:Arturo_Borrero_Gonzalez#wmcs-canary-vm-refresh.sh | wmcs-canary-vm-refresh.sh]] (a custom helper script made by Arturo to refresh canary VMs in every hypervisor).

= Updating openstack database password =

OpenStack uses many databases, and updating a password requires several steps.

== nova ==

We usually have the same password for the different nova databases, <code>nova_eqiad1</code> and <code>nova_api_eqiad1</code>.

* in the puppet private repo (on puppetmaster1001.eqiad.wmnet), update the <code>profile::openstack::eqiad1::nova::db_pass</code> hiera key in <code>hieradata/eqiad/profile/openstack/eqiad1/nova.yaml</code>.
* in the puppet private repo (on puppetmaster1001.eqiad.wmnet), update the class <code>passwords::openstack::nova</code> in <code>modules/passwords/manifests/init.pp</code>.
* in the database (m5-master.eqiad.wmnet), update the grants, something like:

<syntaxhighlight lang="sql">
GRANT ALL PRIVILEGES ON nova_api_eqiad1.* TO 'nova'@'208.80.153.x' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_api_eqiad1.* TO 'nova'@'%' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_eqiad1.* TO 'nova'@'208.80.153.x' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_eqiad1.* TO 'nova'@'%' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_cell0_eqiad1.* TO 'nova'@'208.80.153.x' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_cell0_eqiad1.* TO 'nova'@'%' IDENTIFIED BY '<%= @db_pass %>';
</syntaxhighlight>

* repeat the grants for every cloudcontrol server IP and IPv6 address.
* update the cell mapping database connection string (yes, inside the database itself) on m5-master.eqiad.wmnet:

<syntaxhighlight lang="shell-session">
$ mysql nova_api_eqiad1
[nova_api_eqiad1]> update cell_mappings set database_connection='mysql://nova:<password>@m5-master.eqiad.wmnet/nova_eqiad1' where id=4;
[nova_api_eqiad1]> update cell_mappings set database_connection='mysql://nova:<password>@m5-master.eqiad.wmnet/nova_cell0_eqiad1' where id=1;
</syntaxhighlight>

* run puppet everywhere (on cloudcontrol servers, etc.) so the new password lands in the config files.
* if puppet does not restart the affected services, restart them by hand (<code>systemctl restart nova-api</code>, etc.)
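
Before closing the task, it's worth confirming that the new credentials actually work. A minimal check, assuming the mysql client is installed on the cloudcontrol host:

<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ mysql -u nova -p -h m5-master.eqiad.wmnet nova_eqiad1 -e 'SELECT 1;'
</syntaxhighlight>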

== neutron ==

TODO: add information.

== glance ==

TODO: add information.

== designate ==

TODO: add information.

== keystone ==

TODO: add information.

= See also =