Portal:Cloud VPS/Admin/Procedures and operations

From Wikitech-static
Jump to navigation Jump to search
imported>Arturo Borrero Gonzalez
(→‎nova: mention puppet and services)
imported>Arturo Borrero Gonzalez
(→‎See also: https://www.mediawiki.org/wiki/Inclusive_language#Terms_to_avoid_and_their_alternatives)
 
(8 intermediate revisions by 5 users not shown)
Line 1: Line 1:
This page describes some standard admin '''procedures and operations''' for our Cloud VPS deployments.


= Manual routing failover =


In the old nova-network days, a very long procedure was required to manually fail over from a dead or under-maintenance network node (typically cloudnetXXXX).


Nowadays it is much simpler. This procedure assumes you want to move the active service from one node to the other.

Alternatively you can play with other neutron commands to manage agents.

At the time of this writing it is not known which method produces less impact in terms of network downtime.
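
As an illustration only (a sketch using the generic OpenStack CLI via the wmcs-openstack wrapper used elsewhere on this page, with placeholder agent IDs and router name; not necessarily the exact documented runbook), re-homing a router from the L3 agent on the dead node to the agent on the surviving node could look like this:

<syntaxhighlight lang="shell-session">
# List the L3 agents and find out which one currently hosts the router (IDs and names below are placeholders)
user@cloudcontrol1005:~$ sudo wmcs-openstack network agent list --agent-type l3
user@cloudcontrol1005:~$ sudo wmcs-openstack network agent list --router <router> --long

# Detach the router from the agent on the dead/under-maintenance cloudnet node...
user@cloudcontrol1005:~$ sudo wmcs-openstack network agent remove router --l3 <old-agent-uuid> <router>

# ...and attach it to the agent on the surviving node
user@cloudcontrol1005:~$ sudo wmcs-openstack network agent add router --l3 <new-agent-uuid> <router>
</syntaxhighlight>
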

= Remove hypervisor =

Follow this procedure to remove a virtualization server (typically cloudvirtXXXX/labvirtXXXX):

* Remove or shutdown the node.
* <code>openstack hypervisor list</code> will still show it.
* <code>nova service-list</code> will show it as down once it's taken away:
<syntaxhighlight lang="shell-session">
| 9 | nova-compute | labtestvirt2003 | nova | disabled | down | 2017-12-18T20:52:59.000000 | AUTO: Connection to libvirt lost: 0 |
</syntaxhighlight>
* <code>nova service-delete 9</code> will remove the service, where the number is the id from <code>nova service-list</code>.
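
The legacy <code>nova</code> client commands above have equivalents in the unified <code>openstack</code> client; the following is only a sketch (the service id is taken from the list output):

<syntaxhighlight lang="shell-session">
# List nova-compute services and note the id of the hypervisor being removed
user@cloudcontrol1005:~$ sudo wmcs-openstack compute service list --service nova-compute

# Delete the compute service entry for that hypervisor
user@cloudcontrol1005:~$ sudo wmcs-openstack compute service delete <service-id>
</syntaxhighlight>
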
= VM/Hypervisor pinning =


If you want to run a specific VM on a specific hypervisor, pass the '''--availability-zone''' option at instance creation time, as in the following example:


<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ sudo wmcs-openstack server create --os-project-id testlabs --image debian-10.0-buster --flavor g2.cores1.ram2.disk20 --nic net-id=lan-flat-cloudinstances2b --property description='test VM' --availability-zone host:cloudvirt1022 mytestvm
</syntaxhighlight>
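
To double-check where the instance actually landed, you can query the extended server attributes; a sketch (the attribute name is the standard nova admin-only field, assumed here):

<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ sudo wmcs-openstack server show mytestvm -c 'OS-EXT-SRV-ATTR:hypervisor_hostname' -f value
</syntaxhighlight>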


= Canary VM instance in every hypervisor =


Each hypervisor should have a canary VM instance running (with the exception of cloudvirt1019 and cloudvirt1020, as those hypervisors are not connected to Ceph and have no access to the base Debian images).


The command to create it should be something like:


<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ sudo wmcs-openstack server create --os-project-id cloudvirt-canary --image debian-10.0-buster --flavor cloudvirt-canary-ceph --network lan-flat-cloudinstances2b --property description='canary VM' --availability-zone host:cloudvirt1022 canary1022-01
</syntaxhighlight>
 
'''NOTE:''' you could also use a script like this: [[User:Arturo_Borrero_Gonzalez#wmcs-canary-vm-refresh.sh | wmcs-canary-vm-refresh.sh]] (a custom helper script made by Arturo to refresh canary VMs in every hypervisor).
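
A minimal shell sketch of the same idea, reusing the exact flags from the example above and the <code>canaryNNNN-01</code> naming pattern (this is only an illustration, not the actual wmcs-canary-vm-refresh.sh helper):

<syntaxhighlight lang="bash">
#!/bin/bash
# Create one canary VM per listed hypervisor, named canaryNNNN-01 after the host number.
for host in cloudvirt1021 cloudvirt1022 cloudvirt1023; do
  num="${host#cloudvirt}"
  sudo wmcs-openstack server create \
    --os-project-id cloudvirt-canary \
    --image debian-10.0-buster \
    --flavor cloudvirt-canary-ceph \
    --network lan-flat-cloudinstances2b \
    --property description='canary VM' \
    --availability-zone "host:${host}" \
    "canary${num}-01"
done
</syntaxhighlight>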


= Updating openstack database password =


OpenStack uses many databases, and updating a database password requires several steps.


== nova ==


We usually have the same password for the different nova databases '''nova_eqiad1''' and '''nova_api_eqiad1'''.


* in the puppet private repo (in puppetmaster1001.eqiad.wmnet), update the '''profile::openstack::eqiad1::nova::db_pass''' hiera key in '''hieradata/eqiad/profile/openstack/eqiad1/nova.yaml'''.
* in the puppet private repo (in puppetmaster1001.eqiad.wmnet), update '''class passwords::openstack::nova''' in '''modules/passwords/manifests/init.pp'''.
* in the OpenStack database (Galera, running on the cloudcontrol nodes), update the grants, something like:
<syntaxhighlight lang="sql">
GRANT ALL PRIVILEGES ON nova_api_eqiad1.* TO 'nova'@'208.80.153.x' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_api_eqiad1.* TO 'nova'@'%' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_eqiad1.* TO 'nova'@'208.80.153.x' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_eqiad1.* TO 'nova'@'%' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_cell0_eqiad1.* TO 'nova'@'208.80.153.x' IDENTIFIED BY '<%= @db_pass %>';
GRANT ALL PRIVILEGES ON nova_cell0_eqiad1.* TO 'nova'@'%' IDENTIFIED BY '<%= @db_pass %>';
</syntaxhighlight>
* repeat grants for every cloudcontrol server IP and IPv6 address.
* update the cell mapping database connection string (yes, inside the database itself) from any cloudcontrol server:
<syntaxhighlight lang="shell-session">
$ mysql nova_api_eqiad1
[nova_api_eqiad1]> update cell_mappings set database_connection='mysql://nova:<password>@openstack.eqiad1.wikimediacloud.org/nova_eqiad1' where id=4;
[nova_api_eqiad1]> update cell_mappings set database_connection='mysql://nova:<password>@openstack.eqiad1.wikimediacloud.org/nova_cell0_eqiad1' where id=1;
</syntaxhighlight>
* run puppet everywhere (on cloudcontrol servers, etc.) so the new password is added to the config files.
* if puppet is not restarting the affected services, restart them by hand (<code>systemctl restart nova-api</code>, etc.)
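
Once the services are back up, one way to sanity-check that nova can reach its databases with the new credentials is to list the cell mappings again (a suggested check, not part of the original write-up):

<syntaxhighlight lang="shell-session">
# Lists each cell with its database connection URL (--verbose shows the credentials unmasked)
user@cloudcontrol1005:~$ sudo nova-manage cell_v2 list_cells --verbose
</syntaxhighlight>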


== neutron ==


'''TODO:''' add information.


== glance ==


'''TODO:''' add information.


== designate ==


'''TODO:''' add information.


== keystone ==


'''TODO:''' add information.

= Rotating or revoking keystone fernet tokens =
Should you need to rotate or revoke all keystone [https://docs.openstack.org/keystone/rocky/admin/identity-fernet-token-faq.html fernet tokens], follow this procedure:
* on all cloudcontrol nodes:
rm -rf /etc/keystone/fernet-keys
* on one cloudcontrol node:
keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
* on each other cloudcontrol node:
rsync -a --delete rsync://<host where you ran fernet_setup>.wikimedia.org/keystonefernetkeys/* /etc/keystone/fernet-keys/
* on labweb/cloudweb hosts:
service memcached restart
service apache2 restart
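
To confirm that keystone is issuing tokens again after the rotation, something like the following should work (a suggested check, not part of the original procedure):

<syntaxhighlight lang="shell-session">
user@cloudcontrol1005:~$ sudo wmcs-openstack token issue
</syntaxhighlight>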


= See also =
* [[Portal:Cloud_VPS/Admin/Maintenance]]
* [[Portal:Cloud_VPS/Admin/Projects_lifecycle]]
* [[Portal:Cloud_VPS/Admin/Deployment_confidence_checklist]]


[[Category:VPS admin]]
