Ganeti

Introduction

Ganeti is a cluster virtual server management tool built on top of existing virtualization technologies and other open source software. It supports both KVM and Xen as hypervisors; at WMF only KVM is enabled. The primary Ganeti web page is http://www.ganeti.org/.

At WMF, Ganeti is used as a cluster management tool for a private VPS cloud installation. After an evaluation of OpenStack vs Ganeti, Ganeti was chosen as the better fit for the job at hand.

Architecture

The architecture of Ganeti is described below.

Overview

Ganeti is architected as a shared-nothing cluster with job management. There is one master node that receives all jobs to be executed (create a VM, delete a VM, stop/start VMs, etc.); the master role can be moved between a preconfigured number of master candidates in case of a hardware failure, so there is no single point of failure for cluster operations. For VM operations, provided the DRBD backend is used (as we do at WMF), VMs can be restarted with minimal disruption on their secondary (backup) node even after a catastrophic failure of a hardware node. There is also the notion of a nodegroup, which is practically a group of nodes. Think of it as nothing more than a subcluster division. Operations are usually constrained within a nodegroup and do not cross its boundaries unless specifically instructed to.
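
As a quick illustration (these commands are covered in more detail below), the master candidates and the nodegroup of each node can be listed from the master:

  sudo gnt-node list -o name,master_candidate,group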

A high-level overview of the architecture can be found at http://docs.ganeti.org/ganeti/2.12/html/_images/graphviz-246e5775f608681df9f62dbbe0a5d4120dc75f1c.png and more discussion of it is in http://docs.ganeti.org/ganeti/2.12/html/design-2.0.html

Specifics

A cluster is identified by:

  • The nodes
  • An FQDN (e.g. ganeti01.svc.eqiad.wmnet), which corresponds to an IPv4 address. That IPv4 address is "floating", meaning that it is owned by the current master.

Administration always happens via the master. It is the only node where all commands can be run, and it hosts the API. Failover of a master is easy but manual; see below for more information on how to do it.

Connect to a cluster

Just ssh to its FQDN

  ssh ganeti01.svc.eqiad.wmnet

Cluster Operations

Init the cluster

An example of initializing a new cluster:

  sudo gnt-cluster init \
  --no-ssh-init \ 
  --enabled-hypervisors=kvm \
  --vg-name=ganeti \
  --master-netdev=private \
  --hypervisor-parameters kvm:kvm_path=/usr/bin/qemu-system-x86_64,kvm_flag=enabled,serial_speed=115200,migration_bandwidth=64,migration_downtime=500,kernel_path= \
  --nic-parameters=link=private \ 
  ganeti01.svc.codfw.wmnet

The above is the way we currently have our clusters configured.

Modify the cluster

Modifying the cluster to change defaults, parameters of hypervisors, limits, security model etc is possible. An example of modifying the cluster is given below.

  sudo gnt-cluster modify -H kvm:kvm_path=/usr/bin/qemu-system-x86_64,kvm_flag=enabled,kernel_path=

To get an idea of what is actually modifiable do a:

  sudo gnt-cluster info

and then look up the various options in the Ganeti documentation [1]

Destroy the cluster

Destroying the cluster is a one-way street. Do not do it lightly. An example of destroying a cluster:

 sudo gnt-cluster destroy  --yes-do-it

Do note that various things will be left behind. For example, /var/lib/ganeti/queue/ will not be deleted. It's up to you whether to delete it or not, depending on the case.
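
For instance, if you do want a clean slate after destroying the cluster, removing the leftover queue directory by hand is enough (path from above):

 sudo rm -rf /var/lib/ganeti/queue/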

Add a node

Adding a new hardware node to the cluster to increase capacity:

  sudo gnt-node add ganeti1002.eqiad.wmnet

Listing cluster nodes

Listing all hardware nodes in a cluster:

  sudo gnt-node list

That should return something like:

  Node                   DTotal  DFree MTotal MNode MFree Pinst Sinst
  ganeti1001.eqiad.wmnet 427.9G 427.9G  63.0G  391M 62.4G     0     0
  ganeti1002.eqiad.wmnet 427.9G 427.9G  63.0G  289M 62.5G     0     0
  ganeti1003.eqiad.wmnet 427.9G 427.9G  63.0G  288M 62.5G     0     0
  ganeti1004.eqiad.wmnet 427.9G 427.9G  63.0G  288M 62.5G     0     0

The columns are, respectively: Disk Total, Disk Free, Memory Total, Memory used by the node itself, Memory Free, instances for which this node is primary, and instances for which this node is secondary.

The nodegroup information can be obtained via a command like

   sudo gnt-node list -o name,group

which would return the name and the group of each node:

   Node                   Group
   ganeti1001.eqiad.wmnet row_C
   ganeti1002.eqiad.wmnet row_C
   ganeti1003.eqiad.wmnet row_C
   ganeti1004.eqiad.wmnet row_C
   ganeti1005.eqiad.wmnet row_A
   ganeti1006.eqiad.wmnet row_A
   ganeti1007.eqiad.wmnet row_A

Detecting the master node

The master node can be queried by running

 sudo gnt-node list -o name,master_candidate,master
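
Alternatively, assuming our version ships the subcommand, the master's FQDN can be printed directly:

 sudo gnt-cluster getmaster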

View the job queue

Ganeti has a built-in job queue. Most of the time it works fine, but if something is taking too long it might be helpful to check what's going on in the job queue:

  gnt-job list

and then inspect a specific job ID from the result:

  gnt-job info #ID
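
For a job that is still running, its output can also be followed live; this is a sketch assuming the standard gnt-job watch subcommand:

  gnt-job watch #ID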

Cluster upgrades

Hardware/software upgrades on a Ganeti cluster can happen with zero downtime to VM operations. The procedure to do so is outlined below. In case a shutdown/reboot is needed, the procedure to empty the node is described, followed by the rolling reboot procedure.

Do the software upgrade (if needed)

  sudo apt-get upgrade

throughout the cluster. It should have zero repercussions for any VM. Barring a Ganeti bug in the upgraded version, the cluster itself should also have zero problems. Between minor versions (e.g. 2.12 -> 2.15) it may be required to run an upgrade script. Read the changelog/upgrade notes. The Debian maintainer builds the package in a way that both versions are installed, and until you run said script the old version will still be used.
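
Depending on the versions involved, newer Ganeti releases also ship a dedicated upgrade command; a sketch, with the target version as an example:

  sudo gnt-cluster upgrade --to 2.15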

Rolling reboot

Doing a rolling reboot of the cluster is easy: empty every node, reboot it, check that it is online, proceed to the next. The one thing to take care of is to not reboot the master without failing it over first.
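
A minimal per-node sketch, using the node commands described further down (the node name is an example):

   sudo gnt-node migrate -f ganeti1002.eqiad.wmnet  # empty the node of primary instances
   # reboot ganeti1002 itself once it is empty, then confirm it is back online:
   sudo gnt-node list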

Failover the master

Choose a master candidate that suits you. You can get the master candidates with:

  sudo gnt-node list -o name,master_candidate

Log in to it and run

  sudo gnt-cluster master-failover

The cluster IP will now be served by the new node and the old one is no longer the master.

Cluster rebalancing

There might be times when the cluster looks, or actually is, unbalanced. That will be true after a rolling reboot of the nodes. Rebalancing is easy and baked into Ganeti; all it takes is running one command:

  sudo hbal -L -X -G <node_group>

Please run it in a screen session, as it might take quite a while to finish. Once the jobs have been submitted it is fine to lose that session, but using screen is still prudent.

The cluster will calculate a current score, run some heuristic algorithms to try and minimize that score, and then execute the commands required to reach that state.
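
To preview what hbal would do without executing anything, leave out -X; it will then only print the planned moves:

  sudo hbal -L -G <node_group>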

Node operations

Reboot/shutdown a node for maintenance

Select the node that needs a reboot/shutdown for brief hardware maintenance and empty it of primary instances:

   sudo gnt-node migrate -f ganeti1004.eqiad.wmnet

Now a

   sudo gnt-node list

should return 0 primary instances for the node. It is then safe to reboot it or shut it down for a brief amount of time for hardware maintenance.
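
To check just that node, the instance-count fields can be selected explicitly (field names as in the gnt-node manpage):

   sudo gnt-node list -o name,pinst_cnt,sinst_cnt ganeti1004.eqiad.wmnet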

Shutdown a node for a prolonged period of time

Should the node be going down for an undetermined amount of time, also move the secondary instances:

   sudo gnt-node migrate -f <node_fqdn>
   sudo gnt-node evacuate -s  <node_fqdn>

The second command means moving around DRBD pairs and syncing disk data. It is bound to take a long time, so find something else to do in the meantime.

Now a

   sudo gnt-node list 

should return 0 for both primary and secondary instances. Before powering off the node, we need to remove it from the cluster as well:

   sudo gnt-node evacuate -s  <node_fqdn> # Removes it as a secondary as well
   sudo gnt-node remove <node_fqdn>

NOTE: Do not forget to re-add it after it is fixed (if it ever is):

   sudo gnt-node add <node_fqdn>

Failed hardware node

When a host is having problems (hardware/kernel/otherwise) and is unreachable, there are a number of possible avenues to solve the issue, but an empirically good way out is:

  • Just power cycle the host. If that works, it's probably the fastest way out. Most services should be set up highly available anyway; if we have one that is not, we should either set it up that way or not care too much when it fails. If this works, you are done; if not, keep on reading.
  • If the above doesn't work (the node never comes back up), start the VMs on another host. This can be done with
   sudo gnt-node failover -f <node_fqdn>

In some cases you may have to ignore the consistency checks by passing --ignore-consistency (this has never happened in our setup; see the sketch after this list). Again, all important services are set up in highly available setups (and could easily be reimaged), so this will only severely bite VMs that are not set up that way.

  • Remove the host from the cluster
   sudo gnt-node remove <node_fqdn>
  • Debug/fix/RMA the node.
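
If you ever do have to ignore consistency, the failover command from the list above takes the flag directly:

   sudo gnt-node failover -f --ignore-consistency <node_fqdn>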

VM operations

Listing VMs

   gnt-instance list 

Create a VM

Creating a VM is easy. Most of the steps are the same as for production hardware, so keep the regular process in mind as well.

Assign a hostname/IP

Same process as for hardware. Assign the IP/hostname and make sure the DNS changes are live before going forward. This means you also get to decide which row this VM will live in. We don't have Ganeti in all rows, so make sure you allocate the IP in a row that has Ganeti. You can get the rows via

   sudo gnt-group list

which should return something like the output below

   Group Nodes Instances AllocPolicy NDParams
   row_A     3        20 preferred   ovs=False, ssh_port=22, ovs_link=, spindle_count=1, exclusive_storage=False, cpu_speed=1,ovs_name=switch1, oob_program=
   row_C     4        19 preferred   ovs=False, ssh_port=22, ovs_link=, spindle_count=1, exclusive_storage=False, cpu_speed=1,ovs_name=switch1, oob_program=

This means we can place VMs in rows A and C for this DC. Keep the name of the group (e.g. row_A, row_C); you will need it when creating the VM.

Create the VM (private IP)

   gnt-instance add \
     -t drbd \
     -I hail \
     --net 0:link=private \
     -g <nodegroup> \
     --hypervisor-parameters=kvm:boot_order=network \
     -o debootstrap+default \
     --no-install \
     -B vcpus=<x>,memory=<y>g \
     --disk 0:size=<z>g \
     <fqdn>

Note that the VM will NOT be started. That's on purpose for now. <x>, <y>, <z> in the above are variables. The unit suffixes t, g, m denote terabytes, gigabytes, and megabytes respectively. <nodegroup> is also a variable: it's the rack row you got from the above command, so valid values are row_A, row_B, row_C, row_D.
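
For illustration, a filled-in version of the above with hypothetical values (2 vCPUs, 4 GB of memory, a 40 GB disk, row A, and a made-up FQDN):

   gnt-instance add \
     -t drbd \
     -I hail \
     --net 0:link=private \
     -g row_A \
     --hypervisor-parameters=kvm:boot_order=network \
     -o debootstrap+default \
     --no-install \
     -B vcpus=2,memory=4g \
     --disk 0:size=40g \
     testvm1001.eqiad.wmnet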

Create the VM (public IP)

   gnt-instance add \
     -t drbd \
     -I hail \
     --net 0:link=public \
     -g <nodegroup> \
     --hypervisor-parameters=kvm:boot_order=network \
     -o debootstrap+default \
     --no-install \
     -B vcpus=<x>,memory=<y>g \
     --disk 0:size=<z>g \
     <fqdn>

Note that the only difference between a public and a private IP is that the link is specified differently (public vs private). Everything else is exactly the same as above.

Get the MAC address of the NIC

   gnt-instance info <fqdn> | grep MAC

Update DHCP

Same as usual. Use linux-host-entries.ttyS0-115200 for Ganeti VMs; otherwise you will not get a console.

Update autoinstall files

Same as usual. Do however add virtual.cfg to the configuration for a specific VM. Example: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/install-server/files/autoinstall/netboot.cfg;601720be51228f7eae3de17988b1afa8881a5bdb$71

Start the VM

  gnt-instance start <fqdn>

and connect to the console

  gnt-instance console <fqdn>

Ctrl+] to leave the console

Set boot order to disk

Assuming the installation is going well, before it finishes you need to set the boot order back to disk. This is a limitation of the current version of the Ganeti software and is expected to be solved (upstream is aware).

  gnt-instance modify \
  --hypervisor-parameters=boot_order=disk \
  <fqdn>

Note: when the VM has finished installing, it will shut down automatically. The Ganeti software includes HA checks and will promptly restart it. We rely on this behaviour to have the VM successfully installed. However, if you list the VMs during this phase you will see the VM in ERROR_down. Don't worry, this is expected.

Assign role to the VM in puppet

As usual.

Delete a VM

Irrevocably deleting a VM is done via:

  gnt-instance remove <fqdn>

Please remember to clean up the DHCP/DNS entries afterwards.

Shutdown/startup a VM

  gnt-instance startup <fqdn>
  gnt-instance shutdown <fqdn>

Note: In the shutdown command, ACPI will be used to achieve a graceful shutdown of the VM. A 2-minute timeout exists however, after which the VM will be forcefully shut down. In case you prefer not to wait those 2 minutes, --timeout exists and can be used like so:

  gnt-instance shutdown --timeout 0 <fqdn>

Get a console for a VM

You can log into the "console" for a Ganeti instance via

  gnt-instance console <fqdn>

The console can be left with Ctrl+].

Resize a VM

Make sure first that the cluster has adequate space for whatever resource you want to increase (if you do want to increase and not decrease a resource). This is done manually, by a combination of Grafana statistics for CPU/memory utilization and the output of gnt-node list for disk space utilization. After that, you can issue the following command to increase/decrease the memory size and number of virtual CPUs assigned to a VM:

 gnt-instance modify -B memory=<X>[gm],vcpus=<N> <fqdn>

where X and N are plain numbers. X can be suffixed with g or m for gigabytes or megabytes (please don't do terabytes ;)).
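
A filled-in example with hypothetical values (4 GB of memory, 2 vCPUs, made-up FQDN):

 gnt-instance modify -B memory=4g,vcpus=2 testvm1001.eqiad.wmnet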

Adding space to an existing disk is possible, but do note that resizing partitions and filesystems is up to you, as Ganeti can't do it for you. The command would be:

 gnt-instance modify --disk #:size=X[gmt] <fqdn>

where # is the index of the disk, starting from 0. You can get the disks allocated to a VM using gnt-instance info <fqdn>. Again, X is a number suffixed for gigabytes/megabytes/terabytes.
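
For example, to grow the first disk to 20 GB (hypothetical values; afterwards, grow the partition and filesystem inside the VM yourself, e.g. with parted and resize2fs):

 gnt-instance modify --disk 0:size=20g testvm1001.eqiad.wmnet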

Adding a disk is also easy if you want to avoid the mess of having to resize partitions/filesystems. The command would be:

 gnt-instance modify --disk add:size=X[gmt] <fqdn>

Again, X is a number suffixed for gigabytes/megabytes/terabytes.

Notes

All of the commands that have a Y/N prompt can be forced with -f. For example, the following will spare you the prompt:

  gnt-instance remove -f <fqdn>

All commands are actually jobs. If you would rather not wait for a command to complete, --submit will queue it and return immediately:

  gnt-instance shutdown --submit <fqdn>

API