
Kubernetes/Clusters/Add or remove nodes

Revision as of 09:26, 22 November 2021 by imported>JMeybohm


This is a guide for adding or removing nodes from existing Kubernetes clusters.

Adding a node

When the Kubernetes cluster was created, a Puppet role for its workers was created as well (see: Kubernetes/Clusters/New#General_Puppet/hiera_setup).

Add node specific hiera data

You need to add node-specific data, like the failure-domain/topology annotations.

This can be done by creating the file hieradata/hosts/foo-node1001.yaml:


You can get the right region and zone values for your node from Netbox.
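As a sketch, such a host file might look like the following. The hiera key and label values here are assumptions for illustration (the label names are the standard Kubernetes topology labels); check an existing worker's host file in the puppet repository for the exact key your cluster uses:

    # hieradata/hosts/foo-node1001.yaml -- hypothetical example
    profile::kubernetes::node::kubelet_node_labels:
      - topology.kubernetes.io/region=eqiad
      - topology.kubernetes.io/zone=row-a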

Add node to BGP

Add to homer

Nodes (in the calico setup) need to be able to establish BGP sessions with the core routers. For that, they need to be added as neighbors in config/sites.yaml of the operations/homer/public repository:

    foo_node1001: {4: <Node IPv4>, 6: <Node IPv6>}

Once that change is merged, you will have to run Homer. See: Homer#Running_Homer_from_cumin_hosts_(recommended)
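From a cumin host, the Homer run could look roughly like the following; the device selector and commit message are placeholders, so follow the linked Homer page for the exact invocation:

    homer "cr*eqiad*" commit "Add foo_node1001 as BGP neighbor"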

Add to calico

In addition, all nodes are BGP peers of each other. So we need to extend the hiera key profile::calico::kubernetes::bgp_peers for this Kubernetes cluster with the new node's FQDN in hieradata/role/<DATACENTER>/<CLUSTER>/worker.yaml, e.g.:

- foo_node1001.eqiad.wmnet
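In context, the resulting hiera fragment would look roughly like this (the existing entry is a placeholder for whatever nodes the cluster already has):

    # hieradata/role/<DATACENTER>/<CLUSTER>/worker.yaml
    profile::calico::kubernetes::bgp_peers:
      - foo_node1000.eqiad.wmnet  # existing node (placeholder)
      - foo_node1001.eqiad.wmnet  # newly added node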

Reimage the node

Then use the reimage script to image your nodes, apply puppet, and so on.
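At the time of writing this is typically done with a cookbook run from a cumin host; the cookbook name, OS version, and flags below are assumptions, so treat this as a sketch and check the cookbook's --help output:

    sudo cookbook sre.hosts.reimage --os bullseye foo-node1001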

Add to conftool/LVS

If the Kubernetes cluster is exposing services via LVS (production clusters usually do, staging ones don't), you need to add the node's FQDN to the cluster in conftool-data as well. For eqiad, in conftool-data/node/eqiad.yaml, like:

    foo_node1001.eqiad.wmnet: [kubesvc]
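After the conftool-data change is merged, the new node still needs to be pooled. Assuming the standard confctl tooling, that could look like the following (selector and service name taken from the example above):

    sudo confctl select "name=foo_node1001.eqiad.wmnet,service=kubesvc" set/pooled=yes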


Please ensure you've followed all necessary steps from Server_Lifecycle#Staged_->_Active

Your node should now join the cluster and have workload scheduled automatically (like calico daemonsets). You can check with:

kubectl get nodes
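To additionally verify that the node registered with the expected topology labels (matching whatever you set in the node-specific hiera data earlier):

    kubectl get node foo-node1001.datacenter.wmnet --show-labels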

Removing a node

Drain workload

The first step in removing a node is to drain the workload from it. This also ensures that the workload still fits on the remaining nodes of the cluster:

kubectl drain --ignore-daemonsets foo-node1001.datacenter.wmnet

You can verify success by looking at what is still scheduled on the node:

kubectl describe node foo-node1001.datacenter.wmnet
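Alternatively, you can list only the pods still scheduled on the node (daemonset pods will remain there after a drain):

    kubectl get pods --all-namespaces --field-selector spec.nodeName=foo-node1001.datacenter.wmnet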


You can now follow the steps outlined in Server_Lifecycle#Active_->_Decommissioned

Make sure to also remove (mirroring the steps from adding a node):

- The node-specific hiera data (hieradata/hosts/foo-node1001.yaml)
- The BGP neighbor entry in config/sites.yaml of operations/homer/public (and run Homer again)
- The node's FQDN from the profile::calico::kubernetes::bgp_peers hiera key
- The node's entry in conftool-data, if the cluster uses LVS

Delete the node from Kubernetes API

The last step is to delete the node from the Kubernetes API:

kubectl delete node foo-node1001.datacenter.wmnet