You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Portal:Toolforge/Admin/Kubernetes/Upgrading Kubernetes: Difference between revisions

From Wikitech-static
imported>Arturo Borrero Gonzalez
(→‎Upgrade worker nodes: add info about the wmcs-k8s-node-upgrade script)
imported>Quiddity
m (fix syntaxhighlight)
 
(9 intermediate revisions by 6 users not shown)


* To begin, check your version with <code>kubectl version</code> so you know where you are starting from.
* Make sure all related DEB pkgs (kubeadm, kubectl, kubelet) are available in the desired versions in [[reprepro]]. This might involve puppet patches and repo updates. Review the hiera setting for '''profile::wmcs::kubeadm::component'''.
* Are you also upgrading [https://docs.projectcalico.org/v3.10/maintenance/kubernetes-upgrade#upgrading-an-installation-that-uses-the-kubernetes-api-datastore Calico]? If so, is it a patch version or a minor/major release? For a patch release, you can probably just update profile::toolforge::k8s::calico_version in hiera, adjust profile::toolforge::k8s::calicoctl_sha to match the new calicoctl file in the release bundle, and then use the file puppet changes together with the kubectl apply command below to upgrade. For a minor or major release, check the new release yaml file and make sure the puppet yaml template in modules/toolforge/templates/k8s is updated if needed before proceeding. When reading the external docs, note that we use the Kubernetes API datastore and use Calico for both policy and networking.
== Begin Upgrade ==
* disable puppet on all control/worker/ingress nodes
** sudo cumin 'O{project:<project name> name:-k8s-}' # check that it finds the expected hosts
** sudo cumin 'O{project:<project name> name:-k8s-}' 'puppet agent --disable "upgrading k8s <your name>"'
<syntaxhighlight lang=bash>
sudo cumin 'O{project:tools name:-k8s-}' 'puppet agent --disable "upgrading k8s <your name>"'
</syntaxhighlight>
* downtime tools on metricsinfra - https://prometheus-alerts.wmcloud.org
** find a label in https://prometheus.wmcloud.org/cloud/targets that matches the project
** select the crossed-out bell (silence) icon and enter the downtime there.
* update the project-wide (and, if present, control-specific) version hiera key from
profile::wmcs::kubeadm::component: 'thirdparty/kubeadm-k8s-<old>'
to
profile::wmcs::kubeadm::component: 'thirdparty/kubeadm-k8s-<new>'
* update the topic on wikimedia-cloud from "Status: Ok" to "Status: upgrading <tools/paws/something> k8s"


== Upgrade control nodes ==


Now take this control plane node out of rotation, where $thiscontrolplanenode is the node name of the control plane system you are running the commands on:
<syntaxhighlight lang=shell-session>
root@control-01:~# kubectl drain $thiscontrolplanenode --ignore-daemonsets</syntaxhighlight>
 
Then use apt to upgrade kubeadm on the same node.
 
<syntaxhighlight lang=shell-session>
root@control-01:~# apt install kubeadm</syntaxhighlight>
 
Check what the upgrade will entail and build an upgrade plan. The command for this is straightforward:


<syntaxhighlight lang=shell-session>
root@control-01:~# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[..]
</syntaxhighlight>


Now you can proceed with:
<syntaxhighlight lang=shell-session>
root@control-01:~# kubeadm upgrade apply v1.15.1</syntaxhighlight>
 
This example assumes you are upgrading to v1.15.1; use your actual target version instead (at the time of writing, the cluster was already at v1.15.5). The command produces a fair amount of output, so check it for errors.


If you are also upgrading Calico, do so with <code>kubectl apply -f /etc/kubernetes/calico.yaml</code> once puppet has updated the file.


Put the node back in play:
<syntaxhighlight lang=shell-session>
root@control-01:~# kubectl uncordon $thiscontrolplanenode
</syntaxhighlight>


Wait until all control plane pods (scheduler, apiserver and controller-manager) have started, are not crash looping, and show no errors in their logs.

For the remaining control plane nodes, still drain and uncordon each node, but you do not need to run <code>kubeadm upgrade plan</code>, and instead of <code>kubeadm upgrade apply</code> you run <code>kubeadm upgrade node</code>. The Calico <code>kubectl apply</code> is idempotent: anything that doesn't need upgrading is left unchanged, so it is safe to run again.

Upgrade the helm, kubelet and kubeadm packages on all control plane nodes. Restart kubelet if it hasn't been restarted already.


== Upgrade worker nodes ==


Once the control nodes have been upgraded, we can upgrade the workers.
=== manual steps ===
For each worker:
# Run puppet and make sure apt is aware of the desired package versions for kubeadm, kubelet, kubectl, docker, containerd.io, etc.
# Drain it <syntaxhighlight lang=shell-session>root@control-01:~# kubectl drain $NODE --ignore-daemonsets</syntaxhighlight>
# On the node, install the new kubeadm package, then upgrade its kubelet config <syntaxhighlight lang=shell-session>root@worker-01:~# kubeadm upgrade node</syntaxhighlight>
# Upgrade the kubectl and kubelet packages
# Restart kubelet
# Run puppet in case there's any config we have that isn't captured by kubeadm.
# Uncordon <syntaxhighlight lang=shell-session>root@control-01:~# kubectl uncordon $NODE</syntaxhighlight>


'''NOTE''': the k8s API is behind the FQDN <code>k8s.tools.eqiad1.wikimedia.cloud</code>, so some commands may return different output/results depending on which backend HAProxy reaches. You can prevent this by disabling backends by hand during the upgrade window.


=== automated process ===
 
{{note|make sure hiera data is properly set for worker nodes, specifically '''profile::wmcs::kubeadm::component'''. But don't do package upgrades by hand; the script does them for you!}}
 
Due to the potentially large number of worker nodes, there is a script to automate this process: '''wmcs-k8s-node-upgrade.py'''.


<syntaxhighlight lang="shell-session">
[..]
   --domain DOMAIN      The CloudVPS domain for building FQDNs. Typical values are: 'eqiad1.wikimedia.cloud' or 'eqiad.wmflabs'. Defaults to 'eqiad1.wikimedia.cloud'
   --src-version SRC_VERSION
                         Source/original kubernetes version. Defaults to '1.16.9'
   --dst-version DST_VERSION
                         Destination/target kubernetes version. Defaults to '1.17.13'
   -n NODE, --node NODE  Hostname of target node to upgrade. Can be specified multiple times for multiple nodes in the same script run. Can be combined with the '--file' option. The FQDN will be built using the
                         project and domain argument. Example: -n tools-k8s-worker-1 -ntools-k8s-worker-2
[..]
</syntaxhighlight>


<syntaxhighlight lang="shell-session">
user@laptop:~/git/wmf/operations/puppet$ modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control toolsbeta-test-k8s-control-1 --project toolsbeta --src-version 1.16 --dst-version 1.17 --file nodelist.txt
[..]
</syntaxhighlight>
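For the <code>--file</code> option, the node list can be generated from the cluster itself. The bash sketch below is illustrative only: the helper name <code>worker_nodes</code> is hypothetical, it assumes worker hostnames follow the <code>&lt;project&gt;-k8s-worker-&lt;n&gt;</code> pattern seen in the examples, and it assumes the script accepts one hostname per line (check the script's <code>--file</code> help text before relying on that).

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn `kubectl get nodes -o name` output (on stdin)
# into worker hostnames, one per line, suitable for --file nodelist.txt.
# Assumption: workers are named like <project>-k8s-worker-<n>.
worker_nodes() {
    # Keep only node/<...-k8s-worker-...> lines and strip the "node/" prefix.
    sed -n 's|^node/\(.*-k8s-worker-.*\)$|\1|p'
}
```

Usage example, from a machine that can reach a control node: <code>ssh tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get nodes -o name" | worker_nodes > nodelist.txt</code>. Review the resulting file before passing it to the script.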


<syntaxhighlight lang="shell-session">
user@laptop:~/git/wmf/operations/puppet$ modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control tools-k8s-control-1 --project tools --domain eqiad1.wikimedia.cloud -n tools-k8s-worker-1 -n tools-k8s-worker-2 --src-version 1.16 --dst-version 1.17 -p --dry-run
[wmcs-k8s-node-upgrade.py] INFO: stage: generating node list
[wmcs-k8s-node-upgrade.py] INFO: stage: refreshing node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "sudo puppet agent --enable && sudo run-puppet-agent && sudo apt-get update"
[wmcs-k8s-node-upgrade.py] INFO: stage: prechecks for node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-1 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-1 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubeadm | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubeadm | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubectl | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubectl | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubelet | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubelet | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: stage: drain for node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl drain --force --ignore-daemonsets --delete-local-data tools-k8s-worker-1"
[wmcs-k8s-node-upgrade.py] INFO: stage: upgrade for node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "sudo DEBIAN_FRONTEND=noninteractive apt-get install kubeadm -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" -y"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "sudo kubeadm upgrade node --certificate-renewal=true"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "sudo DEBIAN_FRONTEND=noninteractive apt-get install kubectl kubelet docker-ce containerd.io -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" -y --allow-downgrades"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "sudo systemctl restart docker.service kubelet.service"
[wmcs-k8s-node-upgrade.py] INFO: stage: postchecks for node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-1 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: stage: uncordon for node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl uncordon tools-k8s-worker-1"
[wmcs-k8s-node-upgrade.py] INFO: stage: refreshing node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "sudo puppet agent --enable && sudo run-puppet-agent && sudo apt-get update"
[wmcs-k8s-node-upgrade.py] INFO: stage: prechecks for node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-2 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-2 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubeadm | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubeadm | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubectl | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubectl | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubelet | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubelet | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: stage: drain for node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl drain --force --ignore-daemonsets --delete-local-data tools-k8s-worker-2"
[wmcs-k8s-node-upgrade.py] INFO: stage: upgrade for node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "sudo DEBIAN_FRONTEND=noninteractive apt-get install kubeadm -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" -y"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "sudo kubeadm upgrade node --certificate-renewal=true"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "sudo DEBIAN_FRONTEND=noninteractive apt-get install kubectl kubelet docker-ce containerd.io -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" -y --allow-downgrades"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "sudo systemctl restart docker.service kubelet.service"
[wmcs-k8s-node-upgrade.py] INFO: stage: postchecks for node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-2 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: stage: uncordon for node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl uncordon tools-k8s-worker-2"
[wmcs-k8s-node-upgrade.py] INFO: nothing else to do
</syntaxhighlight>
== Ingress nodes ==
The ingress nodes are similar to the worker nodes but they need some special treatment:
* With admin credentials on kubernetes, run <code>kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=2</code> to prevent an ingress controller from being scheduled on a regular node.
* Ingress pods take a while to evict. It should be safe to upgrade the ingress nodes in parallel with the normal worker nodes.
* When done, run <code>kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=3</code> to return the cluster to normal operation.
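The two scaling commands above can be wrapped so they are easy to run (or preview) before and after the ingress nodes are upgraded. This bash sketch reuses the namespace and deployment names from the commands above; the helper names <code>run</code>, <code>scale_ingress</code> and <code>ingress_placement</code> and the <code>DRY_RUN</code> switch are hypothetical conveniences, not part of any existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: print the command instead of running it when DRY_RUN=1.
run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "DRY: $*"
    else
        "$@"
    fi
}

# Scale the ingress controller deployment ($1 = desired replica count).
scale_ingress() {
    run kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas="$1"
}

# Show which node each ingress controller pod landed on (NODE column of -o wide).
ingress_placement() {
    run kubectl -n ingress-nginx-gen2 get pods -o wide
}
```

Run <code>scale_ingress 2</code> before draining the ingress nodes and <code>scale_ingress 3</code> when done; <code>ingress_placement</code> helps confirm that no controller ended up on a regular worker.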
== finishing touches ==
<syntaxhighlight lang=bash>
sudo cumin 'O{project:<project name> name:-k8s-}' 'puppet agent --enable'
</syntaxhighlight>
If this is a Toolforge upgrade, upgrade kubectl on the bastions:
* tools-sgebastion-07.tools.eqiad1.wikimedia.cloud / login.toolforge.org
* tools-sgebastion-08.tools.eqiad1.wikimedia.cloud / dev.toolforge.org
* tools-sgebastion-10.tools.eqiad1.wikimedia.cloud / login-buster.toolforge.org
* tools-sgebastion-11.tools.eqiad1.wikimedia.cloud / dev-buster.toolforge.org
Revert the topic change on wikimedia-cloud.
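Upgrading kubectl on the listed bastions can be sketched as a loop over their FQDNs. This bash example assumes that a plain <code>apt-get install -y kubectl</code> after an <code>apt-get update</code> picks up the right candidate (puppet and hiera may pin the version differently on the bastions, so check that first); <code>run</code>, <code>DRY_RUN</code>, <code>BASTIONS</code> and <code>upgrade_bastion_kubectl</code> are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: print the command instead of running it when DRY_RUN=1.
run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "DRY: $*"
    else
        "$@"
    fi
}

# Bastion FQDNs taken from the list above.
BASTIONS=(
    tools-sgebastion-07.tools.eqiad1.wikimedia.cloud
    tools-sgebastion-08.tools.eqiad1.wikimedia.cloud
    tools-sgebastion-10.tools.eqiad1.wikimedia.cloud
    tools-sgebastion-11.tools.eqiad1.wikimedia.cloud
)

# Refresh apt and install the candidate kubectl on each bastion; stop at the first failure.
upgrade_bastion_kubectl() {
    local host
    for host in "${BASTIONS[@]}"; do
        run ssh "$host" "sudo apt-get update && sudo apt-get install -y kubectl" || return
    done
}
```

Running it once with <code>DRY_RUN=1</code> shows the exact ssh commands before anything is changed.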

Latest revision as of 02:52, 30 October 2021

This document only applies to a kubeadm-managed cluster deployed as described in Portal:Toolforge/Admin/Kubernetes/Deploying.

Before the upgrade

Some things to check before you start the upgrade.

  • To begin, check your version with kubectl version so you know where you are starting from.
  • Make sure all related DEB pkgs (kubeadm, kubectl, kubelet) are available in the desired versions in reprepro. This might involve puppet patches and repo updates. Review hiera setting for profile::wmcs::kubeadm::component.
  • Are you also upgrading Calico? If so, is it a patch version or a minor/major release? For a patch release, you can probably just update profile::toolforge::k8s::calico_version in hiera, adjust profile::toolforge::k8s::calicoctl_sha to match the new calicoctl file in the release bundle, and then use the file puppet changes together with the kubectl apply command below to upgrade. For a minor or major release, check the new release yaml file and make sure the puppet yaml template in modules/toolforge/templates/k8s is updated if needed before proceeding. When reading the external docs, note that we use the Kubernetes API datastore and use Calico for both policy and networking.
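For the package check in the list above, one way to confirm that apt actually sees the desired versions is to compare the Installed and Candidate lines of apt-cache policy for each package. The bash sketch below is only an illustration: the function names pkg_versions and check_k8s_packages are hypothetical, and the parser assumes the usual apt-cache policy layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `apt-cache policy <pkg>` output on stdin and
# print "<installed> <candidate>" (e.g. "1.16.9-00 1.17.13-00").
pkg_versions() {
    awk '
        $1 == "Installed:" { installed = $2 }
        $1 == "Candidate:" { candidate = $2 }
        END { print installed, candidate }
    '
}

# Report installed vs candidate versions of the Kubernetes packages on this host.
check_k8s_packages() {
    local pkg installed candidate
    for pkg in kubeadm kubelet kubectl; do
        read -r installed candidate < <(apt-cache policy "$pkg" | pkg_versions)
        echo "${pkg}: installed=${installed:-none} candidate=${candidate:-none}"
    done
}
```

If the candidate shown by check_k8s_packages is not the target version after puppet has run, the reprepro component or the profile::wmcs::kubeadm::component hiera value may not be right yet.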

Begin Upgrade

  • disable puppet on all control/worker/ingress nodes
    • sudo cumin 'O{project:<project name> name:-k8s-}' # check that it finds the expected hosts
    • sudo cumin 'O{project:<project name> name:-k8s-}' 'puppet agent --disable "upgrading k8s <your name>"'
sudo cumin 'O{project:tools name:-k8s-}' 'puppet agent --disable "upgrading k8s <your name>"'
  • update the project-wide (and, if present, control-specific) version hiera key from
profile::wmcs::kubeadm::component: 'thirdparty/kubeadm-k8s-<old>'
    to
profile::wmcs::kubeadm::component: 'thirdparty/kubeadm-k8s-<new>'
  • update topic on wikimedia-cloud "Status: Ok" to "Status: upgrading <tools/paws/something> k8s"
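The cumin calls in the list above can be wrapped so that the host check, the disable and the later re-enable all use the same query. This is a minimal bash sketch, assuming the query format shown above; run, DRY_RUN, disable_puppet_on_k8s and enable_puppet_on_k8s are hypothetical names, and with DRY_RUN=1 the commands are only printed.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: print the command instead of running it when DRY_RUN=1.
run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "DRY: $*"
    else
        "$@"
    fi
}

# $1: project name (e.g. tools), $2: your name for the disable reason.
disable_puppet_on_k8s() {
    local project="$1" who="$2"
    local query="O{project:${project} name:-k8s-}"
    # Without a command, cumin only lists the matching hosts: check them first.
    run sudo cumin "$query" || return
    run sudo cumin "$query" "puppet agent --disable \"upgrading k8s ${who}\""
}

# Re-enable puppet on the same hosts once the upgrade is finished.
enable_puppet_on_k8s() {
    run sudo cumin "O{project:${1} name:-k8s-}" "puppet agent --enable"
}
```

Because both functions build the query the same way, the hosts re-enabled at the end match the ones disabled at the start.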

Upgrade control nodes

Now take this control plane node out of rotation, where $thiscontrolplanenode is the node name of the control plane system you are running the commands on:

root@control-01:~# kubectl drain $thiscontrolplanenode --ignore-daemonsets

Then use apt to upgrade kubeadm on the same node.

root@control-01:~# apt install kubeadm

Check what the upgrade will entail and build an upgrade plan. The command for this is straightforward:

root@control-01:~# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.15.0
[upgrade/versions] kubeadm version: v1.15.0
[upgrade/versions] Latest stable version: v1.15.1
[upgrade/versions] Latest version in the v1.15 series: v1.15.1

External components that should be upgraded manually before you upgrade the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT   AVAILABLE
Etcd        3.2.26    3.3.10

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     5 x v1.15.0   v1.15.1

Upgrade to the latest version in the v1.15 series:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.15.0   v1.15.1
Controller Manager   v1.15.0   v1.15.1
Scheduler            v1.15.0   v1.15.1
Kube Proxy           v1.15.0   v1.15.1
CoreDNS              1.3.1     1.3.1

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.15.1

Note: Before you can perform this upgrade, you have to update kubeadm to v1.15.1.

Some important things to note here:

  • Etcd is external, so upgrading it means upgrading the packaged versions. Before trying anything, make sure the version we are using (or one we can upgrade to) is supported by the new version of Kubernetes.
  • kubeadm is deployed from packages, which need to be upgraded; kubelet must be upgraded as well to finish an upgrade.
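Comparing versions such as the etcd ones above is easy to get wrong with plain string comparison (as strings, "3.10" sorts before "3.9"). A small bash helper based on GNU sort -V handles this; which etcd versions a given Kubernetes release supports has to come from the Kubernetes release notes, so the numbers used here are only the ones from the plan output above. The name version_ge is hypothetical.

```bash
#!/usr/bin/env bash
# Succeeds if version $1 is greater than or equal to version $2.
# Relies on GNU coreutils `sort -V` (version sort).
version_ge() {
    [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" == "$2" ]]
}

# Illustrative check with the versions from the plan output above.
if version_ge "3.2.26" "3.3.10"; then
    echo "etcd 3.2.26 is at least 3.3.10"
else
    echo "etcd 3.2.26 is older than 3.3.10"
fi
```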

Now you can proceed with:

root@control-01:~# kubeadm upgrade apply v1.15.1

This example assumes you are upgrading to v1.15.1; use your actual target version instead (at the time of writing, the cluster was already at v1.15.5). The command produces a fair amount of output, so check it for errors.

If you are also upgrading Calico, do so with kubectl apply -f /etc/kubernetes/calico.yaml once puppet has updated the file.
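A sketch of the Calico step that also waits for the rollout to finish. It assumes the manifest deploys a calico-node DaemonSet into kube-system (as the upstream calico.yaml manifest does; check the puppet template if unsure); run, DRY_RUN and upgrade_calico are hypothetical names, and the 10 minute timeout is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: print the command instead of running it when DRY_RUN=1.
run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "DRY: $*"
    else
        "$@"
    fi
}

# Apply the puppet-rendered manifest, then block until calico-node has rolled out.
# Assumption: the DaemonSet is named calico-node and lives in kube-system.
upgrade_calico() {
    local manifest="${1:-/etc/kubernetes/calico.yaml}"
    run kubectl apply -f "$manifest" || return
    run kubectl -n kube-system rollout status daemonset/calico-node --timeout=10m
}
```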

Put the node back in play:

root@control-01:~# kubectl uncordon $thiscontrolplanenode
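The steps for the first control node, collected into one bash sketch so they can be previewed with DRY_RUN=1 before running them for real. This is only an illustration of the sequence described above, to be run as root on that node; run and upgrade_first_control_node are hypothetical names, and the optional Calico apply is left out.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: print the command instead of running it when DRY_RUN=1.
run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "DRY: $*"
    else
        "$@"
    fi
}

# $1: this node's name as known to Kubernetes ($thiscontrolplanenode above)
# $2: target version, e.g. v1.15.1
upgrade_first_control_node() {
    local node="$1" target="$2"
    run kubectl drain "$node" --ignore-daemonsets || return
    run apt install kubeadm || return
    # Read the plan; `kubeadm upgrade apply` normally asks for confirmation.
    run kubeadm upgrade plan || return
    run kubeadm upgrade apply "$target" || return
    run kubectl uncordon "$node"
}
```

Each step stops the sequence if it fails, so a failed drain or apply leaves the node cordoned for manual inspection.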

Wait until all control plane pods (scheduler, apiserver and controller-manager) have started, are not crash looping, and show no errors in their logs.
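To check pod status from the command line, one option is to filter kube-system pods by the static pod name prefixes kubeadm uses (kube-apiserver-, kube-controller-manager-, kube-scheduler-, followed by the node name) and flag any that are not Running or not fully ready. The helper name unhealthy_control_plane_pods is hypothetical, and it only looks at status: restart counts and logs (for example kubectl -n kube-system logs <pod>) still need a manual look.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl -n kube-system get pods --no-headers`
# output on stdin (columns NAME READY STATUS RESTARTS AGE) and prints the
# control plane pods that are not Running or not fully ready.
# Returns non-zero if it printed anything.
unhealthy_control_plane_pods() {
    awk '
        $1 ~ /^(kube-apiserver|kube-controller-manager|kube-scheduler)-/ {
            split($2, ready, "/")
            if ($3 != "Running" || ready[1] != ready[2]) { print; bad = 1 }
        }
        END { exit bad }
    '
}
```

Usage: kubectl -n kube-system get pods --no-headers | unhealthy_control_plane_pods && echo "control plane pods look healthy"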

For the remaining control plane nodes, still drain and uncordon each node, but you do not need to run kubeadm upgrade plan, and instead of kubeadm upgrade apply you run kubeadm upgrade node. The Calico kubectl apply is idempotent: anything that doesn't need upgrading is left unchanged, so it is safe to run again.
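For the remaining control nodes the sequence differs only in using kubeadm upgrade node. A bash sketch under the same assumptions as the steps above (run as root on the node being upgraded; run and upgrade_other_control_node are hypothetical names):

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: print the command instead of running it when DRY_RUN=1.
run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "DRY: $*"
    else
        "$@"
    fi
}

# $1: this node's name as known to Kubernetes.
upgrade_other_control_node() {
    local node="$1"
    run kubectl drain "$node" --ignore-daemonsets || return
    run apt install kubeadm || return
    # No `kubeadm upgrade plan` / `apply` here: the cluster-wide upgrade is done.
    run kubeadm upgrade node || return
    run kubectl uncordon "$node"
}
```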

Upgrade the helm, kubelet and kubeadm packages on all control plane nodes. Restart kubelet if it hasn't been restarted already.

Upgrade worker nodes

Once the control nodes have been upgraded, we can upgrade the workers.

manual steps

For each worker:

  1. run puppet and make sure apt is aware of the desired package versions for kubeadm, kubelet, kubectl, docker, containerd.io, etc.
  2. Drain it
    root@control-01:~# kubectl drain $NODE --ignore-daemonsets
    
  3. On the node, upgrade the kubeadm package, then upgrade its kubelet config
    root@worker-01:~# kubeadm upgrade node
    
  4. Upgrade the kubectl and kubelet packages (and docker-ce and containerd.io, if they changed)
  5. Restart kubelet
  6. Run puppet in case there's any config we have that isn't captured by kubeadm.
  7. Uncordon
    root@control-01:~# kubectl uncordon $NODE
    
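For step 1, a small Python sketch can confirm that apt really offers the target version before you drain anything. It parses apt-cache policy <pkg> output, which is the same Installed/Candidate data the upgrade script's prechecks grep for. The function names are hypothetical, and the matching rule (the candidate equals the target or is a patch release of it, so 1.17 matches 1.17.13-00) is an assumption.

```python
from typing import Optional, Tuple


def parse_policy(output: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (installed, candidate) from `apt-cache policy <pkg>` output.

    '(none)' or a missing line is returned as None.
    """
    found = {}
    for line in output.splitlines():
        # partition on the first ':' only, so epochs like '1:1.17.13-00' survive
        key, sep, value = line.strip().partition(":")
        if sep and key in ("Installed", "Candidate"):
            value = value.strip()
            found[key] = None if value in ("", "(none)") else value
    return found.get("Installed"), found.get("Candidate")


def upstream(version: str) -> str:
    """Strip a Debian epoch ('1:') and revision ('-00') from a version string."""
    if ":" in version:
        version = version.split(":", 1)[1]
    return version.rsplit("-", 1)[0]


def candidate_matches(output: str, target: str) -> bool:
    """True if the candidate is `target` itself or a release within it."""
    _, candidate = parse_policy(output)
    if candidate is None:
        return False
    up = upstream(candidate)
    return up == target or up.startswith(target + ".")
```

Feed it the output of apt-cache policy kubeadm (and likewise kubelet and kubectl) on the worker, e.g. candidate_matches(output, "1.17").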

NOTE: the k8s API is behind the FQDN k8s.tools.eqiad1.wikimedia.cloud, so some commands may return different output or results depending on which backend HAProxy sends them to. You can prevent this by disabling backends by hand during the upgrade window.

automated process

Due to the potentially large number of worker nodes, there is a script to automate this process: wmcs-k8s-node-upgrade.py.

user@laptop:~/git/wmf/operations/puppet$ modules/kubeadm/files/wmcs-k8s-node-upgrade.py --help
usage: wmcs-k8s-node-upgrade.py [-h] --control CONTROL [--project PROJECT] [--domain DOMAIN] [--src-version SRC_VERSION] [--dst-version DST_VERSION] [-n NODE] [--file FILE] [-p] [-d] [--debug]

Utility to automate upgrading a k8s node in our kubeadm-based deployments

optional arguments:
  -h, --help            show this help message and exit
  --control CONTROL     The hostname of the control plane node to use. Typical value are something like 'tools-k8s-control-1' or 'toolsbeta-test-k8s-control-1'. The FQDN will be built using the project and
                        domain argument
  --project PROJECT     The CloudVPS project name. Typical values are: 'tools', 'toolsbeta' or 'paws'. This will be used to build FQDNs. Defaults to 'toolsbeta'
  --domain DOMAIN       The CloudVPS domain for building FQDNs. Typical values are: 'eqiad1.wikimedia.cloud' or 'eqiad.wmflabs'. Defaults to 'eqiad1.wikimedia.cloud'
  --src-version SRC_VERSION
                        Source/original kubernetes version. Defaults to '1.16.9'
  --dst-version DST_VERSION
                        Destination/target kubernetes version. Defaults to '1.17.13'
  -n NODE, --node NODE  Hostname of target node to upgrade. Can be specified multiple times for multiple nodes in the same script run. Can be combined with the '--file' option. The FQDN will be built using the
                        project and domain argument. Example: -n tools-k8s-worker-1 -ntools-k8s-worker-2
  --file FILE           File with a list of target nodes to upgrade. The file should contain a target hostname per line. The behavior is the same as in the '--node' option, and can be combined
  -p, --no-pause        If this option is present, this script won't prompt for a confirmation between each node upgrade
  -d, --dry-run         Dry run: only show what this script would do, but don't do it for real
  --debug               To active debug mode

Typical usage is to generate a file with a list of worker nodes, one per line:

toolsbeta-test-k8s-worker-1
toolsbeta-test-k8s-worker-2
toolsbeta-test-k8s-worker-3

Then pass it to the script, along with the other required arguments:

user@laptop:~/git/wmf/operations/puppet$ modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control toolsbeta-test-k8s-control-1 --project toolsbeta --src-version 1.16 --dst-version 1.17 --file nodelist.txt
[..]
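One way to generate that file is sketched below in Python (hypothetical helpers). It assumes worker hostnames contain -k8s-worker- (true for the examples on this page) and that kubectl get nodes -o name prints lines like node/toolsbeta-test-k8s-worker-1. It also shows the FQDNs the script will contact, built as <node>.<project>.<domain> as the --help text and the dry-run output below indicate.

```python
from typing import List


def worker_nodes(kubectl_output: str, marker: str = "-k8s-worker-") -> List[str]:
    """Extract worker hostnames from `kubectl get nodes -o name` output."""
    names = []
    for line in kubectl_output.splitlines():
        name = line.strip()
        if name.startswith("node/"):
            name = name[len("node/"):]
        # Assumption: only worker nodes carry this marker in their hostname.
        if marker in name:
            names.append(name)
    return names


def read_node_file(text: str) -> List[str]:
    """One hostname per line, as --file expects; this sketch skips blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fqdn(node: str, project: str = "toolsbeta", domain: str = "eqiad1.wikimedia.cloud") -> str:
    """Build <node>.<project>.<domain>, using the script's documented defaults."""
    return f"{node}.{project}.{domain}"
```

For example, write "\n".join(worker_nodes(output)) + "\n" to nodelist.txt, then review the file by hand before passing it to --file.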

It is recommended to run it in dry-run mode first to double-check what the script will do:

user@laptop:~/git/wmf/operations/puppet$ modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control tools-k8s-control-1 --project tools --domain eqiad1.wikimedia.cloud -n tools-k8s-worker-1 -n tools-k8s-worker-2 --src-version 1.16 --dst-version 1.17 -p --dry-run
[wmcs-k8s-node-upgrade.py] INFO: stage: generating node list
[wmcs-k8s-node-upgrade.py] INFO: stage: refreshing node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "sudo puppet agent --enable && sudo run-puppet-agent && sudo apt-get update"
[wmcs-k8s-node-upgrade.py] INFO: stage: prechecks for node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-1 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-1 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubeadm | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubeadm | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubectl | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubectl | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubelet | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "apt-cache policy kubelet | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: stage: drain for node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl drain --force --ignore-daemonsets --delete-local-data tools-k8s-worker-1"
[wmcs-k8s-node-upgrade.py] INFO: stage: upgrade for node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "sudo DEBIAN_FRONTEND=noninteractive apt-get install kubeadm -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" -y"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "sudo kubeadm upgrade node --certificate-renewal=true"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "sudo DEBIAN_FRONTEND=noninteractive apt-get install kubectl kubelet docker-ce containerd.io -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" -y --allow-downgrades"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud "sudo systemctl restart docker.service kubelet.service"
[wmcs-k8s-node-upgrade.py] INFO: stage: postchecks for node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-1 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: stage: uncordon for node tools-k8s-worker-1
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl uncordon tools-k8s-worker-1"
[wmcs-k8s-node-upgrade.py] INFO: stage: refreshing node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "sudo puppet agent --enable && sudo run-puppet-agent && sudo apt-get update"
[wmcs-k8s-node-upgrade.py] INFO: stage: prechecks for node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-2 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-2 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubeadm | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubeadm | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubectl | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubectl | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubelet | grep Installed"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "apt-cache policy kubelet | grep Candidate"
[wmcs-k8s-node-upgrade.py] INFO: stage: drain for node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl drain --force --ignore-daemonsets --delete-local-data tools-k8s-worker-2"
[wmcs-k8s-node-upgrade.py] INFO: stage: upgrade for node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "sudo DEBIAN_FRONTEND=noninteractive apt-get install kubeadm -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" -y"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "sudo kubeadm upgrade node --certificate-renewal=true"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "sudo DEBIAN_FRONTEND=noninteractive apt-get install kubectl kubelet docker-ce containerd.io -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" -y --allow-downgrades"
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud "sudo systemctl restart docker.service kubelet.service"
[wmcs-k8s-node-upgrade.py] INFO: stage: postchecks for node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl get node tools-k8s-worker-2 -o yaml"
[wmcs-k8s-node-upgrade.py] INFO: stage: uncordon for node tools-k8s-worker-2
[wmcs-k8s-node-upgrade.py] INFO: DRY: ssh -oStrictHostKeyChecking=no tools-k8s-control-1.tools.eqiad1.wikimedia.cloud "sudo -i kubectl uncordon tools-k8s-worker-2"
[wmcs-k8s-node-upgrade.py] INFO: nothing else to do
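With a long node list, the dry-run output is tedious to read. The Python sketch below (hypothetical helpers) groups the "INFO: stage: ..." lines by node and flags any node whose stages differ from the order seen in the sample output above. That expected order is copied from this page and would need updating if the script changes.

```python
import re
from typing import Dict, Iterable, List

# Matches e.g. "INFO: stage: refreshing node X" and "INFO: stage: drain for node X",
# but not "INFO: stage: generating node list".
STAGE_RE = re.compile(
    r"INFO: stage: (?P<stage>refreshing|prechecks|drain|upgrade|postchecks|uncordon)"
    r" (?:for )?node (?P<node>\S+)$"
)
# Order observed in the dry-run sample on this page.
EXPECTED = ["refreshing", "prechecks", "drain", "upgrade", "postchecks", "uncordon"]


def stages_by_node(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each node to the list of stages logged for it, in log order."""
    result: Dict[str, List[str]] = {}
    for line in lines:
        match = STAGE_RE.search(line.strip())
        if match:
            result.setdefault(match["node"], []).append(match["stage"])
    return result


def incomplete_nodes(stages: Dict[str, List[str]]) -> List[str]:
    """Nodes whose stages do not match the expected sequence exactly."""
    return [node for node, seen in stages.items() if seen != EXPECTED]
```

For example, capture the dry run with 2>&1 | tee dry-run.log, then call incomplete_nodes(stages_by_node(open("dry-run.log"))).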

Ingress nodes

The ingress nodes are similar to the worker nodes but they need some special treatment:

  • With admin credentials on kubernetes, run kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=2 to prevent an ingress controller from being scheduled on a regular node.
  • Ingress pods take a while to evict. It should be safe to upgrade the ingress nodes in parallel with the normal worker nodes.
  • When done, run kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=3 to return the cluster to normal operation.
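To confirm that no ingress controller ended up on a regular node, the Python sketch below (hypothetical helper) reads the JSON from kubectl -n ingress-nginx-gen2 get pods -o json and lists controller pods that are not on an ingress node. It assumes ingress node hostnames contain -k8s-ingress-; adjust the marker to your naming scheme. Pods without a nodeName (still Pending) are reported as unscheduled.

```python
from typing import Dict, List


def misplaced_ingress_pods(
    pods_json: Dict,
    marker: str = "-k8s-ingress-",
    prefix: str = "ingress-nginx-gen2-controller",
) -> List[str]:
    """Return 'pod on node' strings for controller pods outside the ingress nodes."""
    misplaced = []
    for item in pods_json.get("items", []):
        name = item["metadata"]["name"]
        # Only look at pods belonging to the controller deployment.
        if not name.startswith(prefix):
            continue
        node = item.get("spec", {}).get("nodeName") or ""
        if marker not in node:
            misplaced.append(f"{name} on {node or '<unscheduled>'}")
    return misplaced
```

Load the saved kubectl output with json.load before passing it in. An empty result means every controller pod is on an ingress node.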

finishing touches

Re-enable puppet on the project's k8s nodes:

sudo cumin 'O{project:<project name> name:-k8s-}' 'puppet agent --enable'

If this is a Toolforge upgrade, upgrade kubectl on the bastions:

  • tools-sgebastion-07.tools.eqiad1.wikimedia.cloud / login.toolforge.org
  • tools-sgebastion-08.tools.eqiad1.wikimedia.cloud / dev.toolforge.org
  • tools-sgebastion-10.tools.eqiad1.wikimedia.cloud / login-buster.toolforge.org
  • tools-sgebastion-11.tools.eqiad1.wikimedia.cloud / dev-buster.toolforge.org

Revert the topic changes on -cloud.