Kubernetes
- For information about Kubernetes in the Toolforge environment see Help:Toolforge/Kubernetes.
Kubernetes (often abbreviated k8s) is an open-source system for automating the deployment and management of applications running in containers. This page collects some notes/docs on the Kubernetes setup in the Foundation production environment.
Packages
We deploy Kubernetes in WMF production using Debian packages where appropriate. There is an upgrade policy in place defining the timeframe and versions we run at every point in time; it lives under Kubernetes/Kubernetes_Infrastructure_upgrade_policy. For more technical information on how we build the Debian packages, have a look at Kubernetes/Packages.
Images
For how our images are built and maintained, have a look at Kubernetes/Images.
Services
A service in Kubernetes is an 'abstract way to expose an application running on a set of workloads as a network service'.
- Learn more about Migrating a service to kubernetes and Deploying a service in kubernetes.
Debugging
For a quick intro to the debugging actions one can take during a problem in production, look at Kubernetes/Helm. A guide will also be posted under Kubernetes/Kubectl.
Administration
Add a new service
To add a new service to the clusters:
- Ensure the service has its ports registered at: Service ports
- Create deployment user/tokens in the puppet private (you can use a randomly generated password; there is no strict guideline for it) and public repos.
  - Example 1
    - https://gerrit.wikimedia.org/r/c/labs/private/+/613101 (plus the actual data in the private repo, see 1edf14c0)
    - https://gerrit.wikimedia.org/r/c/operations/puppet/+/613104
  - Example 2 - eventstreams-internal (T269160)
    - https://gerrit.wikimedia.org/r/655879 (plus the actual data in the private repo, see 6689496a and 376c92ad)
    - https://gerrit.wikimedia.org/r/c/operations/puppet/+/656129
- Add a Kubernetes namespace:
  - Example 1
    - https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/645376 (ignore the change to calico/default-kubernetes-policy)
  - Example 2 - eventstreams-internal (T269160)
- At this point, you can safely merge the change (after somebody from Service Ops validates it, of course). Only merge when you have time to run the following commands as well, to avoid impacting other people rolling out changes later on.
- The first thing to do is to work in staging, updating the admin config.
- On deploy1001:
sudo -i; cd /srv/deployment-charts/helmfile.d/admin/staging/; kube_env admin staging; ./cluster-helmfile.sh -i apply
- The command above should show you a change in namespaces/quotas/etc. related to your new service. If this is not the case (for example, you also see other changes), ping somebody from the Service Ops team! There might be some work waiting to be applied.
- Then you can proceed to deploy the new service to staging for real. Don't worry about TLS (if needed): in staging a default config for your service is added automatically. Production is a different story, but there is a step about it later on :D
- On deploy1001:
cd /srv/deployment-charts/helmfile.d/services/YOUR-SERVICE-NAME-HERE; helmfile -e staging -i apply
- The magic command above will show a diff related to the new service; make sure that everything looks fine, then hit Yes to proceed.
- You should now be able to test your new service in staging! You can use the handy endpoint http(s)://staging.svc.eqiad.wmnet:$YOUR-SERVICE-PORT to quickly test if everything works as expected.
- Now you can move to Production!
- Create certificates for the new service if it has an HTTPS endpoint (remember that for staging this step is handled automatically for you, but for production it is not).
  - Enable TLS for Kubernetes deployments
- If the new service requires specific secrets, commit them to /srv/private/hieradata/role/common/deployment_server.yaml
The service can now be deployed using helmfile and can be accessed via the registered port on any of the kubernetes nodes (for manual testing); see the sketch below. If you need the service to be easily accessible from outside of the cluster, you might want to follow Add a new load balanced service.
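For manual testing you can simply hit the registered port with curl. A minimal sketch, assuming a plain HTTP service (the port value and the node hostname are placeholders; use https and -k as appropriate if the service terminates TLS):
PORT=8080   # placeholder: the port registered in "Service ports"
# via the staging service endpoint
curl -v "http://staging.svc.eqiad.wmnet:${PORT}/"
# via the registered port on any kubernetes node (production)
curl -v "http://kubernetes1001.eqiad.wmnet:${PORT}/"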
Rebooting a worker node
The impolite way
To reboot a worker node, you can just reboot it in our environment. The platform will notice the event and respawn the pods on other nodes. However, the system does not automatically rebalance itself currently (pods are not rescheduled on the node after it has been rebooted).
The polite way (recommended)
If you feel like being more polite, use kubectl drain: it will configure the worker node to stop accepting new pods and move the existing pods to other workers. Draining the node will take time. Rough numbers from 2019-12-11 are around 60 seconds.
# kubectl drain kubernetes1001.eqiad.wmnet
# kubectl describe pods --all-namespaces | awk '$1=="Node:" {print $NF}' | sort -u
kubernetes1002.eqiad.wmnet/10.64.16.75
kubernetes1003.eqiad.wmnet/10.64.32.23
kubernetes1004.eqiad.wmnet/10.64.48.52
kubernetes1005.eqiad.wmnet/10.64.0.145
kubernetes1006.eqiad.wmnet/10.64.32.18
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetes1001.eqiad.wmnet Ready,SchedulingDisabled <none> 2y352d v1.12.9
kubernetes1002.eqiad.wmnet Ready <none> 2y352d v1.12.9
kubernetes1003.eqiad.wmnet Ready <none> 2y352d v1.12.9
kubernetes1004.eqiad.wmnet Ready <none> 559d v1.12.9
kubernetes1005.eqiad.wmnet Ready <none> 231d v1.12.9
kubernetes1006.eqiad.wmnet Ready <none> 231d v1.12.9
When the node has been rebooted, it can be configured to accept pods again using kubectl uncordon, e.g.
# kubectl uncordon kubernetes1001.eqiad.wmnet
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetes1001.eqiad.wmnet Ready <none> 2y352d v1.12.9
kubernetes1002.eqiad.wmnet Ready <none> 2y352d v1.12.9
kubernetes1003.eqiad.wmnet Ready <none> 2y352d v1.12.9
kubernetes1004.eqiad.wmnet Ready <none> 559d v1.12.9
kubernetes1005.eqiad.wmnet Ready <none> 231d v1.12.9
kubernetes1006.eqiad.wmnet Ready <none> 231d v1.12.9
The pods are not rebalanced automatically, i.e. the rebooted node is free of pods initially.
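To double check which pods are currently scheduled on a given node (useful before and after draining), something like the following should work, assuming the kubectl version in use supports --field-selector (the node name is a placeholder); otherwise the kubectl describe pods | awk trick above works too:
# list the pods scheduled on one specific node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=kubernetes1001.eqiad.wmnet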
Restarting calico-node
calico-node maintains a BGP session with the core routers. If you intend to restart this service, use the following procedure:
- Drain the node on the kube controller as shown above
- Run systemctl restart calico-node on the kube worker
- Wait for the BGP sessions on the core routers to be re-established
- Uncordon the node on the kube controller as shown above
You can use the following command on the core routers to check BGP status (use match 64602 for codfw):
# show bgp summary | match 64601
10.64.0.121 64601 220 208 0 2 32:13 Establ
10.64.0.145 64601 824512 795240 0 1 12w1d 21:45:51 Establ
10.64.16.75 64601 161 152 0 2 23:25 Establ
10.64.32.18 64601 824596 795247 0 2 12w1d 21:46:45 Establ
10.64.32.23 64601 130 123 0 2 18:59 Establ
10.64.48.52 64601 782006 754152 0 3 11w4d 11:13:52 Establ
2620:0:861:101:10:64:0:121 64601 217 208 0 2 32:12 Establ
2620:0:861:101:10:64:0:145 64601 824472 795240 0 1 12w1d 21:45:51 Establ
2620:0:861:102:10:64:16:75 64601 160 152 0 2 23:25 Establ
2620:0:861:103:10:64:32:18 64601 824527 795246 0 1 12w1d 21:46:45 Establ
2620:0:861:103:10:64:32:23 64601 130 123 0 2 18:59 Establ
2620:0:861:107:10:64:48:52 64601 782077 754154 0 2 11w4d 11:14:13 Establ
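A minimal sketch tying the steps above together, assuming you run kubectl from a host with admin credentials and have SSH access to the worker (the hostname is a placeholder); checking that the BGP sessions are back to Establ on the routers remains a manual step:
NODE=kubernetes1001.eqiad.wmnet    # placeholder
kubectl drain "$NODE"
ssh "$NODE" 'sudo systemctl restart calico-node'
# verify on the core routers that the node's BGP sessions show Establ again
kubectl uncordon "$NODE"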
Restarting specific components
kube-controller-manager and kube-scheduler are control plane components that talk to the API server. In production multiple replicas run and perform, via the API, an election to determine which one is the leader. Restarting them has no grave consequences, so it is safe to do. However, both are critical components in the sense that they are required for the overall cluster to function smoothly: kube-scheduler is crucial for node failovers, pod evictions, etc., while kube-controller-manager packs multiple controller components and is critical for responding to pod failures, depools, etc.
The commands are:
sudo systemctl restart kube-controller-manager
sudo systemctl restart kube-scheduler
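To see which replica currently holds the leader lock (in this Kubernetes version the election result is recorded as an annotation on the corresponding Endpoints object in kube-system), something along these lines should work:
kubectl -n kube-system get endpoints kube-controller-manager -o yaml | grep leader
kubectl -n kube-system get endpoints kube-scheduler -o yaml | grep leader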
Restarting the API server
The API server is behind LVS in production, so it is fine to restart it as long as enough time is given between restarts across the cluster.
sudo systemctl restart kube-apiserver
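After restarting an API server, it is worth confirming it answers again before moving on to the next one. A minimal check, run from a host with admin credentials:
kubectl get --raw='/healthz'   # should print "ok"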
Reinitialize a complete cluster
If, for whatever reason, we need to reinitialize a kubernetes cluster on a new etcd backing store, the following steps can be used as a guideline. They might also help in understanding how the clusters are set up and how to set up new ones.
- Create a puppet change pointing k8s (and calico) to the new etcd cluster, see:
- Populate IPPool and BGP nodes in the new calico etcd backend
- On a random node of the kubernetes cluster:
cp /etc/calico/calicoctl.cfg .
# Modify the etcdEndpoints config in ./calicoctl.cfg to point to new etcd
# Set asNumber (64601 for eqiad, 64603 for codfw)
calicoctl config set asNumber 6460X --config=calicoctl.cfg
calicoctl config set nodeToNodeMesh off --config=calicoctl.cfg
# FIXME: This assumes we still have access to the old etcd to read bgppeer
# and ippool data from.
calicoctl get -o yaml bgppeer | calicoctl create -f - --config=calicoctl.cfg
calicoctl get -o yaml ippool | calicoctl create -f - --config=calicoctl.cfg
# Create a basic default profile for the kube-system namespace in order to
# allow kube-system tiller to talk to the kubernetes API to deploy the
# calico-policy-controller (avoid catch-22).
#
# When the calico-policy-controller is started, it will sync things and this
# simple profile will be updated and set up correctly.
calicoctl create -f - --config=calicoctl.cfg <<_EOF_
- apiVersion: v1
  kind: profile
  metadata:
    name: k8s_ns.kube-system
    tags:
    - k8s_ns.kube-system
  spec:
    egress:
    - action: allow
      destination: {}
      source: {}
    ingress:
    - action: allow
      destination: {}
      source: {}
_EOF_
- Schedule downtime for
- services running on the cluster
- kubernetes nodes and master
sudo cookbook sre.hosts.downtime -r 'Reinitialize eqiad k8s cluster with new etcd' -t TXXX -H 4 'A:eqiad and (A:kubernetes-masters or A:kubernetes-workers)'
- Depool services from discovery/edge caches
- Delete all helmfile managed namespaces (to be sure we see errors/missing things early)
- Disable puppet on master and k8s nodes
sudo cumin 'A:eqiad and (A:kubernetes-masters or A:kubernetes-workers)' "disable-puppet 'Reinitialize eqiad k8s cluster with new etcd - TXXXX'"
- Stop apiserver and calico node on k8s nodes
- Merge puppet changes
- Enable and run puppet on the k8s nodes
- Enable puppet on 1 apiserver and run it
- Disable puppet on apiserver again
- Edit /etc/default/kube-apiserver to disable the PodSecurityPolicy controller
- Start the API server (running without the PodSecurityPolicy controller now)
- Run deployment-charts/helmfile.d/admin/initialize_cluster.sh for the cluster
- Restart kubelet on all kubernetes nodes
sudo cumin 'A:eqiad and A:kubernetes-workers' 'systemctl restart kubelet'
- Enable puppet on the kubernetes masters again and run it. This will restart the API server with the PodSecurityPolicy controller enabled.
- Run helmfile.d/admin/eqiad/cluster-helmfile.sh
- Deploy all services via a for loop and helmfile sync commands
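As a sketch of that last step, assuming the standard layout of /srv/deployment-charts/helmfile.d/services on the deployment server (adjust the environment name as needed):
cd /srv/deployment-charts/helmfile.d/services
for svc in *; do
  # sync each service's release in the target environment
  ( cd "$svc" && helmfile -e eqiad sync )
done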