Portal:Toolforge/Admin/Kubernetes/Certificates
Revision as of 21:39, 9 November 2021
This page contains information on certificates (PKI, X.509, etc) for the Toolforge Kubernetes cluster.
General considerations
Kubernetes includes an internal CA, which is the main one we use for cluster operations.
By default, Kubernetes-issued certificates are valid for 1 year. After that period, they should be renewed.
The internal Kubernetes CA, generated at deployment time by kubeadm, expires after 10 years. The current CA is good until Nov 3 14:13:50 2029 GMT.
Note that etcd servers don't use the Kubernetes CA; they use the puppetmaster CA instead.
Most certs can be checked for expiration by running sudo kubeadm alpha certs check-expiration on a control plane node.
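For reference, the expiry of any individual PEM file can also be read directly with openssl. A minimal sketch, exercised here against a throwaway self-signed CA standing in for /etc/kubernetes/pki/ca.crt (the kubeadm default path):

```shell
# Create a throwaway 10-year CA, mimicking what kubeadm does at deployment
# time (a stand-in for /etc/kubernetes/pki/ca.crt on a real control plane node).
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -keyout /tmp/demo-ca-key.pem -out /tmp/demo-ca.pem \
  -subj "/CN=kubernetes"

# Print the expiration date of the cert (a "notAfter=..." line).
openssl x509 -in /tmp/demo-ca.pem -noout -enddate
```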
External API access
We have certain entities contacting the Kubernetes API from outside the cluster. Authorization/authentication is managed using a Kubernetes ServiceAccount and an X.509 certificate. The X.509 certificate encodes the ServiceAccount name in the Subject field.
Some examples of this:
- tools-prometheus uses this external API access to scrape metrics.
- TODO: any other example?
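The Subject layout matters here: for X.509 client auth, Kubernetes takes the user name from the CN and the groups from the O entries. A minimal sketch with a throwaway key and CSR (the prometheus/toolforge names mirror the example above; nothing here touches the real cluster):

```shell
# Generate a throwaway key and CSR with the subject layout Kubernetes expects:
# CN becomes the user name, each O entry becomes a group.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout /tmp/demo-key.pem -out /tmp/demo.csr \
  -subj "/O=toolforge/CN=prometheus"

# Inspect the subject that the API server would map to user "prometheus",
# group "toolforge".
openssl req -in /tmp/demo.csr -noout -subject
```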
Operations
Warning: disable Puppet fleet-wide to make this whole operation more atomic, so that no Puppet client sees the private repo without content.
Certificates for this use case can be generated using our custom script wmcs-k8s-get-cert.
Usually, the generated cert is copied and pasted into the private puppet repo to be used as a secret in a puppet module or profile.
Renewing the certificate simply means generating a new one and replacing the old one.
Example workflow for replacing tools-prometheus k8s certificate:
user@tools-clushmaster-02:~$ clush WHATEVER_DISABLE_PUPPET_FLEETWIDE # TODO
user@tools-k8s-control-3:~$ sudo -i wmcs-k8s-get-cert prometheus
/tmp/tmp.9k9N7ksn6K/server-cert.pem
/tmp/tmp.9k9N7ksn6K/server-key.pem
user@tools-k8s-control-3:~$ sudo cat /tmp/tmp.9k9N7ksn6K/server-cert.pem
-----BEGIN CERTIFICATE-----
MIIDYTCCA[...]
-----END CERTIFICATE-----
user@tools-k8s-control-3:~$ sudo cat /tmp/tmp.9k9N7ksn6K/server-key.pem
-----BEGIN RSA PRIVATE KEY-----
MIIEpQIBA[...]
-----END RSA PRIVATE KEY-----
root@tools-puppetmaster-02:/var/lib/git/labs/private# stg uncommit -t a706eb28
uncommit the patch that modifies 'modules/secret/secrets/ssl/toolforge-k8s-prometheus.key'
root@tools-puppetmaster-02:/var/lib/git/labs/private# stg pop ; stg push
until you are in the right uncommitted patch
root@tools-puppetmaster-02:/var/lib/git/labs/private# nano modules/secret/secrets/ssl/toolforge-k8s-prometheus.key ; stg refresh
copy paste here the private key
root@tools-puppetmaster-02:/var/lib/git/labs/private# stg push -a ; stg commit -a
you are done!
user@laptop:~/git/wmf/operations/puppet$ nano files/ssl/toolforge-k8s-prometheus.crt
create a patch similar to https://gerrit.wikimedia.org/r/#/c/601692/
user@tools-clushmaster-02:~$ clush WHATEVER_ENABLE_PUPPET_FLEETWIDE # TODO
[...]
Internal API access
Some components running inside the Kubernetes cluster also require a certificate to access the API server and use a ServiceAccount. This certificate is usually crafted as a Kubernetes secret for the component to use.
Some examples of this:
- our custom webhook: ingress admission controller
- our custom webhook: registry admission controller
- the internal metrics server (i.e., what kubectl top uses)
Operations
Certificates for this use case can be generated using our custom script wmcs-k8s-secret-for-cert.
After running the script, the secret should be ready to use.
Renewing the certificate is just a matter of generating a new one (running the script again and making sure the pod uses it).
If you want to make sure the old cert is no longer present, delete it first and run the script again. Example session for the metrics-server:
root@tools-k8s-control-3:~# kubectl delete secrets -n metrics metrics-server-certs
secret "metrics-server-certs" deleted
root@tools-k8s-control-3:~# wmcs-k8s-secret-for-cert -n metrics -s metrics-server-certs -a metrics-server
secret/metrics-server-certs created
root@tools-k8s-control-3:~# kubectl get secrets -n metrics metrics-server-certs -o yaml | grep cert.pem | head -1 | awk -F' ' '{print $2}' | base64 -d | openssl x509 -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
2f:65:a6:cf:2c:16:2f:39:6e:29:95:ee:35:01:b9:d7:75:a1:d2:50
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN = kubernetes
Validity
Not Before: Jun 2 11:31:00 2020 GMT
Not After : Jun 2 11:31:00 2021 GMT
Subject: CN = metrics-server
[..]
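The decode step in that pipeline is just reversing how Kubernetes stores secret data (base64-encoded PEM). The round trip can be exercised locally with a throwaway cert, no cluster needed:

```shell
# Create a throwaway cert standing in for the cert.pem entry of the secret.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /tmp/ms-key.pem -out /tmp/ms-cert.pem \
  -subj "/CN=metrics-server"

# Encode it the way it would appear in 'kubectl get secrets -o yaml' ...
b64=$(base64 -w0 /tmp/ms-cert.pem)

# ... then decode and inspect, as in the session above.
echo "$b64" | base64 -d | openssl x509 -noout -subject
```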
Node/kubelet certs
Kubelet has two certs:
- A client cert to communicate with the API server
- A serving certificate for the Kubelet API
At this time the serving certificate is a self-signed one managed by kubelet, which should not need manual rotation. Proper, CA-signed rotating certs are stabilizing as a feature set in Kubernetes 1.17, and we should probably switch to that for consistency and as a general improvement. The client cert of kubelet is signed by the cluster CA and expires in 1 year.
Operations
All such client certs are rotated when upgrading Kubernetes, but they can be manually rotated with kubeadm as well. This should be as easy as running kubeadm alpha certs renew on a control plane node as root.
It is possible to configure the kubelet to request renewed certs on its own when they near expiration. So far, we have not set this flag in the config, expecting our upgrade cycle to be roughly 6 months.
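If we ever do enable it, the knobs live in the kubelet's KubeletConfiguration. A hedged sketch of the relevant fragment (field names from the kubelet.config.k8s.io/v1beta1 API; we do not currently set these):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Renew the API client cert automatically as it nears expiration.
rotateCertificates: true
# Request a CA-signed serving cert via the CSR API instead of self-signing.
serverTLSBootstrap: true
```

Note that serverTLSBootstrap also requires the resulting CSRs to be approved, so it is not a drop-in change.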
TODO: elaborate
Tool certs
These certs are automatically generated by the maintain-kubeusers mechanism. When a new tool is created in Striker, the LDAP change is picked up by a polling loop in the maintain-kubeusers deployment, and the service will:
- Create the NFS folder for the tool if it isn't already there because of maintain-dbusers
- Create the necessary folders to set up the KUBECONFIG for the user
- Create a tool namespace along with all necessary privileges, restrictions and quotas
- Generate a private key
- Request and approve the CSR for the cert to authenticate the new tool with the Kubernetes cluster
- Write out the cert to the appropriate files along with the KUBECONFIG
- Create a configmap named maintain-kubeusers in the tool namespace that gives the expiration date of the cert, used for automatically regenerating the cert before it expires
  - Deleting this configmap will cause the cert to be regenerated on the next iteration. This is the safest way to regenerate the certs manually.
Each cert includes a CN, which functions as the user name in Kubernetes, and can include groups as well ("O:" or organization entries). Tool certs currently have the CN of their tool name and one O of "toolforge".
This service runs in Kubernetes in a specialized namespace just for it using a hand-made Docker image, as is documented in the README of the repo. The toolsbeta version runs the maintain-kubeusers:beta tag instead of the :latest tag to facilitate staging and testing live without hurting Toolforge proper. Deploying new code only requires deleting the currently-running pod after refreshing the required image tag.
Operations
If a tool's user certs need to be rotated for some reason, run:
user@bastion $ sudo become <tool-that-needs-help>
tools.toolname:~$ kubectl delete cm maintain-kubeusers
This will cause maintain-kubeusers to refresh the tool's certs.
If the certs were already deleted, you will need to instead have a cluster admin run kubectl delete cm maintain-kubeusers --namespace tool-$toolname --as-group=system:masters --as=admin, since the tool won't be able to authenticate.
In case of a corrupt .kube/config file, the same trick applies, except that maintain-kubeusers will not read invalid YAML. Therefore, you will need to delete the tool's .kube/config and then, as a cluster admin, run kubectl delete cm maintain-kubeusers --namespace tool-$toolname --as-group=system:masters --as=admin. Maintain-kubeusers will regenerate the tool's credentials soon after.
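For admins who do this often, the invocation can be wrapped in a small shell function. This is a hypothetical convenience sketch: it only wraps the documented kubectl command and assumes cluster-admin credentials are available where it runs.

```shell
# Hypothetical wrapper around the documented admin command; assumes
# cluster-admin credentials are available on the node where it runs.
reset_tool_certs() {
    toolname="$1"
    kubectl delete cm maintain-kubeusers \
        --namespace "tool-${toolname}" \
        --as-group=system:masters --as=admin
}

# Usage: reset_tool_certs <toolname>
```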
Etcd certs
All etcd servers use puppetmaster-issued certificates (puppet node certificates). The etcd service will only allow communication from clients presenting a certificate signed by the same CA. This means Kubernetes components that contact etcd should use puppet node certificates.
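Concretely, this corresponds to etcd's TLS flags pointing at the puppet CA and the node's puppet certificate. A hedged fragment for illustration (the flag names are etcd's own; the paths are assumptions based on standard puppet agent locations and should be checked against the actual puppet profile):

```shell
etcd \
  --client-cert-auth \
  --trusted-ca-file=/var/lib/puppet/ssl/certs/ca.pem \
  --cert-file=/var/lib/puppet/ssl/certs/$(hostname -f).pem \
  --key-file=/var/lib/puppet/ssl/private_keys/$(hostname -f).pem \
  ...
```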
In the puppet profile controlling this, we have a mechanism to refresh the certificate and restart the etcd daemon if the puppet node certificate changes (e.g. because it was reissued).
See also
Some other interesting docs: