
Portal:Toolforge/Admin/Kubernetes/2020 Kubernetes cluster rebuild plan notes


This page holds notes and ideas regarding the Debian Stretch/Buster migration of Toolforge, especially as related to k8s. See News/2020 Kubernetes cluster migration for how this all ended up!


Meeting attendees: Andrew, Chase, Arturo.


  • refactor toolforge puppet code (Brooke already started)
  • get puppet compiler for VMs so we can actually test puppet code
  • k8s: allocate a couple of weeks to play with callbacks and kubeadm and evaluate if they are the way to go.

Things to take into account

  • probably going directly to Stretch is the way to go.
    • By the time we end, Buster may be stable already
    • (and Jessie old-old-stable)
  • k8s: jump versions when moving to eqiad1?
    • 1.4 --> 1.12
    • does the new version work with custom ingress controllers?
    • does kubeadm work for us?
    • integration with nova-proxy?
    • try in a cloudvps project, in a pre-defined time slot
  • ingress controller for k8s
    • currently using kubeproxy, nginx
    • it may be worth migrating to a native k8s ingress controller
  • co-existence of k8s clusters
  • gridengine
    • write puppet code from scratch, a new deployment side-by-side with the old
    • both grids co-exist and users start launching things in the new one
    • a script or something to make sure nothing from the same tool is running in the old grid


  • currently using vxlan overlay. We can probably carry over the same model.


Involved people: mostly Brooke and Arturo.

All puppet code is being developed for Debian Stretch. There is no Kubernetes support for Debian Buster at this time (not even in production).

Related phabricator tasks used to track development:


The puppet code for k8s etcd was refactored/reworked into role::wmcs::toolforge::k8s::etcd. It uses the base etcd classes, shared with production.

The setup is intended to be a 3-node cluster, which requires this hiera config (example):

profile::etcd::cluster_bootstrap: true
- toolsbeta-arturo-k8s-etcd-1.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-2.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-3.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-master-1.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-master-2.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-master-3.toolsbeta.eqiad.wmflabs
profile::ldap::client::labs::client_stack: sssd
sudo_flavor: sudo

This setup uses TLS with puppet certs.

k8s master

The puppet code for k8s masters was refactored/reworked into role::wmcs::toolforge::k8s::master. It uses the base kubernetes puppet modules, shared with production.

The setup is intended to be a 3-node cluster, which requires this hiera config (example):

- toolsbeta-arturo-k8s-etcd-1.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-2.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-3.toolsbeta.eqiad.wmflabs
profile::ldap::client::labs::client_stack: sssd
sudo_flavor: sudo

Each master node runs 3 important systemd services:

  • kube-apiserver
  • kube-controller-manager
  • kube-scheduler

They use TLS by means of puppet certs.

k8s worker nodes

Ongoing work; the puppet role is role::wmcs::toolforge::k8s::node.

k8s API proxy

We plan to use a 3-node cluster to provide an HA proxy for the Kubernetes API itself.

This is ongoing work; the puppet role is role::wmcs::toolforge::k8s::apilb.



  • Brooke
  • Arturo
  • Jason


  • time investment?
    • yuvi wanted kubeadm
  • current situation with node authentication against the api-server
  • kubeadm vs raw puppet systemd-based services
  • PKI stuff

next steps

  • RBAC database missing in etcd? Which daemon should create this?
  • Kubernetes api-server bootstrap

  • Let's try kubeadm for the next couple of weeks! yaml configuration file in puppet.git
    • kubeadm package: a component in our repository
    • we may want to use kubeadm 1.15 directly
    • puppet tree:
      • use yaml configuration file for kubeadm, stored in puppet.git
      • use modules/toolforge/ for when it makes sense (kubeadm repo, etc?)
      • use profile::toolforge::k8s::kubeadm::{master,node,etc} for the other components
    • use external etcd bootstrapped by kubeadm


Involved people: mostly Brooke and Arturo.

We started trying kubeadm for the cluster deployment. New puppet code was introduced for the basics, such as package installation and initial configuration file distribution.
We were able to deploy a basic cluster, which we can now use as the starting point to actually begin building the Toolforge k8s service on top of it.


At the time of this writing, we have 3 kinds of servers involved.
Each one has a different puppet role and a different hiera config, which is left here as an example. You should refer to the puppet tree as the source of truth:

  • API LB: role::wmcs::toolforge::k8s::apilb
  • Requires a DNS name pointing to this VM instance.
  • Hiera config:
    - fqdn: toolsbeta-test-k8s-master-1.toolsbeta.eqiad.wmflabs
      port: 6443
    - fqdn: toolsbeta-test-k8s-master-2.toolsbeta.eqiad.wmflabs
      port: 6443
    - fqdn: toolsbeta-test-k8s-master-3.toolsbeta.eqiad.wmflabs
      port: 6443
  • Master: role::wmcs::toolforge::k8s::kubeadm::master
  • Hiera config:
profile::toolforge::k8s::dns_domain: toolsbeta.eqiad.wmflabs
profile::toolforge::k8s::node_token: m7uakr.ern5lmlpv7gnkacw
sudo_flavor: sudo
swap_partition: false
profile::ldap::client::labs::client_stack: sssd
  • Worker: role::wmcs::toolforge::k8s::kubeadm::node
  • Hiera config:
profile::ldap::client::labs::client_stack: sssd
profile::toolforge::k8s::node_token: m7uakr.ern5lmlpv7gnkacw
sudo_flavor: sudo
swap_partition: false
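For reference, the API LB hiera above maps naturally onto an haproxy frontend/backend listening on the same port as the masters. This is only a hypothetical sketch of what the apilb puppet role might render, not the actual template:

```
# Hypothetical haproxy fragment for the k8s API load balancer.
# Backends are the three control plane nodes listed in the hiera above.
listen k8s-api
    bind *:6443
    mode tcp
    option tcp-check
    server toolsbeta-test-k8s-master-1 toolsbeta-test-k8s-master-1.toolsbeta.eqiad.wmflabs:6443 check
    server toolsbeta-test-k8s-master-2 toolsbeta-test-k8s-master-2.toolsbeta.eqiad.wmflabs:6443 check
    server toolsbeta-test-k8s-master-3 toolsbeta-test-k8s-master-3.toolsbeta.eqiad.wmflabs:6443 check
```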

Random snippet found: to quickly obtain the --discovery-token-ca-cert-hash argument, on an existing control plane node, run

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

The output of that should be used like this:

kubeadm join --token $bootstrap_token --control-plane --discovery-token-ca-cert-hash sha256:$output_from_above_command --certificate-key $key
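The hash pipeline can be exercised end-to-end without a cluster; this sketch generates a throwaway cert and feeds it through the same commands (the /tmp paths are illustrative; in the real cluster the input is /etc/kubernetes/pki/ca.crt):

```shell
# Generate a throwaway self-signed cert just to demonstrate the pipeline.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=demo-ca" \
    -keyout /tmp/demo-ca.key -out /tmp/demo-ca.crt 2>/dev/null

# Same pipeline as above: extract the public key, DER-encode it,
# and take its SHA-256 digest. kubeadm compares this value against
# the --discovery-token-ca-cert-hash argument during join.
hash=$(openssl x509 -pubkey -in /tmp/demo-ca.crt \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex | sed 's/^.* //')

echo "sha256:$hash"   # the form expected by 'kubeadm join'
```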


We reached a point where we are confident in the cluster lifecycle.

The following components are used:

  • an external load balancer for the k8s API (role::wmcs::toolforge::k8s::apilb) (no additional setup beyond hiera config)
  • an external etcd server for k8s (3 nodes) (role::wmcs::toolforge::k8s::etcd) (no additional setup beyond hiera config; already on Buster)
  • control plane nodes (role::wmcs::toolforge::k8s::kubeadm::master) (requires hiera config)
  • worker nodes (role::wmcs::toolforge::k8s::kubeadm::node)

In the first control plane node:

root@toolsbeta-test-k8s-master-1:~# kubeadm init --config /etc/kubernetes/kubeadm-init.yaml --upload-certs
root@toolsbeta-test-k8s-master-1:~# cp /etc/kubernetes/admin.conf $HOME/.kube/config
root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/calico.yaml

For additional control plane nodes:

root@toolsbeta-test-k8s-master-1:~# kubeadm --config /etc/kubernetes/kubeadm-init.yaml init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
root@toolsbeta-test-k8s-master-1:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

root@toolsbeta-test-k8s-master-2:~# kubeadm join --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:<openssl_output> --control-plane --certificate-key <upload_certs_output>

For worker nodes:

aborrero@toolsbeta-test-k8s-worker-1:~ $ sudo kubeadm join --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:<openssl_output>

Note that:

  • deleting a node requires kubectl delete node <nodename> (in the case of VM deletion); adding a node requires the steps outlined above.
  • we use puppet certs for the etcd client connection
  • we enforce client certs on etcd server side

Interesting commands for etcd:

aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key=/var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert=/var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem del "" --from-key=true

aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key=/var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert=/var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem  get / --prefix --keys-only | wc -l

aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key=/var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert=/var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem member add toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs --peer-urls="https://toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2380"
Member bf6c18ddf5414879 added to cluster a883bf14478abd33


aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key-file /var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert-file /var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem cluster-health
member 67a7255628c1f89f is healthy: got healthy result from https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379
member bf6c18ddf5414879 is healthy: got healthy result from https://toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379
cluster is healthy
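The etcdctl invocations above repeat the same endpoint and puppet-cert flags every time; a small wrapper keeps them readable (a sketch: the etcd3 name and the NODE value are illustrative, using the same paths as the examples above):

```shell
#!/bin/sh
# Helper wrapping the repeated etcdctl v3 flags used above.
# NODE is assumed to be the local etcd host's FQDN.
NODE=toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs

etcd3() {
    sudo ETCDCTL_API=3 etcdctl \
        --endpoints "https://${NODE}:2379" \
        --key "/var/lib/puppet/ssl/private_keys/${NODE}.pem" \
        --cert "/var/lib/puppet/ssl/certs/${NODE}.pem" \
        "$@"
}

# Usage examples (same operations as above):
#   etcd3 get / --prefix --keys-only | wc -l
#   etcd3 member list
```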


To load nginx-ingress:

root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/kubeadm-nginx-ingress-psp.yaml

root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/kubeadm-nginx-ingress.yaml 
namespace/nginx-ingress unchanged
serviceaccount/nginx-ingress unchanged
configmap/nginx-config configured
secret/default-server-secret created
deployment.apps/nginx-ingress created

See phab:T228500 for more details.


Working prototype of maintain-kubeusers is here: phab:T228499. The general design of certs looks like the following.

Toolforge K8s PKI

The x.509 certs only allow authn. Authz is managed via RBAC and PSPs (design for which is in progress).
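As an illustration of that authn/authz split: the API server takes the username from a client cert's CN, and RBAC then binds that username to permissions. A hypothetical RoleBinding for a tool named tool-example might look like this (names are illustrative only, not the actual maintain-kubeusers output):

```yaml
# Hypothetical RoleBinding: grants the user authenticated by a
# client cert with CN=tool-example the built-in "edit" ClusterRole,
# limited to that tool's namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tool-example-edit
  namespace: tool-example
subjects:
- kind: User
  name: tool-example        # taken from the cert's CN
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                # default ClusterRole shipped with k8s
  apiGroup: rbac.authorization.k8s.io
```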


Docs for PSP and RBAC notions: Portal:Toolforge/Admin/Kubernetes/RBAC and PSP. Going to add a bit more to that.


k8s discussion on several open questions.


  • Brooke
  • Jason
  • Arturo
  • Hieu
  • Bryan


  • Toolforge ingress: decide on final layout of north-south proxy setup
    • diagrams:
    • Bryan: having it only for k8s is not very good (i.e., we would rather provide it for the webgrid too)
    • Bryan: the new domain name should not be available only for the new k8s
    • Arturo: how does a given proxy know if a given tool is running in the grid, the legacy k8s, or the new k8s?
    • option 4 is discarded: it would be difficult to introduce to the old grid. Complex SSL handling.
    • Hieu: option 3 does not require ingress at all. Brooke: but we want it so our cluster supports more use cases
    • Bryan: what about rate-limiting, etc? Hieu: rate-limiting using annotations:
    • option 1 is discarded: we feel options 2 and 3 superseded it
    • Arturo proposal: let's follow approach of option 3. If we find a blocker, then follow option 2.
      • Add a fallthrough route in dynamicproxy to redirect to the new k8s cluster. In the first iteration, dynamicproxy knows nothing of
      • SSL: add SAN for the certificate that includes
      • first iteration: don't introduce the new domain yet?
  • Proposal: use option #3 (dynamicproxy -> { legacy things || new k8s ingress })
    • Try to introduce the new domain just for a couple of weeks. If after a couple of weeks we aren't able to, then move on. In that case, introducing the new domain will be a future quarter goal or whatever.
  • Toolforge: introduce new domain
    • how, when, etc
    • Bryan: the new domain name should not be available only for the new k8s
    • Jason: what about not using a * wildcard certificate? Using Let's Encrypt / acme-chief we could afford having a certificate per tool.
      • Jason: single domain certs per container could potentially offer better security
      • Bryan: ~600 single domains could be hard to manage (+1 Jason)
  • Toolforge ingress: create a default landing page for unknown/default URLs
    • we need a "default route" in the new ingress setup. I'm sure Bryan has some ideas about what to do with this.
  • There is a new upstream k8s release 1.16. We are developing in 1.15. Shall we upgrade before moving forward?
    • API changes may slow us down.
    • Brooke thinks strongly no and favors 1.15.2+ and that series for the first deploy. They changed a number of important API objects.
  • Deciding on a deadline for the first testing tools (openstack-browser?)
    • rebuilding the current toolsbeta-test cluster just for sanity when all the moving parts are decided (+1)