PAWS/Tools/Admin/Chico's notes

Notes on setting up a PAWS staging env in toolsbeta VPS project (T188428)

Using https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/ as docs

  • PAWS doesn't seem to be properly puppetized
    • I see the apt-pinning definitions, but not where tools-paws-master-01 installs the needed packages (k8s, docker, etc.)
  • Docker and k8s repos are defined for Xenial, though tools-paws-master-01 is stretch
    • Docker does have current stretch versions, k8s does not (left docker as stretch, k8s as xenial)
  • Unsure about the cgroup driver used by docker. The official docs say to place { "exec-opts": ["native.cgroupdriver=systemd"] } in /etc/docker/daemon.json; since the prod version does not have that, I'm omitting it for now
  • swap needs to be turned off (by default the kubelet will not run with swap enabled), done manually
  • started the k8s cluster with flannel
    • kubeadm init --pod-network-cidr=10.244.0.0/16
  • Allow user chicocvenancio to use k8s
    • chicocvenancio@toolsbeta-paws-master-01:~$ mkdir -p $HOME/.kube
    • chicocvenancio@toolsbeta-paws-master-01:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    • chicocvenancio@toolsbeta-paws-master-01:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
  • Docs say to set /proc/sys/net/bridge/bridge-nf-call-iptables to 1; it already was 1
  • get flannel pods (the docs mention v0.9.1; I used the latest version)
  • exported Yuvi's git-crypt key
    • yuvipanda@tools-paws-master-01:~/paws$ sudo -E git-crypt export-key /tmp/paws-key
    • chicocvenancio@tools-paws-master-01:~$ sudo chown chicocvenancio /tmp/paws-key
    • chicocvenancio@tools-paws-master-01:~$ scp /tmp/paws-key toolsbeta-paws-master-01.toolsbeta:~/paws-key
  • Installed git-crypt
    • chicocvenancio@toolsbeta-paws-master-01:~$ sudo apt-get install git-crypt
  • ran into Error: could not find tiller
    • fixed with `helm init`
  • Tiller won't start due to a lack of nodes
  • Create new node
    • Is there a way to adhere to naming convention when using instance count in horizon?
    • toolsbeta-paws-worker-1001
      • Since it's not puppetized, going manual again
    • Node joining brings up tiller
  • Once tiller is up we can run "sudo ./build.py deploy prod --install" to install PAWS
    • We need to redo two things Yuvi did from the CLI and did not push to the repo (Yuvi's .bash_history was invaluable for getting these right)
      • Tiller RBAC
        • Done for now by setting a non-ideal, permissive clusterrolebinding
          • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl create clusterrolebinding permissive-binding --clusterrole=cluster-admin --user=admin --user=kubelet --group=system:serviceaccounts
      • the hub_db pvc is not defined at all in the repo, found the definition in yuvi's .bash_history
        • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl -n prod apply -f /mnt/nfs/labstore-secondary-home/yuvipanda/paw-c/hub-pv.yaml
  • TODO: setup a new OAuth consumer for PAWS-beta
  • TODO: stop these annoying pre-puller daemonsets
    • These are actually not annoying, and are useful, once I pointed them to working docker repositories
  • Right now paws-beta uses a completely different way (WMCS-wise) for traffic ingress. I thought this simpler than copying the paws-proxy instance. In fact, we can probably drop those instances (VPS cloud project) and improve production after some testing
    • I did not get the ideal k8s LoadBalancer service to work with external IPs; instead I used a NodePort service and pointed a webproxy to one of the nodes (any will do)
      • This does mean that if that node fails the site's proxy will fail, which is NOT ok for production
  • Differences between PAWS-beta and prod:
    • Already without the query-killer image
    • Deploy-hook image uses artful and not zesty
    • Per above, traffic ingress is different
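
The two host-level prerequisites noted above (swap turned off, bridge-nf-call-iptables set to 1) can be checked before running kubeadm init. A minimal sketch, assuming a Linux host; the function names and the optional file-path arguments are hypothetical and exist only to make the checks easy to try out:

```shell
#!/bin/sh
# Pre-flight sketch for a kubeadm node. Function names are illustrative only.

# Succeeds if any swap device is active. /proc/swaps holds one header line,
# followed by one line per active swap device.
swap_active() {
    awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}

# Succeeds if bridged traffic is passed to iptables (the file contains 1).
bridge_nf_enabled() {
    f=${1:-/proc/sys/net/bridge/bridge-nf-call-iptables}
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

if swap_active; then
    echo "swap: active (turn it off, e.g. with 'sudo swapoff -a')"
else
    echo "swap: off"
fi

if bridge_nf_enabled; then
    echo "bridge-nf-call-iptables: 1"
else
    echo "bridge-nf-call-iptables: not 1, or not readable"
fi
```

Both functions accept an alternative file path as their first argument, so they can be exercised against a copy of the files rather than the live /proc entries.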


Notes on K8S upgrade in PAWS-beta


cluster control plane

  • Upgrade kubeadm
  • Verify kubeadm version
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubeadm version
    • kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • root@toolsbeta-paws-master-01:~# kubeadm upgrade plan
  • kubeadm upgrade apply v1.9.4
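
To confirm kubeadm is at the intended release before running kubeadm upgrade apply, the GitVersion field can be pulled out of the kubeadm version output shown above. A small sketch; the function name gitversion_from and the TARGET variable are hypothetical, and the sample line is the output recorded in these notes:

```shell
#!/bin/sh
# Extract the GitVersion field from `kubeadm version` output read on stdin.
gitversion_from() {
    sed -n 's/.*GitVersion:"\([^"]*\)".*/\1/p'
}

# Release we expect to find (hypothetical variable, defaults to the one in the notes).
TARGET=${TARGET:-v1.9.4}

# Sample output copied from the notes; on a real master, pipe `kubeadm version` instead.
sample='kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}'

found=$(printf '%s\n' "$sample" | gitversion_from)
if [ "$found" = "$TARGET" ]; then
    echo "kubeadm is at $found"
else
    echo "kubeadm is at $found, expected $TARGET"
fi
```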

Upgrade nodes

  • drain each node
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl get nodes
    • NAME STATUS ROLES AGE VERSION
    • toolsbeta-paws-master-01 Ready master 13d v1.9.3
    • toolsbeta-paws-worker-1001 Ready <none> 13d v1.9.3
    • toolsbeta-paws-worker-1002 Ready <none> 13d v1.9.3
    • toolsbeta-paws-worker-1003 Ready <none> 13d v1.9.3
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl drain toolsbeta-paws-master-01 --ignore-daemonsets
  • upgrade k8s:
    • sudo apt-get update
    • sudo apt-get install kubeadm=1.9.4-00 kubectl=1.9.4-00 kubelet=1.9.4-00
  • verify kubelet is running
    • systemctl status kubelet
  • Uncordon node and verify it is ready
    • kubectl uncordon toolsbeta-paws-master-01
    • kubectl get nodes
  • Rinse and repeat for each node until all nodes are at version v1.9.4:
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl get nodes
    • NAME STATUS ROLES AGE VERSION
    • toolsbeta-paws-master-01 Ready master 13d v1.9.4
    • toolsbeta-paws-worker-1001 Ready <none> 13d v1.9.4
    • toolsbeta-paws-worker-1002 Ready <none> 13d v1.9.4
    • toolsbeta-paws-worker-1003 Ready <none> 13d v1.9.4
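
The drain, upgrade, uncordon cycle above lends itself to a loop. What follows is a hedged sketch, not the procedure actually run here: it assumes the packages are upgraded over ssh from the master (the notes do not say how each node was reached), and the run helper and the DRY_RUN and VERSION variables are hypothetical. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a per-node upgrade loop. DRY_RUN=1 (the default) only prints commands.
VERSION=${VERSION:-1.9.4-00}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Drain one node, upgrade its k8s packages, check the kubelet, uncordon it.
# Stops at the first failing step, so a broken node stays cordoned.
upgrade_node() {
    node=$1
    run kubectl drain "$node" --ignore-daemonsets &&
    run ssh "$node" sudo apt-get update &&
    run ssh "$node" sudo apt-get install -y \
        "kubeadm=$VERSION" "kubectl=$VERSION" "kubelet=$VERSION" &&
    run ssh "$node" systemctl is-active kubelet &&
    run kubectl uncordon "$node"
}

# Nodes are taken from the command line, one at a time, in the order given.
for node in "$@"; do
    upgrade_node "$node" || break
done
```

Saved under a name of your choosing, it would be called with the node names as arguments; setting DRY_RUN=0 makes it execute the commands instead of printing them.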

k8s upgrade in PAWS

  • Went through the same steps as beta, but there are more nodes, and ran into a few new issues
  • apt-pinning by puppet
    • This is done by raising a version's priority. To keep the workflow and guarantee the "drain => upgrade => uncordon => next node" order, I used "=1.9.4-00" to request a specific version from apt-get install
  • nodes taking a very long time to drain
    • Google led me to https://medium.com/@felipedutratine/when-you-try-to-drain-a-kubernetes-node-but-it-blocks-5aba9592d7c9
      • get the pods on the node and delete the ones still there
        • kubectl get pods -o wide --all-namespaces|grep worker-1001
          • kube-system kube-flannel-ds-stbq6 1/1 Running 1 63d 10.68.23.135 tools-paws-worker-1001
          • kube-system kube-proxy-zzrj2 1/1 Running 0 3h 10.68.23.135 tools-paws-worker-1001
          • prod proxy-5cd7d56555-tm4p6 2/2 Running 0 20d 10.244.7.45 tools-paws-worker-1001
          • support support-nginx-ingress-controller-sbl64 1/1 Running 4 63d 10.68.23.135 tools-paws-worker-1001
        • Note that kube-flannel, support-nginx-ingress-controller, and kube-proxy are DaemonSet-controlled, so they wouldn't interfere
          • chicocvenancio@tools-paws-master-01:~$ kubectl delete pod proxy-5cd7d56555-tm4p6 -n prod
  • Created a bash script to go through each node (/home/chicocvenancio/update_nodes.sh)
  • Sent Change 419599 to pin k8s to version 1.9.4 in PAWS.
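
When a drain stalls, the leftover pods on a node can be picked out of the kubectl get pods -o wide --all-namespaces listing shown above. A rough sketch, assuming the eight-column layout seen in these notes (node name in the last column) and a hard-coded list of the three DaemonSet-managed name prefixes mentioned above; the function name pods_to_delete is hypothetical:

```shell
#!/bin/sh
# Print "namespace pod" for pods on the given node that are not managed by
# one of the DaemonSets named in the notes. Reads the output of
# `kubectl get pods -o wide --all-namespaces` on stdin.
pods_to_delete() {
    awk -v node="$1" '
        $8 == node &&
        $2 !~ /^(kube-flannel-ds|kube-proxy|support-nginx-ingress-controller)-/ {
            print $1, $2
        }'
}

# Sample listing copied from the notes.
sample='kube-system kube-flannel-ds-stbq6 1/1 Running 1 63d 10.68.23.135 tools-paws-worker-1001
kube-system kube-proxy-zzrj2 1/1 Running 0 3h 10.68.23.135 tools-paws-worker-1001
prod proxy-5cd7d56555-tm4p6 2/2 Running 0 20d 10.244.7.45 tools-paws-worker-1001
support support-nginx-ingress-controller-sbl64 1/1 Running 4 63d 10.68.23.135 tools-paws-worker-1001'

printf '%s\n' "$sample" | pods_to_delete tools-paws-worker-1001
# prints: prod proxy-5cd7d56555-tm4p6
```

Each printed pair can then be handed to kubectl delete pod with the matching -n namespace, as was done by hand in the notes.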