Kubernetes/Deployments

Deployments on Kubernetes happen using helmfile.

Deploying with helmfile

Code deployment/configuration changes

Note that both new code deployments and configuration changes are considered deployments!

  1. Clone the deployment-charts repo.
  2. In your editor, modify the files under the helmfile.d folder of the service you want to change. For example, the myservice deployment lives under deployment-charts/helmfile.d/services/myservice. Most changes are made in the values.yaml and values-*.yaml files, which tune the deployment parameters.
  3. If you need to update or add a secret such as a password or a certificate, ask an SRE to commit it to the private puppet repo. Do not commit secrets to the deployment-charts repo.
  4. Make a CR and, after a successful review, merge it. Note: an SRE may give your patch a +1, which is sufficient for you to self-merge and deploy (see the notes about deployment changes in https://www.mediawiki.org/wiki/Gerrit/Privilege_policy#Merging_without_review).
  5. After the merge, log in to a deployment server. A cron job (running every minute) updates the /srv/deployment-charts directory with the contents from git.
  6. Go to /srv/deployment-charts/helmfile.d/services/${SERVICE}, where SERVICE is the name of your service, e.g. myservice.
  7. Execute helmfile -e ${CLUSTER} -i apply, where CLUSTER is the k8s cluster you're operating on: one of staging, eqiad and codfw. This shows the changes that will be applied to the cluster and prompts you to confirm. It then applies that diff to the cluster and logs the change to SAL.
  8. All done!

In case there are multiple releases of your service in the same helmfile, you can use the --selector name=RELEASE_NAME option, e.g. helmfile -e $CLUSTER --selector name=test -i apply.
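
The command line from step 7, with and without a release selector, could be assembled as in the sketch below. The helper name helmfile_apply_cmd is hypothetical and not part of the Wikitech tooling. It only prints the command, so that you can review it before running it by hand from the service directory on a deployment server.

```shell
# Hypothetical helper (not part of the Wikitech tooling): print the helmfile
# command line for one cluster and, optionally, one release.
helmfile_apply_cmd() {
    local cluster="$1"        # one of: staging, eqiad, codfw
    local release="${2:-}"    # optional release name for --selector
    case "$cluster" in
        staging|eqiad|codfw) ;;
        *) echo "unknown cluster: $cluster" >&2; return 1 ;;
    esac
    if [ -n "$release" ]; then
        echo "helmfile -e $cluster --selector name=$release -i apply"
    else
        echo "helmfile -e $cluster -i apply"
    fi
}

# Print the command for each cluster, in the order this page lists them
# (this is not a recommended rollout order).
for cluster in staging eqiad codfw; do
    helmfile_apply_cmd "$cluster"
done
```

Rejecting unknown cluster names early is a design choice of this sketch; it keeps a typo from reaching helmfile at all.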

Seeing the current status

This is done using helmfile:

  1. Change directory to /srv/deployment-charts/helmfile.d/services/${SERVICE} on a deployment server
  2. Unless there are merged changes that have not been applied yet, the current values files should reflect the deployed values
  3. You can check for unapplied changes with: helmfile -e $CLUSTER diff
  4. You can see the status with helmfile -e $CLUSTER status
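
To run both checks on every cluster in one go, something like the following sketch might help. The function name check_clusters and the HELMFILE override variable are hypothetical; the function is meant to be called from the service directory on a deployment server.

```shell
# Hypothetical helper: show unapplied changes and release status per cluster.
# HELMFILE is an assumed override hook, so that a stub can stand in for the
# real helmfile binary when trying the function out.
check_clusters() {
    local helmfile_bin="${HELMFILE:-helmfile}"
    local cluster
    for cluster in staging eqiad codfw; do
        echo "== $cluster =="
        "$helmfile_bin" -e "$cluster" diff      # unapplied changes, if any
        "$helmfile_bin" -e "$cluster" status    # state of the releases
    done
}

# Usage, from /srv/deployment-charts/helmfile.d/services/${SERVICE}:
#   check_clusters
```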

Rolling back changes

If you need to roll back a change because something went wrong:

  1. Revert the git commit in the deployment-charts repo
  2. Merge the revert (with review if needed)
  3. Wait one minute for the cron job to pull the change to the deployment server
  4. Change directory to /srv/deployment-charts/helmfile.d/services/${SERVICE}, where SERVICE is the name of your service
  5. Execute helmfile -e $CLUSTER diff, where CLUSTER is one of staging, eqiad or codfw, to see what you'll be changing
  6. Execute helmfile -e $CLUSTER apply
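
Instead of waiting a fixed minute in step 3, one could poll the checkout until the revert has arrived. This is a minimal sketch, assuming /srv/deployment-charts is a regular git checkout (this page only says it is updated from git). The helper name wait_for_commit, the retry count and the polling interval are all hypothetical.

```shell
# Hypothetical helper: wait until commit SHA is contained in HEAD of REPO.
# Arguments: repo path, commit sha, number of tries (default 12),
# seconds between tries (default 10).
wait_for_commit() {
    local repo="$1" sha="$2" tries="${3:-12}" interval="${4:-10}"
    while [ "$tries" -gt 0 ]; do
        # Exits 0 only when $sha is an ancestor of (or equal to) HEAD.
        if git -C "$repo" merge-base --is-ancestor "$sha" HEAD 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    echo "commit $sha not found in $repo" >&2
    return 1
}

# Usage on a deployment server, with the sha of the merged revert:
#   wait_for_commit /srv/deployment-charts "$REVERT_SHA" && helmfile -e "$CLUSTER" diff
```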

Rolling back in an emergency

If you can't wait the one minute, or the cron job that updates from git fails, etc., it is possible to roll back manually using helm. Using helmfile is preferred, though.

  1. Find the revision to roll back to:
    1. Run kube_env $SERVICE $CLUSTER; helm history production
    2. Pick the revision to roll back to, e.g. perhaps the penultimate one:
      REVISION        UPDATED                         STATUS          CHART           DESCRIPTION     
      1               Tue Jun 18 08:39:20 2019        SUPERSEDED      termbox-0.0.2   Install complete
      2               Wed Jun 19 08:20:42 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
      3               Wed Jun 19 10:33:34 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
      4               Tue Jul  9 14:21:39 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
      
  2. Rollback with: kube_env $SERVICE $CLUSTER; helm rollback production 3
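
Picking the penultimate revision out of the history table can be sketched with awk. This assumes the column layout shown above, where the first field of each data row is the revision number. The helper name penultimate_revision is hypothetical, and the result should still be checked by eye before rolling back.

```shell
# Hypothetical helper: read `helm history` output on stdin and print the
# second-to-last revision number. Rows whose first field is not numeric
# (such as the header) are ignored.
penultimate_revision() {
    awk '$1 ~ /^[0-9]+$/ { prev = last; last = $1 }
         END { if (prev != "") print prev; else exit 1 }'
}

# Demo with the revision column from the table above.
penultimate_revision <<'EOF'
REVISION        UPDATED                         STATUS          CHART           DESCRIPTION
1               Tue Jun 18 08:39:20 2019        SUPERSEDED      termbox-0.0.2   Install complete
2               Wed Jun 19 08:20:42 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
3               Wed Jun 19 10:33:34 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
4               Tue Jul  9 14:21:39 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
EOF
# prints 3
```

On a deployment server the input would come from the real command, e.g. kube_env $SERVICE $CLUSTER; helm history production | penultimate_revision. The function exits non-zero when fewer than two revisions exist.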

Rolling restart

If you want to force all pods of your deployment to restart, you can use the roll_restart parameter during deployment with helmfile:

helmfile -e $CLUSTER --state-values-set roll_restart=1 sync

Advanced use cases: using kubeconfig

If you need to use kubeconfig (for a port-forward or to get logs for debugging) you can execute kube_env $SERVICE $CLUSTER; kubectl COMMAND, e.g. kube_env myservice staging; kubectl logs POD_NAME -c CONTAINER_NAME for logs.
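
When the pod name is not known in advance, the lookup and the logs call can be combined as in the following sketch. It assumes kube_env has already been run and that it points kubectl at the namespace of the service (the helm example on this page suggests so, but treat it as an assumption). The helper name tail_first_pod and the KUBECTL override variable are hypothetical.

```shell
# Hypothetical helper: print the last lines of one container's log from the
# first pod that kubectl lists. Run kube_env $SERVICE $CLUSTER beforehand.
# Arguments: container name, number of lines (default 50).
tail_first_pod() {
    local container="$1" lines="${2:-50}"
    local kubectl_bin="${KUBECTL:-kubectl}"   # assumed override hook for stubs
    local pod
    # `-o name` prints one pod/NAME entry per line.
    pod="$("$kubectl_bin" get pods -o name | head -n 1)"
    [ -n "$pod" ] || { echo "no pods found" >&2; return 1; }
    "$kubectl_bin" logs "$pod" -c "$container" --tail="$lines"
}

# Usage:
#   kube_env myservice staging; tail_first_pod CONTAINER_NAME 100
```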

Advanced use cases: using helm

Sometimes you might need to use helm directly. This is strongly discouraged: use it only in emergencies and at your own risk. It assumes that you know what you are doing with helm.

  • kube_env <service> <cluster>
  • helm <command>

Example:

akosiaris@deploy1002:~$ kube_env mathoid eqiad
akosiaris@deploy1002:~$ helm list
NAME      	REVISION	UPDATED                 	STATUS  	CHART         	APP VERSION	NAMESPACE
production	1       	Tue Mar 23 10:37:50 2021	DEPLOYED	mathoid-0.0.35	           	mathoid   
akosiaris@deploy1002:~$ helm status
Error: release name is required
akosiaris@deploy1002:~$ helm status production
LAST DEPLOYED: Tue Mar 23 10:37:50 2021
NAMESPACE: mathoid
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME                                    DATA  AGE
config-production                       1     26d
mathoid-production-envoy-config-volume  1     26d
mathoid-production-tls-proxy-certs      2     26d
production-metrics-config               1     26d
==> v1/Deployment
NAME                READY  UP-TO-DATE  AVAILABLE  AGE
mathoid-production  30/30  30          30         26d
==> v1/NetworkPolicy
NAME                POD-SELECTOR                    AGE
mathoid-production  app=mathoid,release=production  26d 
==> v1/Pod(related)
NAME                                 READY  STATUS   RESTARTS  AGE
mathoid-production-64787b97c5-24pzw  3/3    Running  0         26d
...
mathoid-production-64787b97c5-z74n2  3/3    Running  0         26d
==> v1/Service
NAME                            TYPE      CLUSTER-IP    EXTERNAL-IP  PORT(S)          AGE
mathoid-production              NodePort  10.64.72.227  <none>       10044:10042/TCP  26d
mathoid-production-tls-service  NodePort  10.64.72.35   <none>       4001:4001/TCP    26d

See also