
Help:Toolforge/Raw kubernetes jobs

Revision as of 00:57, 15 February 2022 by imported>BryanDavis (BryanDavis moved page Help:Toolforge/raw kubernetes jobs to Help:Toolforge/Raw kubernetes jobs: title case)

This page contains information on running raw Kubernetes jobs in Toolforge.

In this context, raw means direct interaction with the Kubernetes API.

Note, however, that this is an alternative procedure; the recommended approach is to use the Toolforge jobs framework.

Single jobs

If you need to run a job only once, you can use a pod, which is the smallest deployable unit in Kubernetes. To deploy a pod, create a YAML file like the example below.

apiVersion: v1
kind: Pod
metadata:
  name: example
  labels:
    toolforge: tool
spec:
  containers:
  - name: main
    workingDir: /data/project/mytool
    image: docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest
    command: ['/bin/bash', '-c', 'source venv3/bin/activate; ./myapp.py']
  restartPolicy: Never

Change the name "example" to the name you want for your pod, set workingDir to the directory where your application lives, set image to the image you need, and set command to the invocation of your app, then save the YAML file. You can create the pod with the command kubectl apply -f <path-to-yaml-file>.

You can see if the pod is running with kubectl get pods and see the pod output with kubectl logs <pod-name>. Note that two pods cannot share the same name: delete the old pod with kubectl delete pod <pod-name> before creating a new one with the same name.

You can change "restartPolicy: Never" to "restartPolicy: OnFailure" to make the pod restart the container when it exits with an error. However, if you want a continuous job, it is recommended to use a "deployment" workload type as described in a section below. When the Kubernetes node where the pod is running fails, a deployment recreates the pod on another node, which does not happen when you create a simple pod.

Cronjobs

It is possible to run cron jobs on Kubernetes (see upstream documentation for a full description).

Example cronjobs.yaml

Wikiloveslove is a Python 3.7 bot that runs as a Kubernetes CronJob. The cronjobs.yaml file that it uses to tell Kubernetes how to start and schedule the bot is reproduced below.
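
The general shape of such a file can be sketched as follows. This is an illustrative manifest, not the actual Wikiloveslove file: the object name, schedule, working directory, and command are placeholders, and the toolforge: tool label is carried over from the pod example above on the assumption that it is needed here as well. The apiVersion also depends on the cluster version.

```yaml
# Illustrative CronJob manifest; names, paths, and the schedule are placeholders.
apiVersion: batch/v1            # clusters older than Kubernetes 1.21 use batch/v1beta1
kind: CronJob
metadata:
  name: example                 # stands in for CRONJOB-NAME; must be lowercase
  labels:
    toolforge: tool
spec:
  schedule: "0 * * * *"         # standard cron syntax: minute 0 of every hour
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            toolforge: tool     # assumption: same label as in the pod example
        spec:
          containers:
          - name: main
            workingDir: /data/project/mytool
            image: docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest
            command: ['/bin/bash', '-c', 'source venv3/bin/activate; ./myapp.py']
          restartPolicy: OnFailure
```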

Create the CronJob object in your tool's Kubernetes namespace using kubectl:


$ kubectl apply --validate=true -f $HOME/cronjobs.yaml
cronjob.batch/CRONJOB-NAME configured

After creating the cronjob, you can trigger it immediately by creating a test job with kubectl create job --from=cronjob/CRONJOB-NAME test, and then follow the logs as usual with kubectl logs job/test -f to debug.

If that doesn't give you any useful output, try kubectl describe job/test to see what's going on: it might be a misconfigured limit, for instance.

If you want the application not to restart on failure, change "restartPolicy: OnFailure" to "restartPolicy: Never" and add "backoffLimit: 0" in the jobTemplate spec (with same indentation as "template:").
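
The placement of these two settings can be sketched as a fragment of the CronJob spec. Only the relevant keys are shown; the containers and the rest of the manifest are assumed to stay unchanged.

```yaml
# Fragment of a CronJob manifest; containers and other keys are omitted.
spec:
  jobTemplate:
    spec:
      backoffLimit: 0            # same indentation as "template:"; failed jobs are not retried
      template:
        spec:
          restartPolicy: Never   # a failed container is not restarted
```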

Continuous jobs

The basic unit of managing execution on a Kubernetes cluster is called a "deployment". Each deployment is described with a YAML configuration file which describes the container images to be started ("pods" in the Kubernetes terminology) and commands to be run inside them after the container is initialized. A deployment also specifies where the pods run and what external resources are connected to them. The upstream documentation is comprehensive.

Example deployment.yaml

Stashbot is a Python 3.7 IRC bot that runs in a Kubernetes deployment. The deployment.yaml file that it uses to tell Kubernetes how to start the bot is reproduced below. This deployment is launched using a stashbot.sh wrapper script which runs kubectl create --validate=true -f /data/project/stashbot/etc/deployment.yaml.

This deployment:

  • Uses the 'tool-stashbot' namespace that the tool is authorized to control
  • Creates a container using the 'latest' version of the 'docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base' Docker image.
  • Runs the command /data/project/stashbot/bin/stashbot.sh run inside the container to start the bot itself.
  • Mounts the /data/project/stashbot/ NFS directory as /data/project/stashbot/ inside the container.
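
A deployment with the four properties listed above might look roughly like the sketch below. It is not the actual Stashbot file: the object, label, container, and volume names are hypothetical, and the hostPath volume is an assumption about how the NFS directory is exposed on the worker nodes. The platform may instead mount it automatically for pods labeled toolforge: tool, in which case the volume entries would be unnecessary.

```yaml
# Sketch only, not the real Stashbot manifest; names marked below are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stashbot                 # hypothetical object name
  namespace: tool-stashbot       # namespace the tool is authorized to control
  labels:
    name: stashbot
    toolforge: tool
spec:
  replicas: 1
  selector:
    matchLabels:
      name: stashbot
  template:
    metadata:
      labels:
        name: stashbot           # must match the selector above
        toolforge: tool
    spec:
      containers:
      - name: bot                # hypothetical container name
        image: docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest
        command: ['/data/project/stashbot/bin/stashbot.sh', 'run']
        workingDir: /data/project/stashbot
        volumeMounts:
        - name: home             # hypothetical volume name
          mountPath: /data/project/stashbot/
      volumes:
      - name: home
        hostPath:                # assumption: NFS is already mounted on the node
          path: /data/project/stashbot/
```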

Monitoring your jobs

You can see which jobs you have running with kubectl get pods. Using the name of the pod, you can see the logs with kubectl logs <pod-name>.

To restart a failing pod, use kubectl delete pod <pod-name>; if the pod belongs to a deployment, the deployment will create a replacement. If you need to kill it entirely, find the deployment name with kubectl get deployment, and delete it with kubectl delete deployment <deployment-name>.