This page contains information about the Toolforge Jobs Framework, an architecture to support grid-like jobs on Toolforge kubernetes.

== The framework ==


The framework is called '''Toolforge Jobs Framework''' (or '''TJF'''). The main component is a REST API that eases end user interaction with Toolforge jobs in the kubernetes cluster. The API abstracts away most of the k8s gory details for configuring, removing, managing and reading the status of jobs. The abstraction approach is similar to [[Help:Toolforge/Web | Toolforge webservices]] (where we have the <code>webservice</code> command), but here most of the business logic lives in an API service.


By splitting the software into several components, and introducing a stable API, we aim to reduce maintenance burden: we no longer need to rebuild all Toolforge docker containers every time we change some internal mechanism (which is the case with the <code>tools-webservice</code> package).


[[File:Toolforge_jobs.png|center|500px]]


The framework consists of 3 components:
* '''jobs-framework-api''' ([https://gerrit.wikimedia.org/r/admin/repos/cloud/toolforge/jobs-framework-api gerrit]) ([https://gerrit.wikimedia.org/g/cloud/toolforge/jobs-framework-api gitiles]) --- uses [https://flask-restful.readthedocs.io flask-restful] and runs inside the k8s cluster as a webservice. It offers the REST API that in turn interacts with the native k8s API objects: <code>CronJob</code>, <code>Job</code> and <code>Deployment</code>.
* '''jobs-framework-cli''' ([https://gerrit.wikimedia.org/r/admin/repos/cloud/toolforge/jobs-framework-cli gerrit]) ([https://gerrit.wikimedia.org/g/cloud/toolforge/jobs-framework-cli gitiles]) --- command line interface to interact with the jobs API service. Typically used by end users on Toolforge bastions.
* '''jobs-framework-emailer''' ([https://gerrit.wikimedia.org/r/admin/repos/cloud/toolforge/jobs-framework-emailer gerrit]) ([https://gerrit.wikimedia.org/g/cloud/toolforge/jobs-framework-emailer gitiles]) --- a daemon that uses [https://github.com/kubernetes-client/python the official k8s python client] and [https://docs.python.org/3/library/asyncio.html asyncio]. It runs inside k8s, listens to pod events, and emails users about their jobs' activity.


The REST API is freely usable within Toolforge, from both bastion servers and kubernetes pods. This means that a running job can interact with the Toolforge jobs API and CRUD other jobs.


=== Auth ===


{{tracked|T274139}}
[[File:Toolforge_jobs-auth.png|center|500px]]


To ensure that Toolforge users only manage their own jobs, TJF uses kubernetes certificates for client authentication. These x509 certificates are automatically managed by <code>maintain-kubeusers</code>, and live in each user's home directory:


<syntaxhighlight lang="shell-session">
toolsbeta.test@toolsbeta-sgebastion-04:~$ egrep client-certificate\|client-key .kube/config
    client-certificate: /data/project/test/.toolskube/client.crt
    client-key: /data/project/test/.toolskube/client.key
toolsbeta.test@toolsbeta-sgebastion-04:~$ head -1 /data/project/test/.toolskube/client.crt
-----BEGIN CERTIFICATE-----
toolsbeta.test@toolsbeta-sgebastion-04:~$ head -1 /data/project/test/.toolskube/client.key
-----BEGIN RSA PRIVATE KEY-----
</syntaxhighlight>


The <code>jobs-framework-api</code> component needs to know the client certificate '''CommonName'''. With this information, <code>jobs-framework-api</code> can ''impersonate'' the user by re-reading the x509 certificates from the user's home directory and using them to interact with the kubernetes API. This is effectively a TLS proxy that reuses the original certificate.
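
As an illustration (not part of the framework itself), the certificate subject and its CommonName can be inspected from a bastion with a standard openssl query against the tool's client certificate, using the paths shown above:

<syntaxhighlight lang="shell-session">
toolsbeta.test@toolsbeta-sgebastion-04:~$ openssl x509 -noout -subject -in /data/project/test/.toolskube/client.crt
</syntaxhighlight>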


In the current Toolforge webservice setup, TLS termination is done at the nginx front proxy. The front proxy talks to the backends using plain HTTP, with no simple options for relaying or forwarding the original client TLS certs. That's why <code>jobs-framework-api</code> doesn't use the main Toolforge ingress setup.


This results in two types of connections, as shown in the diagram above:


* '''connection type 1''': a user contacts <code>jobs-framework-api</code> using the k8s client TLS certs from their home directory. The TLS connection is established to the jobs-specific nginx (<code>ingress-nginx-jobs</code>), which handles the client-side TLS termination. This can happen from a Toolforge bastion, or from a Job already running inside kubernetes. The connection can be made either using <code>jobs-framework-cli</code> or by contacting <code>jobs-framework-api</code> programmatically by other means.
* '''connection type 2''': once the CommonName of the original request certificate is validated, <code>jobs-framework-api</code> can load the same k8s client TLS certificate from the user's home and ''impersonate'' the user to contact the k8s API. For this to be possible, the <code>jobs-framework-api</code> component needs read access to every user's home directory, pretty much like <code>maintain-kubeusers</code> has.


This setup is possible because the x509 certificates are maintained by the <code>maintain-kubeusers</code> component, and because <code>jobs-framework-api</code> runs inside the kubernetes cluster itself and can therefore be configured with enough permissions to read each user's home.


Additional or alternative authentication mechanisms can be introduced in the future as we detect new use cases.
The Toolforge front proxy exists today basically for webservices running in the grid. Once the grid is fully deprecated and we no longer need the front proxy, we could re-evaluate this whole situation and simplify it.


=== Ingress & TLS ===


The <code>jobs-framework-api</code> doesn't use a kubernetes ingress deployment. Instead, it is exposed through its own NodePort service in the <code>jobs-api</code> namespace.


That NodePort service fronts an nginx container that reads TLS client certificates and passes the <code>ssl-client-subject-dn</code> HTTP header to the pod running the <code>jobs-framework-api</code> webservice. With this information <code>jobs-framework-api</code> can load the client cert again when talking to the k8s API on behalf of the original user.


The way this whole ingress/TLS setup works is as follows (a verification sketch follows the list):
* The FQDN <code>jobs.svc.toolsbeta.eqiad1.wikimedia.cloud</code> points to the k8s haproxy VIP address.
* The haproxy system listens on 30001/TCP for this jobs-specific service (and on 30000/TCP for the general one).
* The haproxy daemon reaches all k8s worker nodes on 30001/TCP, where a NodePort service in the <code>jobs-api</code> namespace redirects packets to the <code>jobs-api</code> deployment.
* The deployment consists of 1 pod with 2 containers: nginx and the <code>jobs-framework-api</code> itself.
* The nginx container handles the TLS termination and proxies the API by means of a socket.
* Once the TLS certs are verified, the proxy injects the HTTP header <code>ssl-client-subject-dn</code> for <code>jobs-framework-api</code>, which contains the <code>CN=</code> information of the original user.
* With the <code>ssl-client-subject-dn</code> header, <code>jobs-framework-api</code> can load the original user's client certificate from their home on NFS and in turn contact the k8s API using it.
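
A quick way to verify this wiring is to inspect the <code>jobs-api</code> namespace from a k8s control node (a sketch only; object names are the ones described above, output omitted):

<syntaxhighlight lang="shell-session">
user@toolsbeta-test-k8s-control-4:~$ sudo -i kubectl get svc,deployment,pods -n jobs-api
user@toolsbeta-test-k8s-control-4:~$ sudo -i kubectl describe svc -n jobs-api   # should show the 30001 NodePort
</syntaxhighlight>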


=== About logs ===


Logs produced by jobs should not be made available using <code>kubectl logs</code>, because that means the stderr/stdout of the pod is being written to (and read from) the etcd cluster. If left unattended, logs produced by jobs can easily hammer and bring down our etcd clusters.


Logs should be stored in each user's NFS home directory, until we come up with some holistic solution at the kubernetes level like https://kubernetes.io/docs/concepts/cluster-administration/logging/
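
As a sketch of that approach (the tool and file names below are just examples, not something the framework mandates), the job command itself can redirect stdout/stderr to files in the tool's NFS home, which can later be read from a bastion:

<syntaxhighlight lang="shell-session">
$ ./bot.sh >> "$HOME/bot.out" 2>> "$HOME/bot.err"    # what the job runs inside its pod; $HOME is the tool's NFS home
tools.mytool@tools-sgebastion-08:~$ tail bot.out bot.err
</syntaxhighlight>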


=== Endpoints ===


Some relevant URLs:
* https://jobs.svc.tools.eqiad1.wikimedia.cloud:30001/api/v1 --- API endpoint in the '''tools''' project.
* https://jobs.svc.toolsbeta.eqiad1.wikimedia.cloud:30001/api/v1 --- API endpoint in the '''toolsbeta''' project.
* https://jobs.toolforge.org/ --- name-reserved Toolforge tool ([https://toolsadmin.wikimedia.org/tools/id/jobs toolsadmin]) ([https://toolhub.wikimedia.org/tools/toolforge-jobs toolhub])


Please note that, as of this writing, the API endpoints are only available within Toolforge / Cloud VPS (internal IP address, no floating IP).
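
For example, a quick check from a bastion shows that the service name only resolves to an internal address (sketch; the actual address will vary):

<syntaxhighlight lang="shell-session">
user@tools-sgebastion-08:~$ getent hosts jobs.svc.tools.eqiad1.wikimedia.cloud
</syntaxhighlight>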


== Deployment and maintenance ==


Information on how to deploy and maintain the framework.


=== jobs-framework-api ===


==== deployment ====

Follow the usual workflow to deploy a custom k8s component; this should really be automated, see [[phab:T291915 | Phabricator T291915: toolforge: automate how we deploy custom k8s components]].


==== maintenance ====


To see logs, try something like:
<syntaxhighlight lang="shell-session">
user@toolsbeta-test-k8s-control-4:~$ sudo -i kubectl logs deployment/jobs-api -n jobs-api nginx
[..]
192.168.17.192 - - [15/Feb/2022:12:57:54 +0000] "GET /api/v1/containers/ HTTP/1.1" 200 2655 "-" "python-requests/2.21.0"
192.168.81.64 - - [15/Feb/2022:12:59:50 +0000] "GET /api/v1/list/ HTTP/1.1" 200 3 "-" "python-requests/2.21.0"
192.168.17.192 - - [15/Feb/2022:13:00:34 +0000] "GET /api/v1/containers/ HTTP/1.1" 200 2655 "-" "python-requests/2.21.0"
192.168.81.64 - - [15/Feb/2022:13:01:01 +0000] "GET /api/v1/containers/ HTTP/1.1" 200 2655 "-" "python-requests/2.21.0"
192.168.17.192 - - [15/Feb/2022:13:01:02 +0000] "POST /api/v1/run/ HTTP/1.1" 409 52 "-" "python-requests/2.21.0"
user@toolsbeta-test-k8s-control-4:~$ sudo -i kubectl logs deployment/jobs-api -n jobs-api webservice
[..]
*** Operational MODE: single process ***
mounting api:app on /
Adding available container: {'shortname': 'tf-bullseye-std', 'image': 'docker-registry.tools.wmflabs.org/toolforge-bullseye-standalone:latest'}
Adding available container: {'shortname': 'tf-buster-std-DEPRECATED', 'image': 'docker-registry.tools.wmflabs.org/toolforge-buster-standalone:latest'}
Adding available container: {'shortname': 'tf-golang', 'image': 'docker-registry.tools.wmflabs.org/toolforge-golang-sssd-base:latest'}
Adding available container: {'shortname': 'tf-golang111', 'image': 'docker-registry.tools.wmflabs.org/toolforge-golang111-sssd-base:latest'}
Adding available container: {'shortname': 'tf-jdk17', 'image': 'docker-registry.tools.wmflabs.org/toolforge-jdk17-sssd-base:latest'}
[..]
</syntaxhighlight>


To verify the API endpoint is up, try something like:


<syntaxhighlight lang="shell-session">
user@toolsbeta-test-k8s-control-4:~$ curl https://jobs.svc.toolsbeta.eqiad1.wikimedia.cloud:30001/api/v1/list -k
<html>
<head><title>400 No required SSL certificate was sent</title></head>
<body>
<center><h1>400 Bad Request</h1></center>
<center>No required SSL certificate was sent</center>
<hr><center>nginx/1.21.0</center>
</body>
</html>
</syntaxhighlight>


The 400 error is expected in that example because we're not sending a TLS client certificate, which means nginx is doing its job correctly.
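
For comparison, the same request with the tool's client certificate and key (the ones described in the Auth section) should be accepted. A minimal sketch, run as the tool user from a bastion:

<syntaxhighlight lang="shell-session">
toolsbeta.test@toolsbeta-sgebastion-04:~$ curl -s -k \
    --cert $HOME/.toolskube/client.crt \
    --key $HOME/.toolskube/client.key \
    https://jobs.svc.toolsbeta.eqiad1.wikimedia.cloud:30001/api/v1/list/
</syntaxhighlight>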


=== jobs-framework-cli ===


==== deployment ====

A simple Debian package installed on the bastions. See [[Portal:Toolforge/Admin/Packaging]].
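
To check what is deployed on a given bastion (a sketch; the exact package name here is an assumption, verify against the packaging repo):

<syntaxhighlight lang="shell-session">
user@tools-sgebastion-08:~$ apt-cache policy toolforge-jobs-framework-cli   # package name assumed
user@tools-sgebastion-08:~$ toolforge-jobs --help
</syntaxhighlight>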


=== jobs-framework-emailer ===


==== deployment ====

Follow the usual workflow to deploy a custom k8s component; this should really be automated, see [[phab:T291915 | Phabricator T291915: toolforge: automate how we deploy custom k8s components]].


==== maintenance ====


{| class="wikitable sortable"
TODO: in development, see [[phab:T286135 | Phabricator T286135: Toolforge jobs framework: email maintainers on job failure]].
|+ Toolforge jobs feature mapping table
|-
! Feature !! GridEngine !! Kubernetes !! toolforge-jobs-cli !! toolforge-jobs-api
|-
| simple one-off job launch || <code>jsub</code> || native Job API support || <code>toolforge-jobs run <cmd> --type <container></code> || POST /api/v1/run/
|-
| get single job status || <code>qstat</code> || <code>kubectl describe job</code> || <code>toolforge-jobs show <id></code> || GET /api/v1/show/{id}/
|-
| get all jobs status || <code>qstat</code> || <code>kubectl</code> + some scripting || <code>toolforge-jobs list</code> || GET /api/v1/list/
|-
| delete job || <code>jstop</code> || <code>kubectl delete</code> || <code>toolforge-jobs delete <id></code> || DELETE /api/v1/delete/{id}/
|-
| delete all jobs || some scripting || <code>kubectl delete</code> || <code>toolforge-jobs flush</code> || DELETE /api/v1/flush/
|-
| scheduled jobs || <code>crontab</code> + <code>jsub</code> || native CronJob API support || <code>toolforge-jobs run <cmd> --type <container> --schedule <sched></code> || POST /api/v1/run/
|-
| continuous job launch (bot, daemon) || <code>jstart</code> || native ReplicationController API support || <code>toolforge-jobs run <cmd> --type <container> --continuous</code> || POST /api/v1/run/
|-
| concurrency limits || 16 running + 34 scheduled || TBD. several potential mechanisms || TBD || TBD
|-
| get stderr / stdout of a job || files in the NFS directory || files in the NFS directory || No initial support || No initial API support
|-
| request additional mem || <code>jsub -mem</code> || TBD. we may not need this || TBD || TBD
|-
| sync run || <code>jsub -sync y</code> || TBD. no native support || <code>toolforge-jobs run <cmd> --type <container> --wait || POST /api/v1/run/ + GET /api/v1/show/{id}/
|-
| making sure a job only runs once || <code>jsub -once</code> || native Job API support || <code>toolforge-jobs run <cmd> --type <container></code> || POST /api/v1/run/
|-
| listing available containers || No support / not required || Similar to what we do on tools-webservices || <code>toolforge-jobs containers</code> || GET /api/v1/containers/
|}


== API docs ==


This section contains concrete details for the API that TJF introduces.
'''TODO:''' this is outdated, we need swagger or similar to keep this up-to-date.


==== POST /api/v1/run/ ====

Creates a new job in the kubernetes cluster.

==== GET /api/v1/show/{name}/ ====

Shows information about a job in the kubernetes cluster.

==== DELETE /api/v1/delete/{name} ====

Delete a job in the kubernetes cluster.

==== GET /api/v1/list/ ====

Shows information about all user jobs in the kubernetes cluster.

==== DELETE /api/v1/flush/ ====

Delete all user jobs in the kubernetes cluster.

==== GET /api/v1/containers/ ====

Shows information about all containers available for jobs in the kubernetes cluster.




== See also ==
Internal documents:
* [[Wikimedia_Cloud_Services_team/EnhancementProposals/Toolforge_jobs]] -- where this was initially designed.
* [[Help:Toolforge/Grid]]
* [[Help:Toolforge/Jobs_framework]] -- end user documentation


Some upstream kubernetes documentation pointers:
