Help:Toolforge/Jobs framework
The Toolforge jobs framework is currently in BETA phase. Some bugs may be present, some features may be missing and some interfaces may change.
The toolforge-jobs command line interface can only be used from Debian Buster bastions: login-buster.toolforge.org and dev-buster.toolforge.org
This page contains information on the Toolforge jobs framework.
Every non-trivial task performed in Toolforge (like executing a script or running a bot) should be dispatched to a job scheduling backend (in this case, Kubernetes), which ensures that the job is run in a suitable place with sufficient resources.
The basic principle of running jobs is fairly straightforward:
- You create a job from a submission server (usually login.toolforge.org)
- Kubernetes finds a suitable execution node to run the job on, and starts it there once resources are available
- As it runs, your job will send output and errors to files until the job completes or is aborted.
Jobs can be scheduled synchronously or asynchronously, continuously, or simply executed once.
Creating jobs
This section describes how to create jobs using the toolforge-jobs run command.
Creating one-off jobs
One-off jobs (or normal jobs) are workloads that will be scheduled by Toolforge Kubernetes and run until finished. They will run once, and are expected to finish at some point.
Select a runtime and a command in your tool home directory, then use toolforge-jobs run to create the job. For example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myjob --command ./mycommand.sh --image tf-bullseye-std
The --command option accepts arguments if the whole command is quoted. For example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myjob --command "./mycommand.sh --witharguments" --image tf-bullseye-std
You can instruct the command line to wait and not return until the job is finished with the --wait option. For example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myjob --command ./mycommand.sh --image tf-bullseye-std --wait
Creating scheduled jobs (cron jobs)
To schedule a recurring job (also known as a cron job), use the --schedule WHEN option when creating it:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run mycronjob --command ./daily.sh --image tf-bullseye-std --schedule "17 13 * * *"
The schedule argument uses cron syntax (see also cron on Wikipedia).
If you need to run a daily/hourly job, please avoid scheduling jobs at exactly midnight (00:00) or at the top of the hour (at :00 minutes) if your job does not explicitly require it. Instead, pick a random time of the day so that system load is balanced evenly through the day.
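One way to follow this advice is to let the shell pick the time. The sketch below is only an illustration: the function name random_daily_schedule is hypothetical and not part of toolforge-jobs. It prints a cron expression for a daily job with a random hour and a random minute that is never :00.

```shell
# Print a cron expression for a daily job at a random time of day.
# Hypothetical helper, not part of toolforge-jobs.
random_daily_schedule() {
    # minute: 1..59 (never the top of the hour), hour: 0..23
    awk 'BEGIN {
        srand()
        printf "%d %d * * *\n", 1 + int(rand() * 59), int(rand() * 24)
    }'
}

random_daily_schedule
```

Run it once, note the result, and pass that fixed value to --schedule (for example --schedule "17 13 * * *") so the job keeps the same slot every day.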
Creating continuous jobs
Continuous jobs are programs that are never meant to end. If they end (for example, because of an error) the Toolforge Kubernetes system will restart them.
To create a continuous job, use the --continuous option:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myalwaysrunningjob --command ./myendlesscommand.sh --image tf-bullseye-std --continuous
About the executable
In all job types (normal, continuous, cronjob) the --command parameter should meet the following conditions:
- it should refer to an executable file.
- mind the path: the command working directory is the tool's home directory, so --command mycommand.sh will likely fail (it references $PATH), and --command ./mycommand.sh is likely what you mean.
- arguments are optional, but if present they should be quoted together with the command, for example: --command "./mycommand.sh --arg1 x --arg2 y".
Failing to meet any of these conditions will lead to errors either before launching the job, or shortly after the job is processed by the backend.
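These conditions can be checked from the bastion before creating the job. The following is a minimal sketch, assuming it is run from the tool home directory. The function check_job_command is a hypothetical helper, not something toolforge-jobs provides, and its return codes are arbitrary choices for this example.

```shell
# Pre-flight check for a --command string; run it from the tool home directory.
# Hypothetical helper, not part of toolforge-jobs.
check_job_command() {
    local program
    # The first whitespace-separated word is the program, the rest are arguments.
    program="${1%% *}"
    case "$program" in
        */*) ;;  # has a path component such as ./mycommand.sh
        *)
            echo "note: '$program' has no path, so it is resolved through \$PATH" >&2
            return 2
            ;;
    esac
    if [ -f "$program" ] && [ -x "$program" ]; then
        echo "ok: $program is executable"
    else
        echo "error: $program is missing or not executable" >&2
        return 1
    fi
}
```

For example, check_job_command "./mycommand.sh --arg1 x" reports an error until the script exists and has been made executable with chmod +x.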
About the job name
The job name is a unique string identifier. The string should meet these criteria:
- between 1 and 100 characters long.
- any combination of numbers, lower-case letters and the - (dash) character.
- no spaces, no special symbols.
Failing to meet any of these conditions will lead to errors either before launching the job, or shortly after the job is processed by the backend.
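The naming criteria can be expressed as a small shell check. This sketch encodes only the rules listed in this section; the backend may apply further restrictions that are not covered here. The function name is_valid_job_name is hypothetical.

```shell
# Return 0 if the name satisfies the criteria listed in this section.
# Hypothetical helper, not part of toolforge-jobs.
is_valid_job_name() {
    local name="$1"
    # Between 1 and 100 characters long.
    if [ "${#name}" -lt 1 ] || [ "${#name}" -gt 100 ]; then
        return 1
    fi
    # Only digits, lower-case letters and the dash. The characters are listed
    # explicitly so that the check does not depend on the locale.
    case "$name" in
        *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
    esac
    return 0
}
```

With this definition, is_valid_job_name my-cron-job-2 succeeds, while names such as "My Job" or my_job are rejected.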
Choosing the execution runtime
In Toolforge Kubernetes we offer a pre-defined set of container images that you can use as the execution runtime for your job.
To view which execution runtimes are available, run the toolforge-jobs images command.
Example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs images
Short name Container image URL
------------------------ ----------------------------------------------------------------------
tf-bullseye-std docker-registry.tools.wmflabs.org/toolforge-bullseye-standalone:latest
tf-buster-std-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-buster-standalone:latest
tf-golang docker-registry.tools.wmflabs.org/toolforge-golang-sssd-base:latest
tf-golang111 docker-registry.tools.wmflabs.org/toolforge-golang111-sssd-base:latest
tf-jdk11-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-jdk11-sssd-base:latest
tf-jdk17 docker-registry.tools.wmflabs.org/toolforge-jdk17-sssd-base:latest
tf-jdk8-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-jdk8-sssd-base:latest
tf-node6-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-node6-sssd-base:latest
tf-node10-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-node10-sssd-base:latest
tf-node12 docker-registry.tools.wmflabs.org/toolforge-node12-sssd-base:latest
tf-php5-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-php5-sssd-base:latest
tf-php72-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-php72-sssd-base:latest
tf-php73-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-php73-sssd-base:latest
tf-php74 docker-registry.tools.wmflabs.org/toolforge-php74-sssd-base:latest
tf-python2-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-python2-sssd-base:latest
tf-python34-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-python34-sssd-base:latest
tf-python35-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-python35-sssd-base:latest
tf-python37-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest
tf-python39 docker-registry.tools.wmflabs.org/toolforge-python39-sssd-base:latest
tf-ruby21-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-ruby21-sssd-base:latest
tf-ruby25-DEPRECATED docker-registry.tools.wmflabs.org/toolforge-ruby25-sssd-base:latest
tf-ruby27 docker-registry.tools.wmflabs.org/toolforge-ruby27-sssd-base:latest
tf-tcl86 docker-registry.tools.wmflabs.org/toolforge-tcl86-sssd-base:latest
We suggest you move away from images marked with the DEPRECATED keyword, since they are old runtimes.
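To see only the runtimes that are not marked as deprecated, the listing can be filtered. This is a sketch that assumes the output layout shown above (two header lines, short name in the first column); current_images is a hypothetical helper name.

```shell
# Read `toolforge-jobs images` output on stdin and print the short names
# that do not end in "-DEPRECATED".
# Hypothetical helper; assumes two header lines before the data rows.
current_images() {
    awk 'NR > 2 && $1 !~ /-DEPRECATED$/ { print $1 }'
}
```

It would be used as: toolforge-jobs images | current_images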
Introducing additional flexibility for execution runtimes is currently part of the WMCS team roadmap.
NOTE: if your tool uses Python, you may want to use a virtualenv, see Help:Toolforge/Python#Kubernetes_python_jobs.
Loading jobs from a YAML file
You can define a list of jobs in a YAML file and load them all at once using the toolforge-jobs load command, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs load jobs.yaml
NOTE: loading jobs from a file flushes all previously defined jobs.
Example YAML file:
# https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework
---
# a cronjob
- name: everyminute
  command: ./myothercommand.py -v
  image: tf-bullseye-std
  no-filelog: true
  schedule: "* * * * *"
  emails: onfailure
# a continuous job
- image: tf-bullseye-std
  name: endlessjob
  command: ./dumps-daemon.py --endless
  continuous: true
  emails: all
# wait for this normal job before loading the next
- name: myjob
  image: tf-bullseye-std
  command: ./mycommand.sh --argument1
  wait: true
  emails: onfinish
# another normal job after the previous one finished running
- name: anotherjob
  image: tf-bullseye-std
  command: ./mycommand.sh --argument1
  emails: none
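Because loading a file flushes all previously defined jobs, it can help to review which job names the file defines before loading it. The sketch below is a plain text filter, not a YAML parser. It assumes each job is a list item whose name: key sits either on the "- " line or on an indented line, with an unquoted value. The function yaml_job_names is hypothetical.

```shell
# Print the job names defined in a jobs YAML file, one per line.
# Hypothetical helper; a text filter for simple files, not a YAML parser.
yaml_job_names() {
    sed -n \
        -e 's/^-[[:space:]]*name:[[:space:]]*//p' \
        -e 's/^[[:space:]][[:space:]]*name:[[:space:]]*//p' \
        "$1"
}
```

Comparing yaml_job_names jobs.yaml with the output of toolforge-jobs list shows which existing jobs would disappear after the load.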
Listing your existing jobs
You can get information about the jobs created for your tool using toolforge-jobs list, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs list
Job name:       Job type:             Status:
--------------  --------------------  ---------------------------
myscheduledjob  schedule: * * * * *   Last schedule time: 2021-06-30T10:26:00Z
alwaysrunning   continuous            Running
myjob           normal                Completed
Listing even more information at once is possible using --long or -l:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs list -l
Job name:       Command:                 Job type:            Image:           File log:  Emails:   Resources:  Status:
--------------  -----------------------  -------------------  ---------------  ---------  --------  ----------  ---------------------------
myscheduledjob  ./read-dumps.sh          schedule: * * * * *  tf-bullseye-std  yes        none      default     Last schedule time: 2021-06-30T10:26:00Z
alwaysrunning   ./myendlesscommand.sh    continuous           tf-bullseye-std  no         all       default     Running
myjob           ./mycommand.sh --debug   normal               tf-bullseye-std  yes        onfinish  default     Completed
NOTE: normal jobs will be deleted from this listing shortly after being completed (even if they finish with some error).
Deleting your jobs
You can delete your jobs in two ways:
- manually delete each job, identified by name, using the toolforge-jobs delete command.
- delete all defined jobs at once, using the toolforge-jobs flush command.
Showing information about your job
You can get information about a defined job using the toolforge-jobs show command, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs show myscheduledjob
+------------+-----------------------------------------------------------------+
| Job name: | myscheduledjob |
+------------+-----------------------------------------------------------------+
| Command: | ./read-dumps.sh myargument |
+------------+-----------------------------------------------------------------+
| Job type: | schedule: * * * * * |
+------------+-----------------------------------------------------------------+
| Image: | tf-bullseye-std |
+------------+-----------------------------------------------------------------+
| File log: | yes |
+------------+-----------------------------------------------------------------+
| Emails: | none |
+------------+-----------------------------------------------------------------+
| Resources: | mem: 10Mi, cpu: 100 |
+------------+-----------------------------------------------------------------+
| Status: | Last schedule time: 2021-06-30T10:26:00Z |
+------------+-----------------------------------------------------------------+
| Hints: | Last run at 2021-06-30T10:26:08Z. Pod in 'Pending' phase. State |
| | 'waiting' for reason 'ContainerCreating'. |
+------------+-----------------------------------------------------------------+
This should include information about the job status and some hints (in case of failure, etc).
Job logs
Jobs log stdout/stderr to files in your tool home directory.
For a job myjob, you will find:
- a myjob.out file, containing stdout generated by your job.
- a myjob.err file, containing stderr generated by your job.
Example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myjob --command ./mycommand.sh --image tf-bullseye-std
tools.mytool@tools-sgebastion-11:~$ ls myjob*
myjob.out myjob.err
Subsequent same-name job runs will append to the same files.
NOTE: as of this writing there is no automatic way to prune log files, so tool users must take care of such files growing too large.
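Until automatic pruning exists, a small helper can keep a log at a bounded size. This is a sketch only: trim_log is a hypothetical function, the default of 1000 lines is an arbitrary choice, and lines written by a running job while the trim is in progress may be lost.

```shell
# Keep only the last N lines of a log file (default: 1000).
# Hypothetical helper; output written by a running job during the trim
# may be lost.
trim_log() {
    local file="$1"
    local keep="${2:-1000}"
    local tmp status
    [ -f "$file" ] || return 0      # nothing to do if the log does not exist
    tmp=$(mktemp) || return 1
    # Copy the tail aside, then rewrite the log in place.
    tail -n "$keep" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, trim_log myjob.out 1000 followed by trim_log myjob.err 1000 keeps the last 1000 lines of each file.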
Log generation can be disabled with the --no-filelog parameter when creating a new job, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myjob --command ./mycommand.sh --image tf-bullseye-std --no-filelog
Job quotas
Each tool account has a limited quota available. The same quota is used for jobs and other things potentially running on Kubernetes, like webservices.
To check your quota, run:
tools.mytool@tools-sgebastion-11:~$ kubectl describe resourcequotas
Name: tool-mytool
Namespace: tool-mytool
Resource Used Hard
-------- ---- ----
configmaps 2 10
count/cronjobs.batch 0 50 <--
count/deployments.apps 0 3 <--
count/jobs.batch 0 15 <--
limits.cpu 0 2
limits.memory 0 8Gi
persistentvolumeclaims 0 3
pods 0 10
replicationcontrollers 0 1
requests.cpu 0 2
requests.memory 0 6Gi
secrets 1 10
services 0 1
services.nodeports 0 0
The quota entries marked with the <-- symbol indicate, in order:
- count/cronjobs.batch: maximum number of cronjobs
- count/deployments.apps: maximum number of continuous jobs
- count/jobs.batch: maximum number of jobs
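The remaining headroom for each of these counters is the Hard value minus the Used value. As an illustration, the following sketch reads the quota listing and prints that difference for every count/ row. It assumes the three-column Resource/Used/Hard layout shown above; remaining_job_slots is a hypothetical helper name.

```shell
# Read `kubectl describe resourcequotas` output on stdin and print, for each
# count/ resource, how many more objects can still be created (Hard - Used).
# Hypothetical helper; assumes the Resource / Used / Hard column layout.
remaining_job_slots() {
    awk '$1 ~ /^count\// { printf "%s %d\n", $1, $3 - $2 }'
}
```

It would be used as: kubectl describe resourcequotas | remaining_job_slots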
As of this writing, new jobs get 512Mi memory and 1/2 CPU by default.
You can run jobs with additional CPU and memory using the --mem MEM and --cpu CPU parameters, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myjob --command "./heavycommand.sh" --image tf-bullseye-std --mem 1Gi --cpu 2
Requesting more memory or CPU will fail if the tool quota is exceeded.
Quota increases
It is possible to request a quota increase if you can demonstrate your tool's need for more resources than the default namespace quota allows. Instructions and a template link for creating a quota request can be found at Toolforge (Quota requests) in Phabricator.
Please read all the instructions there before submitting your request.
Job email notifications
This is a feature under development and may not properly work just yet.
You can choose to receive email notifications about your job's activity by using the --emails EMAILS option when creating a job.
The available choices are:
- none: don't get any email notification. This is the default behavior.
- onfailure: receive email notifications in case of a failure event.
- onfinish: receive email notifications when the job finishes (both successfully and on failure).
- all: receive all possible notifications.
Example:
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myjob --command ./mycommand.sh --image tf-bullseye-std --emails onfinish
The email will be sent to tools.mytool@toolforge.org, which is an email alias that by default redirects to all tool maintainers associated with that particular tool account.
Complete example session
Here is a complete example of a work session with the Toolforge jobs framework.
Example shell session:
$ ssh dev-buster.toolforge.org
$ become $mytool
$ toolforge-jobs containers
Short name Docker container image
------------- ----------------------------------------------------------------------
tf-buster-std docker-registry.tools.wmflabs.org/toolforge-buster-standalone:latest
tf-golang111 docker-registry.tools.wmflabs.org/toolforge-golang111-sssd-base:latest
tf-jdk11 docker-registry.tools.wmflabs.org/toolforge-jdk11-sssd-base:latest
tf-node10 docker-registry.tools.wmflabs.org/toolforge-node10-sssd-base:latest
tf-php73 docker-registry.tools.wmflabs.org/toolforge-php73-sssd-base:latest
tf-python37 docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest
tf-ruby25 docker-registry.tools.wmflabs.org/toolforge-ruby25-sssd-base:latest
tf-tcl86 docker-registry.tools.wmflabs.org/toolforge-tcl86-sssd-base:latest
wm-buster docker-registry.tools.wmflabs.org/wikimedia-buster:latest
wm-stretch docker-registry.tools.wmflabs.org/wikimedia-stretch:latest
[..]
$ # running a normal job:
$ toolforge-jobs run myjob --command ./mycommand.sh --image tf-buster-std
$ # running a normal job and waiting for it to complete:
$ toolforge-jobs run myotherjob --command ./myothercommand.sh --image tf-buster-std --wait
$ # running a continuous job:
$ toolforge-jobs run myalwaysrunningjob --command ./myendlesscommand.sh --image tf-buster-std --continuous
$ # running a scheduled job:
$ toolforge-jobs run myscheduledjob --command ./everyminute.sh --image tf-buster-std --schedule "* * * * *"
$ toolforge-jobs list
Job name:           Command:                 Job type:            Container:     Status:
------------------  -----------------------  -------------------  -------------  ---------------------------
myscheduledjob      ./everyminute.sh         schedule: * * * * *  tf-buster-std  Last schedule time: 2021-06-30T10:26:00Z
myalwaysrunningjob  ./myendlesscommand.sh    continuous           tf-buster-std  Running
myjob               ./mycommand.sh           normal               tf-buster-std  Completed
$ toolforge-jobs show myscheduledjob
+------------+-----------------------------------------------------------------+
| Job name: | myscheduledjob |
+------------+-----------------------------------------------------------------+
| Command: | ./everyminute.sh |
+------------+-----------------------------------------------------------------+
| Job type: | schedule: * * * * * |
+------------+-----------------------------------------------------------------+
| Container: | tf-buster-std |
+------------+-----------------------------------------------------------------+
| Status: | Last schedule time: 2021-06-30T10:26:00Z |
+------------+-----------------------------------------------------------------+
| Hints: | Last run at 2021-06-30T10:26:08Z. Pod in 'Pending' phase. State |
| | 'waiting' for reason 'ContainerCreating'. |
+------------+-----------------------------------------------------------------+
$ toolforge-jobs delete myscheduledjob
$ toolforge-jobs flush
$ toolforge-jobs list
[.. nothing ..]
Grid Engine migration
This section contains specific documentation for Grid Engine users who are trying to migrate their jobs to Kubernetes.
In particular, here is a list of common command equivalences between Grid Engine (legacy, with jsub and friends) and Kubernetes (with the new toolforge-jobs).
- Basic job submission:
tools.mytool@tools-sgebastion-11:~$ jsub ./mycommand.sh
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myjob --command ./mycommand.sh --image tf-bullseye-std
- Allocating additional memory:
tools.mytool@tools-sgebastion-11:~$ jsub -mem 1000m php i_like_more_ram.php
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myjob --command ./i_like_more_ram.php --image tf-php74 --mem 1Gi --cpu 2
- Waiting until the job is completed:
tools.mytool@tools-sgebastion-11:~$ jsub -sync y program [args...]
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run myjob --command ./myScript.py --image tf-python39 --wait
- Viewing information about all jobs:
tools.mytool@tools-sgebastion-11:~$ qstat
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs list
- Deleting a job:
tools.mytool@tools-sgebastion-11:~$ qdel job_number/job_name
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs delete myjob
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs flush
Useful links
The following tools have been built by the Toolforge admin team to help others see job status:
- k8s-status.toolforge.org — status board of Kubernetes nodes and tools (webservices, jobs) they are currently running.
Communication and support
Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia Movement volunteers. Please reach out with questions and join the conversation:
- Chat in real time in the IRC channel #wikimedia-cloud, the bridged Telegram group, or the bridged Mattermost channel
- Discuss via email after you have subscribed to the cloud@ mailing list
See also
- Help:Toolforge/Web
- Help:Toolforge/Kubernetes
- News/Toolforge Stretch deprecation
- News/2020 Kubernetes cluster migration
- Alternate procedure for managing jobs in Toolforge Kubernetes, using the raw k8s API, only recommended if you are an advanced user.
- Portal:Toolforge/Admin/Kubernetes/jobs - Engineering documentation about this system.
- Wikimedia Techblog: Toolforge Jobs Framework