Revision as of 18:38, 17 April 2019 by imported>Bstorm (→‎/grid/continuous/stretch: add doc)

Toolschecker is a Flask application that runs various active checks on Toolforge and Cloud VPS infrastructure in response to HTTP requests. Each check is exposed as a separate URL on the host. These URLs are monitored by Icinga for alerting purposes (see ""). Some URLs are also monitored externally by Catchpoint.



This list is defined in the toollabs::checker_hosts key in and is used in configuring the ferm rules for Toolforge's flannel and Kubernetes etcd clusters.


Several tools are involved in the checks:

Crontab for /cron check
Webservice for /webservice/gridengine check
Webservice for /webservice/kubernetes check



The /cron check expects the mtime of /data/project/toolschecker/crontest.txt to be updated every 5 minutes by a grid job that the toolschecker tool submits from its crontab.
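A freshness test of this kind can be sketched as follows. This is a minimal illustration, not the toolschecker source: the function name `is_fresh` and the tolerance in `MAX_AGE_SECONDS` are assumptions chosen for the example.

```python
import os
import time

# Assumed threshold: the cron job fires every 5 minutes, so allow one extra
# minute of slack. This value is hypothetical, not taken from toolschecker.
MAX_AGE_SECONDS = 5 * 60 + 60


def is_fresh(path, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if `path` exists and was modified within `max_age` seconds."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file counts as a failed check.
        return False
    return (now - mtime) <= max_age
```

Calling `is_fresh("/data/project/toolschecker/crontest.txt")` would then report whether the cron-submitted job has touched the file recently enough.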


  • ssh
  • become toolschecker
  • crontab -l
*/5 * * * * /usr/bin/jsub -N toolschecker.crontest -once -quiet touch /data/project/toolschecker/crontest.txt







A small script in /data/project/toolschecker/bin/ runs as a continuous job that never exits. If the job stops running, this check goes critical. To prevent that, the following cron job is defined:

*/5 * * * * jlocal /data/project/toolschecker/ test-long-running-stretch /data/project/toolschecker/bin/

The script checks whether the job is running and restarts it if it is not found.
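The check-and-restart logic can be sketched as below. This is an illustration under stated assumptions rather than the actual script: `ensure_running`, `list_jobs`, and `start_job` are hypothetical names, and the two callables stand in for whatever grid commands the real script uses to list and submit jobs.

```python
def ensure_running(job_name, list_jobs, start_job):
    """Start `job_name` unless `list_jobs()` already reports it.

    `list_jobs` and `start_job` are hypothetical callables supplied by the
    caller; they are placeholders for the real grid queries and submissions.
    Returns True if a restart was triggered, False if the job was found.
    """
    if job_name in list_jobs():
        return False
    start_job(job_name)
    return True
```

Run every 5 minutes from cron, a wrapper like this keeps a job such as test-long-running-stretch alive without starting a duplicate when it is already running.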