Portal:Toolforge/Admin/Webservicemonitor

This page contains information on the webservicemonitor functionality in Toolforge.

The webservicemonitor component is a daemon that scans Toolforge's tool manifests for grid-based webservices, checks that they are alive, and restarts them if required.

Since the Stretch version of Toolforge, this component is meant to run on cronrunner nodes. Previously it ran on services nodes.

Code

The source code (Python 3) is currently deployed as a Debian package named tools-manifest, and can be found at https://gerrit.wikimedia.org/r/#/admin/projects/operations/software/tools-manifest.

All the setup is done using Puppet, in the profile::toolforge::grid::webservicemonitor profile: modules/profile/manifests/toolforge/grid/webservicemonitor.pp

How it works

A daemon, collector-runner, reads all Toolforge manifests from the NFS share. A manifest should indicate that the tool is meant to run on the grid, using a web node:

tools.wdcat@tools-bastion-03:~$ cat service.manifest 
backend: gridengine
version: 2
web: uwsgi-python
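
As an illustration only, the manifest check could be sketched as follows. This is not the tools-manifest code: the function names parse_manifest and is_grid_webservice are hypothetical, the parser only handles flat "key: value" lines like the sample above (a real manifest reader would more likely use a YAML library), and the rule "backend is gridengine and a web key is present" is an assumption inferred from that sample.

```python
def parse_manifest(text):
    """Parse a flat 'key: value' manifest into a dict of strings.

    Simplified stand-in for a YAML parser; nested values are not supported.
    """
    manifest = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything without a key separator.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        manifest[key.strip()] = value.strip()
    return manifest


def is_grid_webservice(manifest):
    """Assumed rule: the tool wants a webservice on the grid backend."""
    return manifest.get("backend") == "gridengine" and "web" in manifest


# The sample manifest shown above.
SAMPLE = """\
backend: gridengine
version: 2
web: uwsgi-python
"""

if __name__ == "__main__":
    print(is_grid_webservice(parse_manifest(SAMPLE)))
```

Keeping the decision in a small pure function makes it easy to test without access to the NFS share.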

Then, the daemon checks whether a job for this tool is running. If not, it restarts it and produces a log entry in the tool log. To check the tool's status and restart it, the daemon interacts with the grid, so the server running the daemon should be a grid submit host.
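
The check-and-restart step described above can be sketched roughly as below. This is a minimal sketch and not the actual daemon code: check_tool and its callback parameters (is_job_running, start_job, log) are hypothetical names, and the real interaction with the grid is deliberately left out and replaced by injected functions.

```python
def check_tool(tool, manifest, is_job_running, start_job, log):
    """Restart the tool's webservice if no job is running.

    Returns True when a restart was triggered, False otherwise.
    The three callables are hypothetical stand-ins for grid and log access.
    """
    # Only grid-based webservices are handled (assumed rule, see manifest sample).
    if manifest.get("backend") != "gridengine" or "web" not in manifest:
        return False
    # A job already exists for this tool: nothing to do.
    if is_job_running(tool):
        return False
    # No job found: start the webservice again and record it in the tool log.
    start_job(tool, manifest["web"])
    log(tool, "No running webservice job found, restarting")
    return True


if __name__ == "__main__":
    running = {"tool-a"}  # hypothetical set of tools that have a live job
    started, messages = [], []

    manifest = {"backend": "gridengine", "version": "2", "web": "uwsgi-python"}
    for tool in ("tool-a", "tool-b"):
        check_tool(
            tool,
            manifest,
            is_job_running=lambda name: name in running,
            start_job=lambda name, web: started.append((name, web)),
            log=lambda name, msg: messages.append((name, msg)),
        )
    print(started)
    print(messages)
```

In this sketch only the tool without a running job is restarted and logged; the tool with a live job is left alone.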