Performance/Runbook/Webperf-processor services

This is the run book for deploying and monitoring webperf-processor services.


The Puppet role for these services is role::webperf::processors_and_site.

Find the current production hosts for this role in puppet: site.pp. Find the current beta host at openstack-browser: deployment-prep.

Hosts as of Jan 2022 (T305460):


The navtiming service (written in Python) extracts information for the NavigationTiming and SaveTiming schemas from EventLogging using Kafka. It submits them to Graphite via Statsd. The EventLogging data comes from a JS plugin for MediaWiki (beacon js source, MediaWiki extension).
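As a rough illustration of the "submit via Statsd" step, the sketch below encodes a single timing sample in the plain-text StatsD wire format (name:value|ms) and sends it over UDP. It is not taken from the navtiming source: the function names, the metric name used in the test, and the localhost:8125 default target are all assumptions made for the example.

```python
import socket


def format_timing(metric: str, millis: float) -> bytes:
    """Encode one StatsD timing sample as b'name:value|ms'."""
    # Timing values are rounded to whole milliseconds for this sketch.
    return f"{metric}:{round(millis)}|ms".encode("ascii")


def send_timing(metric: str, millis: float,
                host: str = "localhost", port: int = 8125) -> None:
    """Send one timing sample to a StatsD daemon over UDP (fire and forget).

    The default host and port are assumptions, not the production target.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_timing(metric, millis), (host, port))
```

Because UDP is connectionless, send_timing does not report whether anything received the packet, which is the usual trade-off for metrics emitted this way.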


Monitor navtiming

Application logs for this service are not sent to Logstash currently.

  • Ssh to the host you want to monitor.
  • Run sudo journalctl -u navtiming -f -n100
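When the live log is too noisy to read, one option is to ask journalctl for JSON output (-o json) and filter it. The sketch below keeps only entries at priority "err" or more severe. It assumes the service's messages actually carry meaningful journald priorities, which may not hold if everything is logged through stderr at the default level; the function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def error_messages(lines: Iterable[str], max_priority: int = 3) -> Iterator[str]:
    """Yield MESSAGE from journal entries whose PRIORITY is <= max_priority.

    Each input line is one JSON object as printed by `journalctl -o json`.
    Syslog priorities run from 0 (emerg) to 7 (debug); 3 is "err".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # journald stores PRIORITY as a string digit; treat a missing
        # value as 6 (info) so that such entries are filtered out.
        priority = int(entry.get("PRIORITY", "6"))
        message = entry.get("MESSAGE")
        # MESSAGE is a list of byte values when it is not valid UTF-8;
        # this sketch skips those entries.
        if priority <= max_priority and isinstance(message, str):
            yield message
```

A small script that calls error_messages(sys.stdin) could then be fed from sudo journalctl -u navtiming -n100 -o json.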

Deploy navtiming

This service runs on the webperf*1 hosts.

To update the service on the Beta Cluster:

  1. Connect with ssh
  2. Run sudo journalctl -u navtiming -f -n100 and keep it open during the following steps.
  3. In a new tab, connect with ssh (or whatever the current deployment-deploy* host is, check).
  4. cd /srv/deployment/performance/navtiming
  5. git pull
  6. scap deploy
  7. Review the scap output (here) and the journalctl output (on the webperf server) for any errors.

To deploy a change in production:

  1. Before you start, open a terminal window in which you monitor the service on a host in the current primary data center. For example, if Eqiad is primary, ssh to webperf10##.eqiad.wmnet and run sudo journalctl -u navtiming -f -n100.
  2. In another terminal window, ssh to deployment.eqiad.wmnet and navigate to /srv/deployment/performance/navtiming.
  3. Prepare the working copy:
    • Ensure the working copy is clean: git status
    • Fetch the latest changes from the Gerrit remote: git fetch origin
    • Review the changes: git log -p HEAD..@{u}
    • Apply the changes to the working copy: git rebase
  4. Deploy the changes. This automatically restarts the service afterward.
    • Run scap deploy

Restart navtiming

sudo systemctl restart navtiming


The coal service is written in Python.

Application logs are kept locally, and can be read via sudo journalctl -u coal.

Reprocessing past periods

Coal data for an already processed period can be overwritten safely. To backfill a period after an outage, run coal manually on one of the perf hosts with a different consumer group and the --start-timestamp option; there is no need to stop the existing process. Take care that the timestamp is expressed in milliseconds since the Unix epoch, not seconds. Once you see that the outage gap has been filled, you can safely stop the manual coal process.
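Since the millisecond unit is easy to get wrong, here is a minimal sketch for computing a value suitable for --start-timestamp from a UTC date and time. The helper name and the example outage time are hypothetical; only the option name and the millisecond unit come from the text above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(when: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    if when.tzinfo is None:
        # Naive datetimes would be ambiguous between local time and UTC.
        raise ValueError("use a timezone-aware datetime")
    # Dividing one timedelta by another with // yields an exact integer.
    return (when - EPOCH) // timedelta(milliseconds=1)


# Hypothetical outage start: 2022-01-15 08:30:00 UTC.
outage_start = datetime(2022, 1, 15, 8, 30, tzinfo=timezone.utc)
print(to_epoch_millis(outage_start))
```

A quick sanity check is the digit count: for present-day dates a millisecond timestamp has 13 digits, while a timestamp in seconds has 10.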

Restart coal

sudo systemctl restart coal


The statsv service (written in Python) forwards data from the Kafka stream for /beacon/statsv web requests to Statsd.

Application logs are kept locally, and can be read via sudo journalctl -u statsv.

Restart statsv

sudo systemctl restart statsv


Written in Python.


This powers the site at Beta Cluster instance at

Deploy the site

  • Follow instructions in the README to create a commit.
  • Push to Gerrit for review.
  • Once merged, Puppet will update the web servers within 30 minutes.