Revision as of 17:48, 31 March 2022 by imported>Krinkle

requestctl is a command-line tool to control access and routing of web requests. It is currently used for throttling and blocking certain request patterns in the Varnish frontends of our edge caches.

This page shows how the tool is used in production at the WMF. For more details about its schema and its general command line options, see requestctl's README file.

In production, we keep all requestctl objects under the requestctl directory of the private Puppet repository.

Quick start: adding a new rule

Let's say we want to throttle, per IP, requests that have no Accept-Encoding header, carry Connect: keep-alive as a header, go to a special page, and come from Azure.

We already have the ipblocks from azure, originating from a cronjob running on the puppetmasters, in the file requestctl/request-ipblocks/cloud/azure.yaml:

:~$ requestctl get ipblock -o json | jq -r 'keys[]'

Now let's check whether we have a request pattern that corresponds to not having an Accept-Encoding header:

:~$ requestctl get pattern
name                pattern
------------------  --------------------------------
req/cache_buster_q  ?q=\w{12}
ua/urllib3          User-Agent: ^python-urllib3/.*$
ua/requests         User-Agent: ^python-requests/.*$
ua/curl             User-Agent: ^curl/.*$
ua/MediaWiki        User-Agent: ^MediaWiki/.*$
sites/commonswiki   Host:
sites/wikidata      Host:
sites/enwiki        Host:
url/api             url:^/w/(api|rest).php
url/docroot         url:^/[?$]
url/page            url:^/wiki/
url/semicolon_page  url:^/wiki/.+:+

It doesn't look like it's the case! So let's add a file named /srv/private/requestctl/request-patterns/req/no_accept_encoding.yaml, with the following content:

header: 'Accept-Encoding'

Omitting header_value translates to "no header present" (see the README, again).
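
As a rough illustration of that rule, the sketch below renders a header pattern as a VCL condition. It is not the template requestctl uses; the function name header_pattern_to_vcl is hypothetical, and the output format is modeled only on the generated test file shown later on this page.

```python
from typing import Optional


def header_pattern_to_vcl(header: str, header_value: Optional[str] = None) -> str:
    """Render a header pattern as a VCL condition (illustrative sketch only)."""
    if header_value is None:
        # No header_value: the pattern matches requests where the header is absent.
        return f"!req.http.{header}"
    # With a header_value: the header is matched against it as a regex.
    return f'req.http.{header} ~ "{header_value}"'


print(header_pattern_to_vcl("Accept-Encoding"))
print(header_pattern_to_vcl("Connect", "keep-alive"))
```

The first call corresponds to the no_accept_encoding.yaml file above, which sets only the header key.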

Now let's sync our objects to etcd:

puppetmaster1001:~$ sudo requestctl sync -g /srv/private/requestctl pattern
2022-03-28 14:56:23,995 - reqctl (cli:_write:359) - INFO - Updating pattern ua/MediaWiki
2022-03-28 14:56:24,005 - reqctl (cli:_write:359) - INFO - Updating pattern ua/curl
2022-03-28 14:56:24,014 - reqctl (cli:_write:359) - INFO - Updating pattern ua/urllib3
2022-03-28 14:56:24,024 - reqctl (cli:_write:359) - INFO - Updating pattern ua/requests
2022-03-28 14:56:24,034 - reqctl (cli:_write:359) - INFO - Updating pattern sites/wikidata
2022-03-28 14:56:24,044 - reqctl (cli:_write:359) - INFO - Updating pattern sites/commonswiki
2022-03-28 14:56:24,054 - reqctl (cli:_write:359) - INFO - Updating pattern sites/enwiki
2022-03-28 14:56:24,064 - reqctl (cli:_write:359) - INFO - Updating pattern req/cache_buster
2022-03-28 14:56:24,073 - reqctl (cli:_write:359) - INFO - Updating pattern req/specific_page
2022-03-28 14:56:24,085 - reqctl (cli:_write:359) - INFO - Updating pattern req/cache_buster_q
2022-03-28 14:56:24,094 - reqctl (cli:_write:362) - INFO - Creating pattern req/no_accept_encoding
2022-03-28 14:56:24,103 - reqctl (cli:_write:359) - INFO - Updating pattern url/semicolon_page
2022-03-28 14:56:24,113 - reqctl (cli:_write:359) - INFO - Updating pattern url/api
2022-03-28 14:56:24,122 - reqctl (cli:_write:359) - INFO - Updating pattern url/docroot
2022-03-28 14:56:24,133 - reqctl (cli:_write:359) - INFO - Updating pattern url/page

(Note that our object has been created.)

Now we can do the same with Connect: keep-alive. We'll create the file /srv/private/requestctl/request-patterns/req/keepalive.yaml containing:

header: Connect
header_value: keep-alive

and sync again with the same command.

Now we have all the ingredients, and we can write the action at /srv/private/requestctl/request-actions/cache-text/bot_from_azure.yaml:

# This should tell anyone what this rule does
comment: "Throttle requests with keepalive but no accept-encoding, coming from azure."
# This is the default. For now add it.
enabled: false
# each pattern and ipblock is referred to using {pattern,ipblock}@<scope>/<name>
expression: pattern@req/keepalive AND pattern@req/no_accept_encoding AND ipblock@cloud/azure
# Only bother with cache misses
cache_miss_only: true
# We want to throttle individual ips
do_throttle: true
throttle_per_ip: true
# Allow 100 requests per 10 seconds and, if exceeded, ban for 1 minute
throttle_requests: 100
throttle_interval: 10
throttle_duration: 60
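
To make the three throttle_* settings concrete, here is a simplified model of "allow N requests per interval, then ban for a duration". It is only a sketch: the real decision is made by Varnish's vsthrottle vmod, whose internal algorithm may differ, and the Throttle class below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Throttle:
    requests: int    # throttle_requests: allowed requests per interval
    interval: float  # throttle_interval, in seconds
    duration: float  # throttle_duration: ban length in seconds
    _hits: Dict[str, List[float]] = field(default_factory=dict)
    _banned_until: Dict[str, float] = field(default_factory=dict)

    def is_denied(self, key: str, now: float) -> bool:
        """Return True if a request for `key` at time `now` should be rejected."""
        # A key that exceeded the limit stays banned until the ban expires.
        if now < self._banned_until.get(key, 0.0):
            return True
        # Keep only the hits that fall inside the sliding interval, then add this one.
        recent = [t for t in self._hits.get(key, []) if now - t < self.interval]
        recent.append(now)
        self._hits[key] = recent
        if len(recent) > self.requests:
            self._banned_until[key] = now + self.duration
            return True
        return False


# Values from the action above; with throttle_per_ip the key would be the client IP.
throttle = Throttle(requests=100, interval=10, duration=60)
denied = [throttle.is_denied("192.0.2.1", now=i * 0.01) for i in range(101)]
print(denied.count(True))  # only the request over the limit is denied
```

In this model a second client IP has its own counter, which is what throttle_per_ip: true asks for.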

Now we can just run requestctl sync -g /srv/private/requestctl action, and the object will be in the datastore:

:~$ sudo requestctl get action cache-text/bot_from_azure -o yaml
  cache_miss_only: true
  comment: Throttle requests with keepalive but no accept-encoding, coming from azure
  do_throttle: true
  enabled: false
  expression: pattern@req/keepalive AND pattern@req/no_accept_encoding AND ipblock@cloud/azure
  resp_reason: ''
  resp_status: 429
  sites: []
  throttle_duration: 60
  throttle_interval: 10
  throttle_per_ip: true
  throttle_requests: 100

The rule won't show up in Varnish yet, but you already have a way to check the rule you created!

On the puppetmasters, we generate "test" files under /var/lib/requestctl/test that contain all the rules, independently of their enabled state or which datacenters they're enabled in:

puppetmaster2001:~$ cat /var/lib/requestctl/tests/
// Actions generated from etcd rules, if any.
// Edit using confctl. See instructions for disabling them below.
// FILTER: bot_from_azure
// To enable, run:
// sudo requestctl enable 'cache-text/bot_from_azure'

if ( req.http.Connect ~ "keep-alive" &&  !req.http.Accept-Encoding && req.http.X-Public-Cloud == "azure"  && vsthrottle.is_denied("global:bot_from_azure:"  + req.http.X-Client-IP, 100, 10s, 60s)) {
        return (synth(429, ""));
}

This allows you to do a first check of the rule you would create. If you want to add an additional layer of safety to your rollout, you can craft a VSL expression to match the same condition in the logs of a cache server using varnishlog.
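
When reviewing a generated test file, it can help to see how the action's expression lines up with the VCL condition. The sketch below reproduces that mapping for this one rule. It is an assumption-laden illustration, not the real template: the PATTERNS table is hard-coded from the output above, only AND is handled, and expression_to_vcl is a hypothetical name.

```python
# VCL fragments for the two patterns used in this example, copied from the
# generated test file above (hard-coded here; the real tool reads them from etcd).
PATTERNS = {
    "req/keepalive": 'req.http.Connect ~ "keep-alive"',
    "req/no_accept_encoding": "!req.http.Accept-Encoding",
}


def expression_to_vcl(expression: str) -> str:
    """Translate a whitespace-separated AND expression into a VCL condition."""
    parts = []
    for token in expression.split():
        if token == "AND":
            parts.append("&&")
        elif token.startswith("pattern@"):
            parts.append(PATTERNS[token[len("pattern@"):]])
        elif token.startswith("ipblock@cloud/"):
            # Cloud ipblocks are matched through the X-Public-Cloud header.
            name = token[len("ipblock@cloud/"):]
            parts.append(f'req.http.X-Public-Cloud == "{name}"')
        else:
            raise ValueError(f"unsupported token in this sketch: {token}")
    return " ".join(parts)


print(expression_to_vcl(
    "pattern@req/keepalive AND pattern@req/no_accept_encoding AND ipblock@cloud/azure"
))
```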

To actually get the rule injected into the varnish configuration, we would need to run, as specified in the comment:

puppetmaster1001:~$ sudo requestctl enable cache-text/bot_from_azure

At this point, the rule will appear on all cache-text nodes. That is because we didn't define the sites property for our new rule.

Commands recap

List existing actions

# All actions.
requestctl get action -o yaml
# A specific action
requestctl get action cache-text/generic_ua_clouds
# All enabled actions
requestctl get action -o json | jq 'to_entries[] | select(.value.enabled == true)'
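
If jq is not at hand, the same filter is easy to express in Python. This sketch assumes, as the jq filter above implies, that the JSON output is an object keyed by action name whose values carry an enabled field. The sample data is made up, and the exact key format is an assumption.

```python
import json
from typing import Any, Dict, List


def enabled_actions(raw_json: str) -> List[str]:
    """Return the names of all actions whose `enabled` field is true."""
    actions: Dict[str, Dict[str, Any]] = json.loads(raw_json)
    return sorted(name for name, value in actions.items() if value.get("enabled") is True)


# Made-up sample standing in for the output of `requestctl get action -o json`.
sample = json.dumps({
    "cache-text/bot_from_azure": {"enabled": False},
    "cache-text/generic_ua_clouds": {"enabled": True},
})
print(enabled_actions(sample))
```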

Enable / Disable an action

# Writes to the datastore, needs sudo
sudo requestctl enable cache-text/generic_ua_clouds
sudo requestctl disable cache-text/generic_ua_clouds

List existing patterns

# All patterns
requestctl get pattern
# Request a specific pattern
requestctl get pattern ua/requests

Modifying any object

  • SSH to a puppetmaster frontend
  • Modify the YAML file under /srv/private/requestctl and commit the change
  • Run requestctl sync -g /srv/private/requestctl {pattern,ipblock,action}
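
Which object type to sync follows from where the file lives. The sketch below derives the object type and name from a file path, based only on the three paths used in the quick start. The helper object_from_path is hypothetical, and the directory layout is assumed to be exactly request-<type>s/<scope>/<name>.yaml.

```python
from pathlib import PurePosixPath
from typing import Tuple


def object_from_path(path: str, root: str = "/srv/private/requestctl") -> Tuple[str, str]:
    """Map a YAML file path to (object type, scope/name)."""
    rel = PurePosixPath(path).relative_to(root)
    kind_dir, scope, filename = rel.parts
    # "request-patterns" -> "pattern": drop the prefix and the plural "s".
    kind = kind_dir[len("request-"):-1]
    return kind, f"{scope}/{PurePosixPath(filename).stem}"


print(object_from_path(
    "/srv/private/requestctl/request-patterns/req/no_accept_encoding.yaml"
))
```

The first element of the result is the argument to pass to requestctl sync.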

Removing an object

  • If you're removing a pattern / ipblock, ensure it's not referenced by any action object
  • remove the object file from the git repository, commit the change
  • Run requestctl sync --purge -g /srv/private/requestctl {pattern,ipblock,action}
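
The reference check in the first step can be done by scanning action expressions for the {pattern,ipblock}@<scope>/<name> reference. The following is a minimal sketch of that idea. The function name and the sample data are hypothetical, and it assumes references are separated by whitespace, as in the example action on this page.

```python
from typing import Dict, List


def actions_referencing(kind: str, name: str, expressions: Dict[str, str]) -> List[str]:
    """Return the actions whose expression refers to the given pattern or ipblock."""
    reference = f"{kind}@{name}"
    # Compare whole tokens so req/keepalive does not match req/keepalive_v2.
    return sorted(
        action for action, expr in expressions.items() if reference in expr.split()
    )


# Made-up mapping of action name -> expression, e.g. collected from the YAML files.
expressions = {
    "cache-text/bot_from_azure": (
        "pattern@req/keepalive AND pattern@req/no_accept_encoding AND ipblock@cloud/azure"
    ),
    "cache-text/other_rule": "pattern@ua/curl AND ipblock@cloud/azure",
}
print(actions_referencing("pattern", "req/keepalive", expressions))
```

An empty result suggests the object is safe to remove, at least as far as the scanned expressions go.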

A more detailed explanation of how everything works

When you run requestctl, you typically modify data that resides in our main etcd cluster. Specifically, the keys you'll modify are under /conftool/v1/request-{ipblock,action,pattern}s/.
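
As a small illustration of that key layout, the sketch below builds the etcd key for an object. It assumes the object's <scope>/<name> is appended directly to the type prefix, which this page does not confirm in full; etcd_key is a hypothetical helper.

```python
KINDS = ("ipblock", "action", "pattern")


def etcd_key(kind: str, name: str) -> str:
    """Build the assumed etcd key for a requestctl object."""
    if kind not in KINDS:
        raise ValueError(f"unknown object type: {kind}")
    # The prefix is taken from the text above; the suffix layout is an assumption.
    return f"/conftool/v1/request-{kind}s/{name}"


print(etcd_key("action", "cache-text/bot_from_azure"))
```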

On all cache proxy servers, a Confd instance is watching these keyspaces and generates the following files:

  • /var/netmapper/public_clouds.json from the data at /conftool/v1/request-ipblocks/cloud/, using this template. This netmap is then used to add an X-Public-Cloud: <name> header to requests coming from any IP address in those ranges. This data is updated daily using a script that runs on the puppetmasters.
  • /etc/varnish/ from the data at /conftool/v1/request-ipblocks/abuse/, using another template. This generates a list of varnish acls that can be referenced later in the VCL. This data is currently duplicated from the private puppet hiera, and will need to be kept in sync somehow.
  • /etc/varnish/ from the data at /conftool/v1/request-actions/cache-$CLUSTER, under the condition that these entries have enabled: true and either specify no sites, or the current datacenter is included in the list. There is a slightly more complex template to translate what is written in the expression field of the action to VCL code. This code gets injected directly in the cluster_fe_ratelimit VCL subroutine, so it only applies to cache misses at the moment.
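
The condition for an action to be rendered on a given cache server can be written out as a short predicate. This is a sketch of the rule as described in the last item above, not the template's actual code. The datacenter names in the usage lines are only example values.

```python
from typing import Any, Dict


def should_render(action: Dict[str, Any], datacenter: str) -> bool:
    """Return True if the action would be injected in the given datacenter."""
    if not action.get("enabled", False):
        return False
    sites = action.get("sites") or []
    # No sites listed means "everywhere"; otherwise the datacenter must be listed.
    return not sites or datacenter in sites


print(should_render({"enabled": True, "sites": []}, "eqiad"))
print(should_render({"enabled": True, "sites": ["codfw"]}, "eqiad"))
print(should_render({"enabled": False, "sites": []}, "eqiad"))
```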

To add/remove/modify entries, you are expected to edit the files in the puppet private repository and run requestctl sync in the following order:

  • sudo requestctl sync -g /srv/private/requestctl pattern
  • sudo requestctl sync -g /srv/private/requestctl ipblock
  • sudo requestctl sync -g /srv/private/requestctl action
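
The order is presumably chosen so that patterns and ipblocks exist before the actions that reference them. If you script the sequence, a sketch like the following keeps it in one place. It only builds and prints the command lines and leaves execution to you. The function name is hypothetical, while the commands themselves are the ones listed above.

```python
import shlex
from typing import List

# Object types in the order given above: actions come last.
SYNC_ORDER = ("pattern", "ipblock", "action")


def sync_commands(repo: str = "/srv/private/requestctl") -> List[List[str]]:
    """Return the requestctl sync invocations in the expected order."""
    return [["sudo", "requestctl", "sync", "-g", repo, kind] for kind in SYNC_ORDER]


for command in sync_commands():
    # Printing instead of executing keeps this sketch side-effect free.
    print(shlex.join(command))
```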