Requestctl

Revision as of 17:09, 28 March 2022 by Giuseppe Lavagetto

What is requestctl

Requestctl is the tool we use to control request patterns at various layers of the infrastructure, currently mainly at the edge caches.

This page shows how the tool is used at the WMF. For details about its schema and its general command-line options, see requestctl's README file.

In production, we keep all requestctl objects under the requestctl directory of the private puppet repository.

Quick start: adding a new rule

Let's say we want to throttle, per IP, requests coming from Azure that have no Accept-Encoding header, carry a Connection: keep-alive header, and go to a special page.

The Azure ipblocks already exist: a cronjob running on the puppetmasters generates them in the file requestctl/request-ipblocks/cloud/azure.yaml. We can confirm this by listing the ipblocks:

:~$ requestctl get ipblock -o json | jq -r 'keys[]'
abuse/blocked_nets
abuse/bot_blocked_nets
abuse/bot_posts_blocked_nets
abuse/phabricator_abusers
abuse/text_abuse_nets
cloud/aws
cloud/azure
cloud/digitalocean
cloud/gcp
cloud/oci
cloud/public_cloud_nets

Now let's check whether we have a request pattern that corresponds to not having an Accept-Encoding header:

:~$ requestctl get pattern
name                pattern
------------------  --------------------------------
req/cache_buster
req/cache_buster_q  ?q=\w{12}
req/specific_page
ua/urllib3          User-Agent: ^python-urllib3/.*$
ua/requests         User-Agent: ^python-requests/.*$
ua/curl             User-Agent: ^curl/.*$
ua/MediaWiki        User-Agent: ^MediaWiki/.*$
sites/commonswiki   Host: commons.wikimedia.org
sites/wikidata      Host: www.wikidata.org
sites/enwiki        Host: en.wikipedia.org
url/api             url:^/w/(api|rest).php
url/docroot         url:^/[?$]
url/page            url:^/wiki/
url/semicolon_page  url:^/wiki/.+:+

There isn't one, so let's add a file named /srv/private/requestctl/request-patterns/req/no_accept_encoding.yaml with the following content:

header: 'Accept-Encoding'

Omitting header_value means "this header is not present" (see the README, again).
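
As a rough mental model of that rule, the following Python sketch treats a pattern as a header name plus an optional value. It is not requestctl's code: the `matches` helper is hypothetical, and treating `header_value` as a regular expression is an assumption based on the regex-like values in the pattern listing above.

```python
import re
from typing import Mapping, Optional


def matches(headers: Mapping[str, str], header: str,
            header_value: Optional[str] = None) -> bool:
    """Hypothetical model of a header pattern (not requestctl's implementation)."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    actual = normalized.get(header.lower())
    if header_value is None:
        # No header_value: the pattern matches only when the header is absent.
        return actual is None
    # Assumption: header_value is applied as a regular expression.
    return actual is not None and re.search(header_value, actual) is not None


no_accept_encoding = {"header": "Accept-Encoding"}
print(matches({"User-Agent": "curl/7.81.0"}, **no_accept_encoding))  # True
print(matches({"Accept-Encoding": "gzip"}, **no_accept_encoding))    # False
```

Under this reading, a request that sends any Accept-Encoding value at all does not match the new pattern.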

Now let's sync our objects to etcd:

puppetmaster1001:~$ sudo requestctl sync -g /srv/private/requestctl pattern
2022-03-28 14:56:23,995 - reqctl (cli:_write:359) - INFO - Updating pattern ua/MediaWiki
2022-03-28 14:56:24,005 - reqctl (cli:_write:359) - INFO - Updating pattern ua/curl
2022-03-28 14:56:24,014 - reqctl (cli:_write:359) - INFO - Updating pattern ua/urllib3
2022-03-28 14:56:24,024 - reqctl (cli:_write:359) - INFO - Updating pattern ua/requests
2022-03-28 14:56:24,034 - reqctl (cli:_write:359) - INFO - Updating pattern sites/wikidata
2022-03-28 14:56:24,044 - reqctl (cli:_write:359) - INFO - Updating pattern sites/commonswiki
2022-03-28 14:56:24,054 - reqctl (cli:_write:359) - INFO - Updating pattern sites/enwiki
2022-03-28 14:56:24,064 - reqctl (cli:_write:359) - INFO - Updating pattern req/cache_buster
2022-03-28 14:56:24,073 - reqctl (cli:_write:359) - INFO - Updating pattern req/specific_page
2022-03-28 14:56:24,085 - reqctl (cli:_write:359) - INFO - Updating pattern req/cache_buster_q
2022-03-28 14:56:24,094 - reqctl (cli:_write:362) - INFO - Creating pattern req/no_accept_encoding
2022-03-28 14:56:24,103 - reqctl (cli:_write:359) - INFO - Updating pattern url/semicolon_page
2022-03-28 14:56:24,113 - reqctl (cli:_write:359) - INFO - Updating pattern url/api
2022-03-28 14:56:24,122 - reqctl (cli:_write:359) - INFO - Updating pattern url/docroot
2022-03-28 14:56:24,133 - reqctl (cli:_write:359) - INFO - Updating pattern url/page

(Note the "Creating pattern req/no_accept_encoding" line: our object has been created.)

Now we can do the same for Connection: keep-alive. We'll create the file /srv/private/requestctl/request-patterns/req/keepalive.yaml containing:

header: Connection
header_value: keep-alive

and sync again with the same command.

Now we have all the ingredients, and we can write the action at /srv/private/requestctl/request-actions/cache-text/bot_from_azure.yaml:

# This should tell anyone what this rule does
comment: "Throttle requests with keepalive but no accept-encoding, coming from azure."
# false is the default; set it explicitly for now.
enabled: false
# each pattern and ipblock is referred to using {pattern,ipblock}@<scope>/<name>
expression: pattern@req/keepalive AND pattern@req/no_accept_encoding AND ipblock@cloud/azure
# Only bother with cache misses
cache_miss_only: true
# We want to throttle individual ips
do_throttle: true
throttle_per_ip: true
# Allow 100 requests per 10 seconds, and if exceeded, ban for 1 minute
throttle_requests: 100
throttle_interval: 10
throttle_duration: 60
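
The throttle settings in the file above can be read as "at most throttle_requests requests per throttle_interval seconds for each IP, then a ban of throttle_duration seconds". The Python sketch below models that reading with a simple sliding window. It is only an illustration: the `PerIpThrottle` class is hypothetical, and the limiter running on the caches may count requests differently.

```python
from collections import defaultdict, deque


class PerIpThrottle:
    """Illustrative sliding-window throttle; not the limiter used in production."""

    def __init__(self, requests: int, interval: float, duration: float) -> None:
        self.requests = requests    # throttle_requests
        self.interval = interval    # throttle_interval, in seconds
        self.duration = duration    # throttle_duration, in seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._banned_until = {}          # ip -> time at which the ban ends

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request passes, False if it is throttled."""
        if now < self._banned_until.get(ip, 0.0):
            return False
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the interval window.
        while hits and hits[0] <= now - self.interval:
            hits.popleft()
        if len(hits) >= self.requests:
            # Limit exceeded: ban this IP for `duration` seconds.
            self._banned_until[ip] = now + self.duration
            hits.clear()
            return False
        hits.append(now)
        return True


throttle = PerIpThrottle(requests=100, interval=10, duration=60)
results = [throttle.allow("192.0.2.1", now=0.0) for _ in range(101)]
print(results.count(True), results[-1])          # 100 False
print(throttle.allow("192.0.2.1", now=30.0))     # False (still banned)
print(throttle.allow("198.51.100.7", now=30.0))  # True (other IPs unaffected)
print(throttle.allow("192.0.2.1", now=61.0))     # True (ban expired)
```

Because throttle_per_ip is true, each client IP gets its own counter, which is why the second address in the example is not affected by the first one's ban.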

Now we can just run sudo requestctl sync -g /srv/private/requestctl action, and the object will be in the datastore:

:~$ sudo requestctl get action cache-text/bot_from_azure -o yaml
cache-text/bot_from_azure:
  cache_miss_only: true
  comment: Throttle requests with keepalive but no accept-encoding, coming from azure
  do_throttle: true
  enabled: false
  expression: pattern@req/keepalive AND pattern@req/no_accept_encoding AND ipblock@cloud/azure
  resp_reason: ''
  resp_status: 429
  sites: []
  throttle_duration: 60
  throttle_interval: 10
  throttle_per_ip: true
  throttle_requests: 100

Because enabled is still false, the rule won't show up in Varnish yet. To make that happen, we need to run:

puppetmaster1001:~$ sudo requestctl enable cache-text/bot_from_azure

At this point, the rule will appear on all cache-text nodes. That is because we didn't define the sites property for our new rule.
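
The interplay between enabled and sites described here can be summarized in a few lines of Python. This is a sketch of the behaviour as this page describes it, not requestctl's logic: the `is_deployed` helper is hypothetical, and both the assumption that sites holds datacenter names and the names used below are illustrative.

```python
from typing import Any, Mapping


def is_deployed(action: Mapping[str, Any], site: str) -> bool:
    """Hypothetical model: does an action reach the caches at `site`?"""
    if not action.get("enabled", False):
        # Disabled actions exist in the datastore only.
        return False
    sites = action.get("sites", [])
    # An empty `sites` list is read as "everywhere".
    return not sites or site in sites


action = {"enabled": False, "sites": []}
print(is_deployed(action, "eqiad"))  # False: not enabled yet

action["enabled"] = True
print(is_deployed(action, "eqiad"))  # True: no sites restriction

action["sites"] = ["esams"]          # example restriction to a single site
print(is_deployed(action, "eqiad"))  # False
```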

Commands recap

List existing actions

# All actions.
requestctl get action -o yaml
# A specific action
requestctl get action cache-text/generic_ua_clouds

Enable / Disable an action

# Writes to the datastore, needs sudo
sudo requestctl enable cache-text/generic_ua_clouds
sudo requestctl disable cache-text/generic_ua_clouds

List existing patterns

# All patterns
requestctl get pattern
# Request a specific pattern
requestctl get pattern ua/requests

Modifying any object

  • SSH to a puppetmaster frontend
  • Modify the YAML file under /srv/private/requestctl and commit the change
  • Run sudo requestctl sync -g /srv/private/requestctl {pattern,ipblock,action}

Removing an object

  • If you're removing a pattern or an ipblock, ensure it's not referenced by any action object
  • Remove the object file from the git repository and commit the change
  • Run sudo requestctl sync --purge -g /srv/private/requestctl {pattern,ipblock,action}
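
The first step, checking that no action still references the object, can be done by scanning the action files on disk. The sketch below is one possible way to do it in Python, assuming actions live under a request-actions directory as in the quick start and keep their expression on a single `expression:` line. The `find_references` helper is hypothetical and not part of requestctl.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Matches references such as pattern@req/keepalive or ipblock@cloud/azure.
REFERENCE = re.compile(r"\b(pattern|ipblock)@([\w-]+/[\w-]+)")


def find_references(actions_dir: Path, kind: str, name: str) -> List[str]:
    """Return the action files whose `expression:` line mentions kind@name."""
    found = []
    for path in sorted(actions_dir.rglob("*.yaml")):
        for line in path.read_text().splitlines():
            if not line.strip().startswith("expression:"):
                continue
            if (kind, name) in REFERENCE.findall(line):
                found.append(path.relative_to(actions_dir).as_posix())
                break
    return found


# Demo against a throwaway directory that mimics the layout used on this page.
with tempfile.TemporaryDirectory() as tmp:
    actions = Path(tmp) / "request-actions"
    (actions / "cache-text").mkdir(parents=True)
    (actions / "cache-text" / "bot_from_azure.yaml").write_text(
        "enabled: false\n"
        "expression: pattern@req/keepalive AND pattern@req/no_accept_encoding"
        " AND ipblock@cloud/azure\n"
    )
    print(find_references(actions, "pattern", "req/keepalive"))  # ['cache-text/bot_from_azure.yaml']
    print(find_references(actions, "pattern", "ua/curl"))        # []
```

An empty result suggests the object is safe to remove. Since this is a plain text scan, it would miss expressions that are wrapped across several lines.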