You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Apache Traffic Server

imported>Ema
(22 intermediate revisions by 8 users not shown)
{{Navigation Wikimedia infrastructure|expand=caching}}
'''Apache Traffic Server''', aka '''ATS''', is a caching HTTP proxy used as the backend (on-disk) component of Wikimedia's CDN. In-memory, ephemeral caching is done by cache frontends running [[Varnish]].


==Architecture==
There are three distinct processes in Traffic Server:
#traffic_server
#traffic_manager
#traffic_cop
 
'''traffic_server''' is the process responsible for dealing with user traffic: accepting connections, processing requests, and serving documents from cache or from the origin server. traffic_server is an event-driven, multi-threaded process. Threads are used to take advantage of multiple CPUs, not to handle multiple connections concurrently (e.g. by spawning a thread per connection, or by using a thread pool). Instead, an [https://docs.trafficserver.apache.org/en/latest/developer-guide/plugins/introduction.en.html#asynchronous-event-model event system] is used to schedule work on threads. ATS uses a [https://docs.trafficserver.apache.org/en/latest/developer-guide/plugins/hooks-and-transactions/index.en.html#http-transaction-state-diagram state machine] (compare with [https://book.varnish-software.com/4.0/_images/simplified_fsm.svg the Varnish FSM]) to handle each transaction (a single HTTP request from a client and the response Traffic Server sends to that client), and provides a system of [https://docs.trafficserver.apache.org/en/latest/developer-guide/plugins/hooks-and-transactions/index.en.html hooks] where plugins (e.g. Lua) can step in and act on the transaction. Specific [https://docs.trafficserver.apache.org/en/latest/developer-guide/plugins/hooks-and-transactions/trafficserver-timers.en.html timers] are used at the various states.
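As a generic illustration of the event-driven approach, here is a Python asyncio sketch. It is not how ATS is implemented. '''handle_transaction''' and the 10 ms sleep that stands in for origin I/O are hypothetical. A single thread running an event loop keeps many transactions in flight at once and switches between them whenever one is waiting on I/O, instead of dedicating a thread to each connection:

<syntaxhighlight lang="python">
import asyncio
import threading


async def handle_transaction(txn_id, log):
    """Hypothetical handler: the sleep stands in for waiting on an origin server."""
    # While this coroutine waits, the event loop runs other transactions on the same thread.
    await asyncio.sleep(0.01)
    log.append((txn_id, threading.get_ident()))


async def serve(n):
    """Run n concurrent transactions on one event loop and return the completion log."""
    log = []
    await asyncio.gather(*(handle_transaction(i, log) for i in range(n)))
    return log


if __name__ == "__main__":
    log = asyncio.run(serve(100))
    threads = {tid for _, tid in log}
    print(f"{len(log)} transactions handled by {len(threads)} thread(s)")
</syntaxhighlight>

Running it reports 100 transactions handled by 1 thread. ATS combines this model with several threads so that all CPUs are used.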


'''traffic_manager''' is responsible for launching, monitoring and configuring '''traffic_server''', handling the statistics interface, cluster administration and virtual IP failover.


'''traffic_cop''' is a watchdog program monitoring the health of both '''traffic_manager''' and '''traffic_server'''. It has traditionally been the command used to start ATS. Under systemd it can probably be avoided, and traffic_manager can be used as the program executed to start the unit.
 
==Terminology==
ATS uses the term '''transaction''' with a different meaning depending on the protocol. For HTTP/1.x, a transaction is an HTTP request; for HTTP/2, a transaction is an HTTP/2 stream. The term '''connection''' always refers to a TCP connection, regardless of the context. See the relevant [https://docs.trafficserver.apache.org/en/8.1.x/developer-guide/client-session-architecture.en.html ATS documentation about Client Sessions and Transactions].


==Configuration==
The changes to the default configuration required to get a caching proxy are:


<syntaxhighlight lang="bash">
# /etc/trafficserver/remap.config
map client_url origin_server_url
</syntaxhighlight>


The following rules map grafana and phabricator to their respective backends and define a catchall for requests that don't match either of the first two rules:
<syntaxhighlight lang="bash">
# /etc/trafficserver/remap.config
map http://grafana.wikimedia.org/ http://krypton.eqiad.wmnet/
map http://phabricator.wikimedia.org/ http://iridium.eqiad.wmnet/
map / http://deployment-mediawiki05.deployment-prep.eqiad1.wikimedia.cloud/
</syntaxhighlight>
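To reason about which origin a request ends up at, remap rules can be modelled as a list scanned in order. The first rule whose scheme, host and path prefix match rewrites the URL. This Python sketch rests on that first-match assumption. ATS's real lookup (host-based indexing, '''regex_map''', plugins, '''reverse_map''') is more involved, so treat the sketch as a mental model only. '''parse_remap''' and '''remap''' are hypothetical helpers:

<syntaxhighlight lang="python">
from urllib.parse import urlsplit


def parse_remap(text):
    """Return (from_url, to_url) pairs for 'map' lines, in file order."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "map":
            rules.append((fields[1], fields[2]))
    return rules


def remap(url, rules):
    """Rewrite url with the first matching rule, or return None (simplified first-match model)."""
    req = urlsplit(url)
    for src, dst in rules:
        s = urlsplit(src)
        # Scheme and host only have to match when the rule specifies them,
        # so a rule like "map / ..." acts as a catchall.
        if s.scheme and s.scheme != req.scheme:
            continue
        if s.netloc and s.netloc != req.netloc:
            continue
        if not req.path.startswith(s.path):
            continue
        d = urlsplit(dst)
        # Replace the matched path prefix with the target path and keep the query string.
        path = d.path + req.path[len(s.path):]
        return d._replace(path=path, query=req.query).geturl()
    return None


RULES = parse_remap("""
map http://grafana.wikimedia.org/ http://krypton.eqiad.wmnet/
map http://phabricator.wikimedia.org/ http://iridium.eqiad.wmnet/
map / http://deployment-mediawiki05.deployment-prep.eqiad1.wikimedia.cloud/
""")

if __name__ == "__main__":
    print(remap("http://grafana.wikimedia.org/d/abc?orgId=1", RULES))
</syntaxhighlight>

For example, '''remap("http://grafana.wikimedia.org/d/abc?orgId=1", RULES)''' returns '''http://krypton.eqiad.wmnet/d/abc?orgId=1'''. Any host not listed falls through to the catchall.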


<syntaxhighlight lang="bash">
# /etc/trafficserver/records.config

CONFIG proxy.config.http.server_ports STRING 3128 3128:ipv6
CONFIG proxy.config.process_manager.mgmt_port INT 8084

CONFIG proxy.config.admin.user_id STRING trafficserver
# … (lines not shown in this copy of the page)
CONFIG proxy.config.url_remap.pristine_host_hdr INT 1
CONFIG proxy.config.disable_configuration_modification INT 1
</syntaxhighlight>
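When auditing a '''records.config''', it can help to load it into a dictionary. This is a minimal Python sketch. It assumes every setting of interest uses the four-field '''CONFIG <name> <TYPE> <value>''' form shown above. Other lines are skipped, and types other than INT, FLOAT and STRING are kept as strings. '''traffic_ctl config diff''' remains the authoritative way to list the values ATS actually uses:

<syntaxhighlight lang="python">
# Casts for the record types handled by this sketch; anything else stays a string.
CASTS = {"INT": int, "FLOAT": float, "STRING": str}


def parse_records(text):
    """Parse 'CONFIG <name> <TYPE> <value>' lines into a {name: value} dict (simplified)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("CONFIG "):
            continue  # comments, blank lines and anything else are ignored
        parts = line.split(None, 3)  # at most 3 splits: the value may contain spaces
        if len(parts) < 4:
            continue
        _, name, rtype, value = parts
        settings[name] = CASTS.get(rtype, str)(value)
    return settings


EXAMPLE = """
# /etc/trafficserver/records.config
CONFIG proxy.config.http.server_ports STRING 3128 3128:ipv6
CONFIG proxy.config.process_manager.mgmt_port INT 8084
CONFIG proxy.config.url_remap.pristine_host_hdr INT 1
"""

if __name__ == "__main__":
    for name, value in parse_records(EXAMPLE).items():
        print(name, repr(value))
</syntaxhighlight>

Limiting the split to three keeps STRING values that contain spaces, such as the '''server_ports''' list, intact.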


If [https://docs.trafficserver.apache.org/en/latest/admin-guide/files/records.config.en.html#proxy-config-http-cache-required-headers proxy.config.http.cache.required_headers] is set to 2, which is the default, the origin server is required to set an explicit lifetime, from either '''Expires''' or '''Cache-Control: max-age'''. By setting '''required_headers''' to 1, objects with '''Last-Modified''' are considered for caching too. Setting the value to 0 means that no headers are required to make documents cacheable.
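The three '''required_headers''' levels can be summarised as a small predicate. This Python sketch models only this setting as described above. It deliberately ignores '''no-store''', '''private''', '''s-maxage''', status codes and every other cacheability rule ATS applies. '''meets_required_headers''' is a hypothetical name:

<syntaxhighlight lang="python">
def has_directive(cache_control, directive):
    """True if a Cache-Control header value contains the given directive name."""
    names = (part.split("=", 1)[0].strip().lower() for part in cache_control.split(","))
    return directive in names


def meets_required_headers(headers, required_headers=2):
    """Model of proxy.config.http.cache.required_headers only (simplified)."""
    h = {k.lower(): v for k, v in headers.items()}
    explicit_lifetime = "expires" in h or has_directive(h.get("cache-control", ""), "max-age")
    if required_headers == 2:
        return explicit_lifetime  # Expires or Cache-Control: max-age
    if required_headers == 1:
        return explicit_lifetime or "last-modified" in h
    return True  # 0: no headers required
</syntaxhighlight>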


===TLS===
Basic TLS termination can be configured as follows:<syntaxhighlight lang="bash">
# /etc/trafficserver/records.config
CONFIG proxy.config.http.server_ports STRING 3128 3128:ipv6 3129:ssl 3129:ipv6:ssl
CONFIG proxy.config.ssl.server.cert.path STRING /etc/acmecerts/
CONFIG proxy.config.ssl.server.private_key.path STRING /etc/acmecerts/
</syntaxhighlight><syntaxhighlight lang="bash">
# /etc/trafficserver/ssl_multicert.config
dest_ip=* ssl_cert_name=rsa.crt,ecdsa.crt ssl_key_name=rsa.key,ecdsa.key
</syntaxhighlight>
 
===Load balancing===
In order to load balance requests among origin servers, parent_proxy_routing needs to be enabled in records.config:
 
<syntaxhighlight lang="bash">
# records.config
CONFIG proxy.config.http.parent_proxy_routing_enable INT 1
CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING parent_select
</syntaxhighlight>
 
A remap rule needs to be configured for the site:
<syntaxhighlight lang="bash">
# remap.config
map http://en.wikipedia.org https://enwiki.org
</syntaxhighlight>
 
Finally, load balancing can be configured by specifying the origin servers and the load balancing policy in parent.config:
<syntaxhighlight lang="bash">
# parent.config
dest_domain=enwiki.org parent="mw1261.eqiad.wmnet:443,mw1262.eqiad.wmnet:443" parent_is_proxy=false round_robin=strict
</syntaxhighlight>
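With '''round_robin=strict''', the parents listed in '''parent.config''' are used strictly in turn. This Python sketch shows that selection for the rule above. It assumes no whitespace inside field values, as in the example, and accepts either ',' or ';' between parents. It ignores failure handling, such as marking an unresponsive parent down, so it is no substitute for ATS's own parent selection. '''parse_parent_rule''' and '''StrictRoundRobin''' are hypothetical:

<syntaxhighlight lang="python">
import re
from itertools import cycle


def parse_parent_rule(line):
    """Split a parent.config rule into {key: value} plus the list of parents."""
    fields = dict(token.split("=", 1) for token in line.split())
    parents = [p for p in re.split(r"[;,]\s*", fields["parent"].strip('"')) if p]
    return fields, parents


class StrictRoundRobin:
    """Hand out parents strictly in turn, regardless of client or URL."""

    def __init__(self, parents):
        self._next = cycle(parents)

    def select(self):
        return next(self._next)


RULE = ('dest_domain=enwiki.org parent="mw1261.eqiad.wmnet:443,mw1262.eqiad.wmnet:443" '
        'parent_is_proxy=false round_robin=strict')

if __name__ == "__main__":
    fields, parents = parse_parent_rule(RULE)
    rr = StrictRoundRobin(parents)
    for _ in range(3):
        print(fields["dest_domain"], "->", rr.select())
</syntaxhighlight>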
 
===Logging===
Diagnostic output [https://docs.trafficserver.apache.org/en/7.1.x/admin-guide/files/records.config.en.html#diagnostic-logging-configuration can be sent to standard output and error instead of the default logfiles], which is a good idea in order to take advantage of systemd's journal.
 
<syntaxhighlight lang="bash">
# /etc/trafficserver/records.config
CONFIG proxy.config.diags.output.status STRING O
CONFIG proxy.config.diags.output.note STRING O
CONFIG proxy.config.diags.output.warning STRING O
CONFIG proxy.config.diags.output.error STRING E
CONFIG proxy.config.diags.output.fatal STRING E
CONFIG proxy.config.diags.output.alert STRING E
CONFIG proxy.config.diags.output.emergency STRING E
</syntaxhighlight>
 
===Health checks===
Load the '''healthchecks''' plugin:


<syntaxhighlight lang="bash">
# /etc/trafficserver/plugin.config
healthchecks.so /etc/trafficserver/healthchecks.conf
</syntaxhighlight>


Define the health check:
<syntaxhighlight lang="bash">
# /etc/trafficserver/healthchecks.conf
/check /etc/trafficserver/ts-alive text/plain 200 403
</syntaxhighlight>


Response body:
<syntaxhighlight lang="bash">
# /etc/trafficserver/ts-alive
All good
</syntaxhighlight>


With the above configuration, GET requests to '''/check''' will result in 200 responses from ATS with the response body defined in '''/etc/trafficserver/ts-alive'''.
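The last two fields of a health check definition appear to be the status codes returned when the file exists and when it is missing. Under that reading, deleting '''/etc/trafficserver/ts-alive''' would make '''/check''' answer 403, which turns the file into a simple depool switch. Below is a Python model of that decision, assuming those semantics. The empty body returned when the file is missing is a further assumption. '''parse_healthcheck''' and '''respond''' are hypothetical:

<syntaxhighlight lang="python">
import os


def parse_healthcheck(line):
    """Parse '<uri-path> <file> <mime> <code-if-file-exists> <code-if-file-missing>' (assumed format)."""
    path, body_file, mime, exists_code, missing_code = line.split()
    return {"path": path, "file": body_file, "mime": mime,
            "exists": int(exists_code), "missing": int(missing_code)}


def respond(check, request_path):
    """Return (status, mime, body) for a health check request, or None if the path doesn't match."""
    if request_path != check["path"]:
        return None  # not a health check: the request would be proxied normally
    if os.path.exists(check["file"]):
        with open(check["file"], encoding="utf-8") as f:
            return check["exists"], check["mime"], f.read()
    return check["missing"], check["mime"], ""  # empty body is an assumption


if __name__ == "__main__":
    check = parse_healthcheck("/check /etc/trafficserver/ts-alive text/plain 200 403")
    print(respond(check, "/check"))
</syntaxhighlight>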


===Cache inspector===
To enable the cache inspector functionality, add the following remap rules:
<syntaxhighlight lang="bash">
map /cache-internal/ http://{cache-internal}
map /cache/ http://{cache}
map /stat/ http://{stat}
map /test/ http://{test}
map /hostdb/ http://{hostdb}
map /net/ http://{net}
map /http/ http://{http}
</syntaxhighlight>
===systemd unit===
<syntaxhighlight lang="bash">
# /etc/systemd/system/trafficserver.service.d/puppet-override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/traffic_manager --nosyslog
Restart=always
RestartSec=1
ExecReload=
# XXX: `traffic_server -C verify_config` is broken: it causes configuration
# reloads, which cause errors with ascii_pipe logs
#ExecReload=/usr/bin/traffic_server -C verify_config
ExecReload=/usr/bin/traffic_ctl config reload
# traffic_manager is terminated with SIGTERM and exits with the received signal
# number (15)
SuccessExitStatus=15


LimitNOFILE=500000
LimitMEMLOCK=90000


# Security options
ProtectKernelModules=yes
ProtectKernelTunables=yes
PrivateTmp=yes
 
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK
 
CapabilityBoundingSet=CAP_DAC_OVERRIDE CAP_SETGID CAP_SETUID
SystemCallFilter=~@keyring @clock @cpu-emulation @obsolete @module @raw-io @debug
 
# The entire file system hierarchy is mounted read-only, except for the API
# file system subtrees /dev, /proc and /sys
ProtectSystem=strict
 
# Whitelist read/write directories
ReadWritePaths=/var/log/trafficserver
ReadWritePaths=/var/run/trafficserver
ReadWritePaths=/var/cache/trafficserver
</syntaxhighlight>
 
===Additional ATS instances===
Traffic Server provides a poorly documented feature called layouts. The ATS layout defines the following paths:
 
*exec_prefix (TS_BUILD_EXEC_PREFIX)
*bindir (TS_BUILD_BINDIR)
*sbindir (TS_BUILD_SBINDIR)
*sysconfdir (TS_BUILD_SYSCONFDIR)
*datadir (TS_BUILD_DATADIR)
*includedir (TS_BUILD_INCLUDEDIR)
*libdir (TS_BUILD_LIBDIR)
*libexecdir (TS_BUILD_LIBEXECDIR)
*localstatedir (TS_BUILD_LOCALSTATEDIR)
*runtimedir (TS_BUILD_RUNTIMEDIR)
*logdir (TS_BUILD_LOGDIR)
*mandir (TS_BUILD_MANDIR)
*infodir (TS_BUILD_INFODIR)
*cachedir (TS_BUILD_CACHEDIR)
 
These paths are defined at build time by their corresponding TS_BUILD_ constants. However, they can be overridden at runtime by using a layout/runroot file. A layout file is a YAML file that defines the paths listed above, using the following syntax:<syntaxhighlight lang="yaml">
prefix: ./runroot
exec_prefix: ./runroot
bindir: ./runroot/custom_bin
sbindir: ./runroot/custom_sbin
sysconfdir: ./runroot/custom_sysconf
datadir: ./runroot/custom_data
includedir: ./runroot/custom_include
libdir: ./runroot/custom_lib
libexecdir: ./runroot/custom_libexec
localstatedir: ./runroot/custom_localstate
runtimedir: ./runroot/custom_runtime
logdir: ./runroot/custom_log
cachedir: ./runroot/custom_cache
</syntaxhighlight>After defining the layout file, the runroot can be initialized by running traffic_layout:<syntaxhighlight lang="bash">
$ traffic_layout --init --layout="custom.yml" --copy-style=soft
</syntaxhighlight>Note that the custom layout defines its own bin and sbin directories, so the binaries need to be copied into the runroot. The --copy-style flag controls how the executables are copied:
 
*copy: Full copy
*hard: Use hard links
*soft: Use symlinks
 
The goal here is to run several instances of the same ATS version: --copy-style=soft makes that possible while still benefiting from system-wide ATS upgrades, since the symlinks point to the system binaries.
 
After the layout has been initialized, any Traffic Server CLI tool can use it through the --run-root option or the TS_RUNROOT environment variable:<syntaxhighlight lang="bash">
$ traffic_ctl --run-root="custom.yaml" config reload
$ TS_RUNROOT="custom.yaml" traffic_ctl config reload
</syntaxhighlight>
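Since each additional instance needs its own layout file, the file can be generated instead of written by hand. Below is a hypothetical Python helper that reproduces the example layout above for any runroot directory. The '''custom_*''' directory names simply mirror the example and are not required by ATS. Values are written unquoted, so paths containing spaces or YAML special characters would need extra handling:

<syntaxhighlight lang="python">
import posixpath

# Directory keys from the example layout above; each maps to "<runroot>/custom_<key without 'dir'>".
DIR_KEYS = ["bindir", "sbindir", "sysconfdir", "datadir", "includedir", "libdir",
            "libexecdir", "localstatedir", "runtimedir", "logdir", "cachedir"]


def make_layout(runroot):
    """Return the layout for one instance as an insertion-ordered dict of path settings."""
    layout = {"prefix": runroot, "exec_prefix": runroot}
    for key in DIR_KEYS:
        layout[key] = posixpath.join(runroot, "custom_" + key[:-len("dir")])
    return layout


def to_yaml(layout):
    """Render the flat key: value mapping in the same style as the example file (no quoting)."""
    return "".join(f"{key}: {value}\n" for key, value in layout.items())


if __name__ == "__main__":
    print(to_yaml(make_layout("./runroot")), end="")
</syntaxhighlight>

For example, '''to_yaml(make_layout("./runroot"))''' yields the same 13 lines as the example file, ready to be passed to '''traffic_layout --init --layout=...'''.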
 
==Debugging==
 
The [https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/xdebug.en.html XDebug plugin] allows clients to check various aspects of ATS operation.
 
To enable the plugin, add '''xdebug.so''' to '''plugin.config''', add the following lines to '''records.config''', and restart trafficserver.
 
<syntaxhighlight lang="text">
CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING xdebugs.tag
</syntaxhighlight>
 
Once the plugin is enabled, clients can specify various values in the '''X-Debug''' header and receive the relevant information back.
 
For example:
 
<syntaxhighlight lang="bash">
# cache hit
$ curl -v -H "X-Debug: X-Milestones" http://localhost 2>&1 | grep Milestones:
< X-Milestones: PLUGIN-TOTAL=0.000022445, PLUGIN-ACTIVE=0.000022445, CACHE-OPEN-READ-END=0.000078570, CACHE-OPEN-READ-BEGIN=0.000078570, UA-BEGIN-WRITE=0.000199094, UA-READ-HEADER-DONE=0.000000000, UA-FIRST-READ=0.000000000, UA-BEGIN=0.000000000
 
# cache miss
< X-Milestones: PLUGIN-TOTAL=0.000017432, PLUGIN-ACTIVE=0.000017432, DNS-LOOKUP-END=0.091413811, DNS-LOOKUP-BEGIN=0.000148548, CACHE-OPEN-WRITE-END=0.091413811, CACHE-OPEN-WRITE-BEGIN=0.091413811, CACHE-OPEN-READ-END=0.000056997, CACHE-OPEN-READ-BEGIN=0.000056997, SERVER-READ-HEADER-DONE=0.218755336, SERVER-FIRST-READ=0.218755336, SERVER-BEGIN-WRITE=0.091413811, SERVER-CONNECT-END=0.091413811, SERVER-CONNECT=0.091413811, SERVER-FIRST-CONNECT=0.091413811, UA-BEGIN-WRITE=0.218755336, UA-READ-HEADER-DONE=0.000000000, UA-FIRST-READ=0.000000000, UA-BEGIN=0.000000000
</syntaxhighlight>
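The milestone values look like seconds relative to '''UA-BEGIN''', which is 0 in both samples, so subtracting two of them gives the time spent between those points. The small Python sketch below does that arithmetic. Reading '''SERVER-BEGIN-WRITE''' to '''SERVER-FIRST-READ''' as the wait for the origin is our interpretation. '''MISS''' is a subset of the cache miss sample above:

<syntaxhighlight lang="python">
def parse_milestones(value):
    """Turn 'NAME=seconds, NAME=seconds, ...' into a {name: float} dict."""
    milestones = {}
    for item in value.split(","):
        name, sep, seconds = item.strip().partition("=")
        if sep:
            milestones[name] = float(seconds)
    return milestones


def elapsed(milestones, start, end):
    """Seconds between two milestones, or None if either is missing (e.g. on a cache hit)."""
    if start in milestones and end in milestones:
        return milestones[end] - milestones[start]
    return None


MISS = ("DNS-LOOKUP-END=0.091413811, DNS-LOOKUP-BEGIN=0.000148548, "
        "SERVER-FIRST-READ=0.218755336, SERVER-BEGIN-WRITE=0.091413811, "
        "UA-BEGIN-WRITE=0.218755336, UA-BEGIN=0.000000000")

if __name__ == "__main__":
    m = parse_milestones(MISS)
    print("DNS lookup:", elapsed(m, "DNS-LOOKUP-BEGIN", "DNS-LOOKUP-END"))
    print("origin wait:", elapsed(m, "SERVER-BEGIN-WRITE", "SERVER-FIRST-READ"))
</syntaxhighlight>

On that sample the DNS lookup took about 0.091 s and the origin wait about 0.127 s. On a cache hit the DNS and SERVER milestones are absent, so '''elapsed''' returns None for them.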
 
The full list of debugging headers is available in the [https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/xdebug.en.html#debugging-headers XDebug Plugin documentation].
 
In the setup at WMF, the plugin can be enabled by setting '''profile::trafficserver::backend::enable_xdebug''' to true in Hiera. It can then be used by specifying the '''X-ATS-Debug''' request header. For example, to dump all client/intermediary/origin request/response headers:
 
<syntaxhighlight lang="bash">
$ curl -H "X-ATS-Debug: log-headers" http://localhost
</syntaxhighlight>
 
===Request logs===
Non-purge request logs can be inspected by running '''atslog-backend''', a wrapper around '''fifo-log-tailer''':
<syntaxhighlight lang="bash">
$ sudo atslog-backend
</syntaxhighlight>
 
Call '''fifo-log-tailer''' directly to inspect PURGE traffic:
<syntaxhighlight lang="bash">
# LOG_SOCKET=/var/run/trafficserver/notpurge.sock fifo-log-tailer
</syntaxhighlight>
 
===Testing patches in labs===
The '''traffic''' project in labs provides a testbed for ATS-related patches, covering both Lua/configuration changes and ATS itself. Any of the [https://horizon.wikimedia.org/auth/switch/traffic/?next=/project/instances/ traffic instances] with a hostname beginning with '''traffic-cache-''' can be used for this purpose. The '''traffic''' labs project also features a self-hosted puppetmaster on which puppet patches can be cherry-picked. At the time of writing the instance is '''traffic-puppetmaster-buster.traffic.eqiad1.wikimedia.cloud''', but double-check that this is still the case.
 
<syntaxhighlight lang="bash">
ema@traffic-cache-atstext-buster:~$ grep ^server /etc/puppet/puppet.conf
server = traffic-puppetmaster-buster.traffic.eqiad1.wikimedia.cloud
</syntaxhighlight>
 
The labs testbed provides a different, significantly simplified remap configuration compared to production. At the time of writing, it looks like this on cache_text instances:
 
<syntaxhighlight lang="bash">
ema@traffic-cache-atstext-buster:~$ sudo cat /etc/trafficserver/remap.config
# https://docs.trafficserver.apache.org/en/latest/admin-guide/files/remap.config.en.html
# This file is managed by Puppet.
 
map http://en.wikipedia.beta.wmflabs.org http://deployment-mediawiki-07.deployment-prep.eqiad1.wikimedia.cloud
</syntaxhighlight>
 
Test requests against MediaWiki can be performed as follows:
<syntaxhighlight lang="bash">
ema@traffic-cache-atstext-buster:~$ curl -v -H 'Host: en.wikipedia.beta.wmflabs.org' '127.0.0.1:3128/w/load.php?lang=it&modules=startup&only=scripts&raw=1&skin=vector'
</syntaxhighlight>
 
Use '''Host: upload.wikimedia.beta.wmflabs.org''' on cache_upload instances instead.
 
==Building and running from Git==
To build trafficserver [https://github.com/apache/trafficserver/ from git]:
 
<syntaxhighlight lang="bash">
autoreconf -if
./configure --enable-layout=Debian --sysconfdir=/etc/trafficserver --libdir=/usr/lib/trafficserver --libexecdir=/usr/lib/trafficserver/modules
make -j8
</syntaxhighlight>
 
Add a minimal '''/etc/trafficserver/records.config''':
 
<syntaxhighlight lang="bash">
CONFIG proxy.config.disable_configuration_modification INT 1
# Replace $PATH_TO_REPO!
CONFIG proxy.config.bin_path STRING ${PATH_TO_REPO}/trafficserver/src/traffic_server/
</syntaxhighlight>
 
The newly built '''traffic_server''' and '''traffic_manager''' binaries can be tested as follows:
<syntaxhighlight lang="bash">
sudo -u trafficserver ./src/traffic_server/traffic_server
sudo -u trafficserver ./src/traffic_manager/traffic_manager --nosyslog
</syntaxhighlight>
 
==Packaging==
To package a new stable release, download it from https://trafficserver.apache.org/downloads and check its SHA.
 
Then import it into '''operations/debs/trafficserver''' with:
 
<syntaxhighlight lang="bash">
PRISTINE_ALL_XDELTA=xdelta gbp import-orig --pristine-tar /tmp/trafficserver-8.0.2.tar.bz2
</syntaxhighlight>
 
This updates the following branches; don't forget to push all of them to the repository:
 
* master
* upstream
* pristine-tar


Build with:
<syntaxhighlight lang="bash">
WIKIMEDIA=yes ARCH=amd64 BACKPORTS=yes DIST=stretch GIT_PBUILDER_AUTOCONF=no gbp buildpackage -jauto -us -uc -sa --git-builder=git-pbuilder
</syntaxhighlight>


The procedure to package new RC versions is roughly as follows. This assumes that: (1) the new RC artifacts are made available under https://people.apache.org/~bcall/8.0.3-rc0/, and (2) you want to build the new packages on boron.eqiad.wmnet.

<syntaxhighlight lang="bash">
https_proxy=http://url-downloader.wikimedia.org:8080 wget https://people.apache.org/~bcall/8.0.3-rc0/trafficserver-8.0.3-rc0.tar.bz2
# Check that the sha512 matches https://people.apache.org/~bcall/8.0.3-rc0/trafficserver-8.0.3-rc0.tar.bz2.sha512
</syntaxhighlight>
 
Then obtain our latest prod packages and update them:
 
<syntaxhighlight lang="bash">
apt-get source trafficserver
cd trafficserver-8.0.2/
uupdate -v 8.0.3~rc0 ../trafficserver-8.0.3-rc0.tar.bz2
cd ../trafficserver-8.0.3~rc0
BACKPORTS=yes WIKIMEDIA=yes ARCH=amd64 DIST=stretch GIT_PBUILDER_AUTOCONF=no git-pbuilder
</syntaxhighlight>
 
=== Running autests ===
There are a number of "gold tests" shipped with ATS. They're under tests/gold_tests and can be run as follows:
 
<syntaxhighlight lang="bash">
./tests/autest.sh --ats-bin /usr/bin/ --filter redirect
</syntaxhighlight>
 
==Cheatsheet==
Rolling restart in codfw:
 
<syntaxhighlight lang="bash">
sudo cumin -b1 'A:cp-ats-codfw' 'ats-backend-restart ; sleep 30'
</syntaxhighlight>


Show non-default configuration values:


<syntaxhighlight lang="bash">
sudo traffic_ctl config diff
</syntaxhighlight>


Configuration reload:
<syntaxhighlight lang="bash">
sudo traffic_ctl config reload
</syntaxhighlight>


Check if a reload/restart is needed:
<syntaxhighlight lang="bash">
sudo traffic_ctl config status
</syntaxhighlight>


Start in debugging mode, dumping headers:
<syntaxhighlight lang="bash">
sudo traffic_server -T http_hdrs
</syntaxhighlight>


Access metrics from the CLI:
<syntaxhighlight lang="bash">
traffic_ctl metric get proxy.process.http.cache_hit_fresh
</syntaxhighlight>
 
Multiple metrics can be accessed with 'match':
<syntaxhighlight lang="bash">
traffic_ctl metric match proxy.process.ssl.*
</syntaxhighlight>
 
Get metrics relevant to the TLS instance:
<syntaxhighlight lang="bash">
sudo traffic_ctl --run-root=/srv/trafficserver/tls metric match '.*http2.*'
</syntaxhighlight>
 
Set the value of a metric to zero:
<syntaxhighlight lang="bash">
traffic_ctl metric zero proxy.process.http.completed_requests
</syntaxhighlight>
 
Show storage usage:
<syntaxhighlight lang="bash">
traffic_server -C check
</syntaxhighlight>
 
Wipe storage. This needs to be done while trafficserver isn't running.
<syntaxhighlight lang="bash">
traffic_server -C clear_cache
</syntaxhighlight>


==Lua scripting==
ATS plugins can be written in Lua. As an example, this is how to choose an origin server dynamically:


<syntaxhighlight lang="bash">
# /etc/trafficserver/remap.config
map http://127.0.0.1:3128/ http://$origin_server_ip/ @plugin=/usr/lib/trafficserver/modules/tslua.so @pparam=/var/tmp/ats-set-backend.lua
reverse_map http://$origin_server_ip/ http://127.0.0.1:3128/
</syntaxhighlight>
 
===Choosing origin server===
Selecting the appropriate origin server for a given request can be done using ATS [https://docs.trafficserver.apache.org/en/latest/admin-guide/files/remap.config.en.html mapping rules]. The same goal can be achieved in Lua:


<syntaxhighlight lang="lua">
-- /var/tmp/ats-set-backend.lua
function do_remap()
    -- … (lines not shown in this copy of the page)
    end
end
</syntaxhighlight>
 
===Negative response caching===
By default ATS caches negative responses such as 404, 503 [https://docs.trafficserver.apache.org/en/latest/admin-guide/files/records.config.en.html#admin-negative-response-caching and others] only if the response defines a max-age via the Cache-Control header. This behavior can be changed by setting
the configuration option [https://docs.trafficserver.apache.org/en/7.1.x/admin-guide/files/records.config.en.html#proxy-config-http-negative-caching-enabled proxy.config.http.negative_caching_enabled], which allows caching of negative responses that do NOT specify Cache-Control. If negative caching is enabled, the lifetime of negative responses without Cache-Control is defined by [https://docs.trafficserver.apache.org/en/7.1.x/admin-guide/files/records.config.en.html#proxy-config-http-negative-caching-lifetime proxy.config.http.negative_caching_lifetime], in seconds, defaulting to 1800.
 
However, one might want to cache 404 responses that do not send Cache-Control without caching any 503 responses. Since proxy.config.http.negative_caching_enabled applies to a whole set of negative status codes, and ATS versions before 8.0.0 do not allow specifying which status codes to cache, this can be achieved by setting Cache-Control in Lua only for certain status codes:
 
<syntaxhighlight lang="lua">
function read_response()
    local status_code = ts.server_response.get_status()
    local cache_control = ts.server_response.header['Cache-Control']
 
    -- Cache 404 responses without CC for 10s
    if status_code == 404 and not(cache_control) then
        ts.server_response.header['Cache-Control'] = 'max-age=10'
    end
end
 
function do_remap()
    ts.hook(TS_LUA_HOOK_READ_RESPONSE_HDR, read_response)
    return 0
end
</syntaxhighlight>
 
Starting with ATS 8.0.0, the configuration option [https://docs.trafficserver.apache.org/en/8.0.x/admin-guide/files/records.config.en.html#proxy-config-http-negative-caching-list proxy.config.http.negative_caching_list] allows specifying the list of negative response status codes to cache.
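The interaction between these options can be summarised as a decision function. This Python sketch assumes that an explicit '''Cache-Control: max-age''' is always honoured, and that the configured list restricts which status codes fall back to '''negative_caching_lifetime'''. It covers only these options: Expires, s-maxage and other rules are ignored, and header names are matched case-sensitively. '''negative_ttl''' is a hypothetical helper:

<syntaxhighlight lang="python">
def max_age(cache_control):
    """Return the max-age directive value in seconds, or None."""
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def negative_ttl(status, headers, negative_caching_enabled=False,
                 negative_caching_list=frozenset(), negative_caching_lifetime=1800):
    """TTL in seconds for a response with a negative status code, or None if not cached (simplified)."""
    explicit = max_age(headers.get("Cache-Control", ""))
    if explicit is not None:
        return explicit  # an explicit max-age is honoured either way
    if negative_caching_enabled and status in negative_caching_list:
        return negative_caching_lifetime
    return None
</syntaxhighlight>

Calling it with '''negative_caching_enabled=True''' and '''negative_caching_list={404}''' approximates what the Lua workaround does on older versions. The difference is that the script sets a 10-second TTL instead of relying on the configured lifetime.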


===Setting X-Cache-Int===
As another example, the following script takes care of setting the X-Cache-Int response header:

<syntaxhighlight lang="lua">
-- /var/tmp/ats-set-x-cache-int.lua
function cache_lookup()
    -- … (lines not shown in this copy of the page)
    return 0
end
</syntaxhighlight>
 
===Custom metrics===
Ad-hoc metrics can be created, incremented and accessed in Lua. For example, to keep per-origin counters of origin server requests:
<syntaxhighlight lang="lua">
function do_global_send_request()
    local ip = ts.server_request.server_addr.get_ip()
 
    if ip == "0.0.0.0" then
        -- internal stuff, not an actual origin server request
        return 0
    end
 
    local counter_name = "origin_requests_" .. ip
 
    local counter = ts.stat_find(counter_name)

    if counter == nil then
        counter = ts.stat_create(counter_name,
                                 TS_LUA_RECORDDATATYPE_INT,
                                 TS_LUA_STAT_PERSISTENT,
                                 TS_LUA_STAT_SYNC_COUNT)
    end
 
    counter:increment(1)
end
</syntaxhighlight>
 
===Forcing a cache miss (similar to ban)===
{{see also|Varnish#One-off_purges_(bans)}}
Sometimes it is desirable to ensure that certain cached responses are not returned to clients, and that instead the objects are fetched again from the origin server.
 
This can be done in Lua by overriding the '''[https://docs.trafficserver.apache.org/en/8.0.x/admin-guide/files/records.config.en.html#proxy-config-http-cache-generation proxy.config.http.cache.generation]''' setting for one or more specific transactions. The value passed is combined with the cache key at cache lookup time, effectively turning a single cache lookup for a certain object into a miss. The object is then fetched again from the origin, and all subsequent cache lookups will hit on the new object.
<!--
The procedure is a bit different depending on whether we need to ban based on attributes of the request or the response.
 
Let's begin by showing how to ban objects based on request attributes. For example, to ban Italian Wikipedia:
-->
<syntaxhighlight lang="lua">
function do_global_read_request()
    if ts.client_request.header['Host'] == 'it.wikipedia.org' then
        ts.http.config_int_set(TS_LUA_CONFIG_HTTP_CACHE_GENERATION, 1593784707)
    end
end
</syntaxhighlight>
 
The example uses the number of seconds since the epoch, but any integer other than '''-1''' would do. Once it is certain that the old objects have expired, the change can be reverted: wait at least for the maximum TTL (24 hours at the time of writing) before reverting.
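A toy Python model shows why a new generation forces exactly one miss per object, and why reverting early is risky. It is an illustration only: how ATS actually combines the generation with its cache key is internal, and the '''generation:url''' key below is an arbitrary choice. Consistent with the text, '''-1''' is treated as no override. '''GenerationCache''' is hypothetical:

<syntaxhighlight lang="python">
class GenerationCache:
    """Toy cache whose lookup key mixes in a generation number (illustration only)."""

    def __init__(self, fetch_from_origin):
        self._objects = {}
        self._fetch = fetch_from_origin  # callable(url) -> object from the "origin"
        self.misses = 0

    @staticmethod
    def key(url, generation=-1):
        # -1 leaves the key unchanged; any other value yields a key never seen before.
        return url if generation == -1 else f"{generation}:{url}"

    def get(self, url, generation=-1):
        k = self.key(url, generation)
        if k not in self._objects:  # miss: fetch from the origin and store under this key
            self.misses += 1
            self._objects[k] = self._fetch(url)
        return self._objects[k]


if __name__ == "__main__":
    versions = iter(["old copy", "fresh copy"])
    cache = GenerationCache(lambda url: next(versions))
    url = "http://it.wikipedia.org/wiki/Pagina_principale"
    print(cache.get(url))              # first lookup: miss, stored under the plain key
    print(cache.get(url, 1593784707))  # new generation: different key, forced miss
    print(cache.get(url, 1593784707))  # hit on the freshly fetched object
</syntaxhighlight>

Lookups with the new generation miss once and then hit the fresh copy. A lookup without the override still finds the old object if it has not expired yet, which is why the override should stay in place for at least the maximum TTL.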
 
<!--
In order to ban objects based on response information, for instance all objects with '''Content-Type: application/x-www-form-urlencoded''':


<syntaxhighlight lang="lua">
function do_global_cache_lookup_complete()
    local cache_status = ts.http.get_cache_lookup_status()
    if cache_status == TS_LUA_CACHE_LOOKUP_HIT_FRESH and ts.cached_response.header['Content-Type'] == 'application/x-www-form-urlencoded' then
        ts.http.config_int_set(TS_LUA_CONFIG_HTTP_CACHE_GENERATION, 1593784707)
        --ts.http.redo_cache_lookup()
    end
end
</syntaxhighlight>
-->
 
===Debugging===
Debugging output can be produced from Lua with '''ts.debug("message")'''. The following configuration needs to be enabled to log debug output:
 
<syntaxhighlight lang="bash">
CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING ts_lua
</syntaxhighlight>
 
If other debugging tags need to be enabled as well, such as http_hdrs, separate them with a pipe:
<syntaxhighlight lang="bash">
CONFIG proxy.config.diags.debug.tags STRING ts_lua|http_hdrs
</syntaxhighlight>
 
See [https://docs.trafficserver.apache.org/en/8.0.x/admin-guide/files/records.config.en.html#proxy-config-diags-debug-tags the documentation] for more tags.
 
===Unit testing===
The '''busted''' framework can be used to test Lua scripts. It can be installed as follows:


<syntaxhighlight lang="bash">
apt install luarocks
luarocks install busted
luarocks install luacov
</syntaxhighlight>


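The following unit tests cover some of the functionality implemented by '''ats-set-x-cache-int.lua''':
<syntaxhighlight lang="lua">
-- unit_test.lua
_G.ts = { client_response = {  header = {} }, ctx = {} }

describe("Busted unit testing framework", function()
  describe("script for ATS Lua Plugin", function()

    it("test - hook", function()
      stub(ts, "hook")

      require("ats-set-x-cache-int")
      local result = do_remap()
      assert.are.equals(0, result)
    end)

    it("test - gen_x_cache_hit", function()
      stub(ts, "hook")

      require("ats-set-x-cache-int")
      local result = gen_x_cache_int()

      assert.are.equals('miss', ts.client_response.header['X-Cache-Status'])
      assert.are.equals('cp4242 miss', ts.client_response.header['X-Cache-Int'])
    end)

  end)
end)
</syntaxhighlight>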


Run the tests and generate a coverage report with:

<syntaxhighlight lang="bash">
$ busted -c unit_test.lua
●●
2 successes / 0 failures / 0 errors / 0 pending : 0.012771 seconds

$ luacov ; cat luacov.report.out
</syntaxhighlight>
 
==Storage==
Information about permanent storage can be obtained by using the '''python3-superior-cache-analyzer''' Debian package:
 
<syntaxhighlight lang="python">
from scan import span
s = span.Span("/dev/nvme0n1p1")
print(s)
</syntaxhighlight>
 
==External links==
*https://docs.trafficserver.apache.org/en/latest/admin-guide/files/records.config.en.html
*https://docs.trafficserver.apache.org/en/latest/admin-guide/files/remap.config.en.html
*https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/ts_lua.en.html

[[Category:Caching]]
[[Category:SRE Traffic]]

Revision as of 08:54, 8 October 2021

Apache Traffic Server, aka ATS, is a caching HTTP proxy used as the backend (on-disk) component of Wikimedia's CDN. In-memory, ephemeral caching is done by cache frontends running Varnish.

Architecture

There are three distinct processes in Traffic Server:

  1. traffic_server
  2. traffic_manager
  3. traffic_cop

traffic_server is the process responsible for dealing with user traffic: accepting connections, processing requests, and serving documents from cache or the origin server. traffic_server is an event-driven, multi-threaded process. Threads are used to take advantage of multiple CPUs, not to handle multiple connections concurrently (e.g. by spawning a thread per connection, or by using a thread pool). Instead, an event system is used to schedule work on threads. ATS uses a state machine (compare with the Varnish FSM) to handle each transaction (a single HTTP request from a client and the response Traffic Server sends to that client), and provides a system of hooks where plugins (e.g. Lua) can step in and act on the transaction. Specific timers are used at the various states.

traffic_manager is responsible for launching, monitoring and configuring traffic_server, handling the statistics interface, cluster administration and virtual IP failover.

traffic_cop is a watchdog program monitoring the health of both traffic_manager and traffic_server. It has traditionally been the command used to start ATS. With systemd it can probably be avoided, and traffic_manager can be used as the program executed to start the unit.
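To make the transaction and hook model of traffic_server more concrete, the following Python sketch walks a single transaction through a handful of hook points and calls whatever callbacks a "plugin" registered for each one. It is a toy model, not ATS code: the hook point names mirror ATS's TS_HTTP_*_HOOK names (and the TS_LUA_HOOK_* names used in the Lua examples on this page), but the list of states is heavily simplified, there is no event system or threading, and run_transaction and tag_response are hypothetical names.

```python
from typing import Callable, Dict, List

# Simplified, ordered hook points of one HTTP transaction. The real ATS state
# machine has many more states (remap, DNS, retries, errors, ...).
HIT_PATH = ["READ_REQUEST_HDR", "CACHE_LOOKUP_COMPLETE",
            "SEND_RESPONSE_HDR", "TXN_CLOSE"]
MISS_PATH = ["READ_REQUEST_HDR", "CACHE_LOOKUP_COMPLETE",
             "SEND_REQUEST_HDR", "READ_RESPONSE_HDR",
             "SEND_RESPONSE_HDR", "TXN_CLOSE"]

Callback = Callable[[dict], None]


def run_transaction(hooks: Dict[str, List[Callback]], cache_hit: bool) -> dict:
    """Walk one transaction through its hook points, calling the plugin
    callbacks registered for each point in registration order."""
    txn = {"cache_hit": cache_hit, "headers": {}, "visited": []}
    for point in (HIT_PATH if cache_hit else MISS_PATH):
        txn["visited"].append(point)
        for callback in hooks.get(point, []):
            callback(txn)
    return txn


def tag_response(txn: dict) -> None:
    """Example "plugin": mark the response as a cache hit or miss."""
    txn["headers"]["X-Cache-Status"] = "hit" if txn["cache_hit"] else "miss"


hooks = {"SEND_RESPONSE_HDR": [tag_response]}
print(run_transaction(hooks, cache_hit=False)["headers"])  # {'X-Cache-Status': 'miss'}
```

Only cache misses visit the origin-facing hooks (SEND_REQUEST_HDR, READ_RESPONSE_HDR), which is why the Lua examples below attach response-rewriting logic to different hooks depending on whether it should also apply to hits.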

Terminology

ATS uses the term transaction with a different meaning depending on the protocol. For HTTP/1.x, a transaction is an HTTP request; for HTTP/2, a transaction is an HTTP/2 stream. The term connection always refers to a TCP connection, regardless of the context. See the relevant ATS documentation about Client Sessions and Transactions.

Configuration

The changes to the default configuration required to get a caching proxy are:

# /etc/trafficserver/remap.config
map client_url origin_server_url

The following rules map grafana and phabricator to their respective backends and define a catchall for requests that don't match either of the first two rules:

# /etc/trafficserver/remap.config
map http://grafana.wikimedia.org/ http://krypton.eqiad.wmnet/
map http://phabricator.wikimedia.org/ http://iridium.eqiad.wmnet/
map / http://deployment-mediawiki05.deployment-prep.eqiad1.wikimedia.cloud/
# /etc/trafficserver/records.config

CONFIG proxy.config.http.server_ports STRING 3128 3128:ipv6

CONFIG proxy.config.admin.user_id STRING trafficserver
CONFIG proxy.config.http.cache.required_headers INT 1
CONFIG proxy.config.url_remap.pristine_host_hdr INT 1
CONFIG proxy.config.disable_configuration_modification INT 1

If proxy.config.http.cache.required_headers is set to 2, which is the default, the origin server is required to set an explicit lifetime, via either Expires or Cache-Control: max-age. By setting required_headers to 1, objects with Last-Modified are considered for caching too. Setting the value to 0 means that no headers are required to make documents cacheable.
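The three required_headers levels can be summarised as a small decision function. This is a simplified Python model written only from the rules stated in the paragraph above: it ignores everything else that affects cacheability in ATS (request method, Cache-Control: no-store, cookies and so on), and meets_required_headers is a hypothetical name.

```python
def meets_required_headers(headers: dict, required_headers: int) -> bool:
    """Simplified model of proxy.config.http.cache.required_headers.

    Returns True if the response headers satisfy the given level. Only the
    header requirements are modelled; other cacheability rules are ignored.
    """
    # HTTP header names are case-insensitive.
    h = {name.lower(): value for name, value in headers.items()}
    directives = {
        d.strip().split("=", 1)[0].lower()
        for d in h.get("cache-control", "").split(",")
        if d.strip()
    }
    explicit_lifetime = "expires" in h or "max-age" in directives

    if required_headers == 2:   # default: an explicit lifetime is required
        return explicit_lifetime
    if required_headers == 1:   # Last-Modified is enough as well
        return explicit_lifetime or "last-modified" in h
    if required_headers == 0:   # no headers required
        return True
    raise ValueError(f"unsupported required_headers value: {required_headers}")


only_last_modified = {"Last-Modified": "Fri, 03 Jul 2020 13:58:27 GMT"}
print(meets_required_headers(only_last_modified, 2))  # False
print(meets_required_headers(only_last_modified, 1))  # True
```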

TLS

Basic TLS termination can be configured as follows:

# /etc/trafficserver/records.config
CONFIG proxy.config.http.server_ports STRING 3128 3128:ipv6 3129:ssl 3129:ipv6:ssl
CONFIG proxy.config.ssl.server.cert.path STRING /etc/acmecerts/
CONFIG proxy.config.ssl.server.private_key.path STRING /etc/acmecerts/
# /etc/trafficserver/ssl_multicert.config
dest_ip=* ssl_cert_name=rsa.crt,ecdsa.crt ssl_key_name=rsa.key,ecdsa.key

Load balancing

In order to load balance requests among origin servers, parent_proxy_routing needs to be enabled in records.config:

# records.config
CONFIG proxy.config.http.parent_proxy_routing_enable INT 1
CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING parent_select

A remap rule needs to be configured for the site:

# remap.config
map http://en.wikipedia.org https://enwiki.org

Finally, load balancing can be configured by specifying the origin servers and the load balancing policy in parent.config:

# parent.config
dest_domain=enwiki.org parent="mw1261.eqiad.wmnet:443,mw1262.eqiad.wmnet:443" parent_is_proxy=false round_robin=strict
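With round_robin=strict, requests are handed to the listed parents strictly in turn. The following Python sketch is a toy model of that rotation, assuming a fixed parent list; it does not model parent health checks or marking parents down, which ATS does perform, and StrictRoundRobin is a hypothetical name.

```python
import itertools
from typing import Iterable


class StrictRoundRobin:
    """Toy model of round_robin=strict parent selection: each request goes
    to the next parent in the list, wrapping around at the end."""

    def __init__(self, parents: Iterable[str]):
        self._parents = list(parents)
        if not self._parents:
            raise ValueError("at least one parent is required")
        self._cycle = itertools.cycle(self._parents)

    def next_parent(self) -> str:
        return next(self._cycle)


# Parent list taken from the parent.config line above.
rr = StrictRoundRobin(["mw1261.eqiad.wmnet:443", "mw1262.eqiad.wmnet:443"])
for _ in range(4):
    print(rr.next_parent())  # mw1261, mw1262, mw1261, mw1262
```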

Logging

Diagnostic output can be sent to standard output and error instead of the default logfiles, which is a good idea in order to take advantage of systemd's journal.

# /etc/trafficserver/records.config
CONFIG proxy.config.diags.output.status STRING O
CONFIG proxy.config.diags.output.note STRING O
CONFIG proxy.config.diags.output.warning STRING O
CONFIG proxy.config.diags.output.error STRING E
CONFIG proxy.config.diags.output.fatal STRING E
CONFIG proxy.config.diags.output.alert STRING E
CONFIG proxy.config.diags.output.emergency STRING E

Health checks

Load the `healthchecks` plugin:

# /etc/trafficserver/plugin.config
healthchecks.so /etc/trafficserver/healthchecks.conf

Define the health check:

# /etc/trafficserver/healthchecks.conf
/check /etc/trafficserver/ts-alive text/plain 200 403

Response body:

# /etc/trafficserver/ts-alive
All good

With the above configuration, GET requests to `/check` will result in 200 responses from ATS with the response body defined in `/etc/trafficserver/ts-alive`.
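A monitoring probe for this endpoint only needs to issue a GET for /check and look at the status code. The following Python sketch uses only the standard library and assumes ATS listens on 127.0.0.1:3128, as in the server_ports setting earlier on this page; ats_is_healthy is a hypothetical name. As we read the healthchecks plugin documentation, the two status codes on the configuration line are returned when the status file exists and when it is missing respectively, so a 403 would mean that ATS is up but /etc/trafficserver/ts-alive is absent. The sketch simply treats anything other than 200 as unhealthy.

```python
import urllib.error
import urllib.request


def ats_is_healthy(url: str = "http://127.0.0.1:3128/check",
                   timeout: float = 2.0) -> bool:
    """Return True if the health check URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # The server answered, but with an error status (e.g. 403).
        return False
    except OSError:
        # Connection refused, timeout, name resolution failure, ...
        return False
```

Calling ats_is_healthy() on the cache host returns True while ATS is serving the health check, and False both when ATS is down and when it answers with the "file missing" code.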

Cache inspector

To enable the cache inspector functionality, add the following remap rules:

map /cache-internal/ http://{cache-internal}
map /cache/ http://{cache}
map /stat/ http://{stat}
map /test/ http://{test}
map /hostdb/ http://{hostdb}
map /net/ http://{net}
map /http/ http://{http}

systemd unit

# /etc/systemd/system/trafficserver.service.d/puppet-override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/traffic_manager --nosyslog
Restart=always
RestartSec=1
ExecReload=
# XXX: `traffic_server -C verify_config` is broken: it causes configuration
# reloads, which cause errors with ascii_pipe logs
#ExecReload=/usr/bin/traffic_server -C verify_config
ExecReload=/usr/bin/traffic_ctl config reload
# traffic_manager is terminated with SIGTERM and exits with the received signal
# number (15)
SuccessExitStatus=15

LimitNOFILE=500000
LimitMEMLOCK=90000

# Security options
ProtectKernelModules=yes
ProtectKernelTunables=yes
PrivateTmp=yes

RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK

CapabilityBoundingSet=CAP_DAC_OVERRIDE CAP_SETGID CAP_SETUID 
SystemCallFilter=~@keyring @clock @cpu-emulation @obsolete @module @raw-io @debug

# The entire file system hierarchy is mounted read-only, except for the API
# file system subtrees /dev, /proc and /sys
ProtectSystem=strict

# Whitelist read/write directories
ReadWritePaths=/var/log/trafficserver
ReadWritePaths=/var/run/trafficserver
ReadWritePaths=/var/cache/trafficserver

Additional ATS instances

Traffic Server provides a poorly documented feature called layouts. An ATS layout defines the following paths:

  • exec_prefix (TS_BUILD_EXEC_PREFIX)
  • bindir (TS_BUILD_BINDIR)
  • sbindir (TS_BUILD_SBINDIR)
  • sysconfdir (TS_BUILD_SYSCONFDIR)
  • datadir (TS_BUILD_DATADIR)
  • includedir (TS_BUILD_INCLUDEDIR)
  • libdir (TS_BUILD_LIBDIR)
  • libexecdir (TS_BUILD_LIBEXECDIR)
  • localstatedir (TS_BUILD_LOCALSTATEDIR)
  • runtimedir (TS_BUILD_RUNTIMEDIR)
  • logdir (TS_BUILD_LOGDIR)
  • mandir (TS_BUILD_MANDIR)
  • infodir (TS_BUILD_INFODIR)
  • cachedir (TS_BUILD_CACHEDIR)

These paths are defined at build time by their corresponding TS_BUILD_ constants. However, they can be overridden at runtime with a layout (runroot) file. A layout file is a YAML file that defines the paths listed above, using the following syntax:

prefix: ./runroot
exec_prefix: ./runroot
bindir: ./runroot/custom_bin
sbindir: ./runroot/custom_sbin
sysconfdir: ./runroot/custom_sysconf
datadir: ./runroot/custom_data
includedir: ./runroot/custom_include
libdir: ./runroot/custom_lib
libexecdir: ./runroot/custom_libexec
localstatedir: ./runroot/custom_localstate
runtimedir: ./runroot/custom_runtime
logdir: ./runroot/custom_log
cachedir: ./runroot/custom_cache

After defining the layout file, the runroot can be initialized by running traffic_layout:

$ traffic_layout --init --layout="custom.yml" --copy-style=soft

Note that the custom layout defines its own bin and sbin directories, so the binaries need to be copied into the runroot. The --copy-style flag controls how the executables are copied:

  • copy: Full copy
  • hard: Use hard links
  • soft: Use symlinks

Since the goal here is to run several instances of the same ATS version, --copy-style=soft makes that possible while still benefiting from system-wide ATS upgrades.

Once the layout has been initialized, any Traffic Server CLI tool can use it via the --run-root option or the TS_RUNROOT environment variable:

$ traffic_ctl --run-root="custom.yaml" config reload
$ TS_RUNROOT="custom.yaml" traffic_ctl config reload

Debugging

The XDebug plugin allows clients to check various aspects of ATS operation.

To enable the plugin, add xdebug.so to plugin.config, add the following lines to records.config, and restart trafficserver.

CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING xdebugs.tag

Once the plugin is enabled, clients can specify various values in the X-Debug header and receive the relevant information back.

For example:

# cache hit
$ curl -H "X-Debug: X-Milestones" http://localhost 2>&1 | grep Milestones:
< X-Milestones: PLUGIN-TOTAL=0.000022445, PLUGIN-ACTIVE=0.000022445, CACHE-OPEN-READ-END=0.000078570, CACHE-OPEN-READ-BEGIN=0.000078570, UA-BEGIN-WRITE=0.000199094, UA-READ-HEADER-DONE=0.000000000, UA-FIRST-READ=0.000000000, UA-BEGIN=0.000000000

# cache miss
< X-Milestones: PLUGIN-TOTAL=0.000017432, PLUGIN-ACTIVE=0.000017432, DNS-LOOKUP-END=0.091413811, DNS-LOOKUP-BEGIN=0.000148548, CACHE-OPEN-WRITE-END=0.091413811, CACHE-OPEN-WRITE-BEGIN=0.091413811, CACHE-OPEN-READ-END=0.000056997, CACHE-OPEN-READ-BEGIN=0.000056997, SERVER-READ-HEADER-DONE=0.218755336, SERVER-FIRST-READ=0.218755336, SERVER-BEGIN-WRITE=0.091413811, SERVER-CONNECT-END=0.091413811, SERVER-CONNECT=0.091413811, SERVER-FIRST-CONNECT=0.091413811, UA-BEGIN-WRITE=0.218755336, UA-READ-HEADER-DONE=0.000000000, UA-FIRST-READ=0.000000000, UA-BEGIN=0.000000000

The full list of debugging headers is available in the XDebug Plugin documentation.
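In both samples above UA-BEGIN is 0, which suggests the X-Milestones values are offsets in seconds from the start of the transaction; differences between milestones then give per-phase timings. The following Python sketch parses the header and computes a few such differences. It is an illustration only: parse_milestones and elapsed are hypothetical helpers, and the interpretation of each phase is ours rather than the plugin's.

```python
from typing import Dict


def parse_milestones(header_value: str) -> Dict[str, float]:
    """Parse an X-Milestones header value into {milestone name: seconds}."""
    milestones = {}
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        milestones[name.strip()] = float(value)
    return milestones


def elapsed(milestones: Dict[str, float], start: str, end: str) -> float:
    """Seconds between two milestones."""
    return milestones[end] - milestones[start]


# Abridged copy of the cache miss sample above.
miss = parse_milestones(
    "DNS-LOOKUP-END=0.091413811, DNS-LOOKUP-BEGIN=0.000148548, "
    "SERVER-FIRST-READ=0.218755336, SERVER-BEGIN-WRITE=0.091413811, "
    "UA-BEGIN-WRITE=0.218755336, UA-BEGIN=0.000000000"
)
print(f"DNS lookup:           {elapsed(miss, 'DNS-LOOKUP-BEGIN', 'DNS-LOOKUP-END'):.3f}s")
print(f"origin response wait: {elapsed(miss, 'SERVER-BEGIN-WRITE', 'SERVER-FIRST-READ'):.3f}s")
print(f"until client write:   {elapsed(miss, 'UA-BEGIN', 'UA-BEGIN-WRITE'):.3f}s")
```

For the cache miss sample this attributes roughly 0.127s of the 0.219s total to waiting for the origin after the request was sent.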

In the setup at WMF, the plugin can be enabled by setting profile::trafficserver::backend::enable_xdebug to true in Hiera. It can then be used by specifying the X-ATS-Debug request header. For example, to dump all client/intermediary/origin request/response headers:

$ curl -H "X-ATS-Debug: log-headers" http://localhost

Request logs

Non-purge request logs can be inspected by running atslog-backend, a wrapper around fifo-log-tailer:

$ sudo atslog-backend

Call fifo-log-tailer directly to inspect PURGE traffic:

# LOG_SOCKET=/var/run/trafficserver/notpurge.sock fifo-log-tailer

Testing patches in labs

The traffic project in labs provides a testbed for ATS-related patches, covering both Lua/configuration changes and ATS itself. Any of the traffic instances whose hostname begins with traffic-cache- can be used for this purpose. The traffic labs project also features a self-hosted puppetmaster on which puppet patches can be cherry-picked. At the time of this writing the instance is traffic-puppetmaster-buster.traffic.eqiad1.wikimedia.cloud, but double-check that this is still the case.

ema@traffic-cache-atstext-buster:~$ grep ^server /etc/puppet/puppet.conf
server = traffic-puppetmaster-buster.traffic.eqiad1.wikimedia.cloud

The labs testbed provides a different, significantly simplified remap configuration compared to production. Right now it looks like this on cache_text instances:

ema@traffic-cache-atstext-buster:~$ sudo cat /etc/trafficserver/remap.config 
# https://docs.trafficserver.apache.org/en/latest/admin-guide/files/remap.config.en.html
# This file is managed by Puppet.

map http://en.wikipedia.beta.wmflabs.org http://deployment-mediawiki-07.deployment-prep.eqiad1.wikimedia.cloud

Test requests against MediaWiki can be performed as follows:

ema@traffic-cache-atstext-buster:~$ curl -v -H 'Host: en.wikipedia.beta.wmflabs.org' '127.0.0.1:3128/w/load.php?lang=it&modules=startup&only=scripts&raw=1&skin=vector'

Use Host: upload.wikimedia.beta.wmflabs.org on cache_upload instances instead.

Building and running from Git

To build trafficserver from git:

autoreconf -if
./configure --enable-layout=Debian --sysconfdir=/etc/trafficserver --libdir=/usr/lib/trafficserver --libexecdir=/usr/lib/trafficserver/modules
make -j8

Add a minimal /etc/trafficserver/records.config:

CONFIG proxy.config.disable_configuration_modification INT 1
# Replace $PATH_TO_REPO!
CONFIG proxy.config.bin_path STRING ${PATH_TO_REPO}/trafficserver/src/traffic_server/

The newly built traffic_server and traffic_manager binaries can be tested as follows:

sudo -u trafficserver ./src/traffic_server/traffic_server
sudo -u trafficserver ./src/traffic_manager/traffic_manager --nosyslog

Packaging

To package a new stable release, download it from https://trafficserver.apache.org/downloads and verify its SHA-512 checksum.

Then import it into operations/debs/trafficserver with:

PRISTINE_ALL_XDELTA=xdelta gbp import-orig --pristine-tar /tmp/trafficserver-8.0.2.tar.bz2

This updates the following branches; don't forget to push all of them to the repository:

  • master
  • upstream
  • pristine-tar

Build with:

WIKIMEDIA=yes ARCH=amd64 BACKPORTS=yes DIST=stretch GIT_PBUILDER_AUTOCONF=no gbp buildpackage -jauto -us -uc -sa --git-builder=git-pbuilder

The procedure to package new RC versions is roughly as follows. This assumes that: (1) the new RC artifacts are made available under https://people.apache.org/~bcall/8.0.3-rc0/, and (2) you want to build the new packages on boron.eqiad.wmnet.

https_proxy=http://url-downloader.wikimedia.org:8080 wget https://people.apache.org/~bcall/8.0.3-rc0/trafficserver-8.0.3-rc0.tar.bz2
# Check that the sha512 matches https://people.apache.org/~bcall/8.0.3-rc0/trafficserver-8.0.3-rc0.tar.bz2.sha512
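The checksum comparison can also be scripted. A minimal Python sketch using hashlib, assuming the .sha512 file has been downloaded next to the tarball (for example with the same https_proxy and wget invocation). expected_sha512 tries to cope both with sha512sum-style files (digest followed by the filename) and with files that list the filename first and split the digest into groups, by dropping the filename and keeping only hexadecimal digits. All function names are hypothetical.

```python
import hashlib
import string
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def expected_sha512(checksum_text: str, filename: str) -> str:
    """Extract the expected digest from the contents of a .sha512 file."""
    text = checksum_text.replace(filename, "")
    digest = "".join(c for c in text if c in string.hexdigits).lower()
    if len(digest) != 128:  # a SHA-512 digest has 128 hex digits
        raise ValueError("could not find a SHA-512 digest in checksum file")
    return digest


def verify(tarball: Path, checksum_file: Path) -> bool:
    """True if the tarball matches the digest in its .sha512 file."""
    expected = expected_sha512(checksum_file.read_text(), tarball.name)
    return sha512_of(tarball) == expected
```

For the RC above, verify(Path("trafficserver-8.0.3-rc0.tar.bz2"), Path("trafficserver-8.0.3-rc0.tar.bz2.sha512")) should return True before going any further.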

Then obtain our latest prod packages and update them:

apt-get source trafficserver
cd trafficserver-8.0.2/
uupdate -v 8.0.3~rc0 ../trafficserver-8.0.3-rc0.tar.bz2
cd ../trafficserver-8.0.3~rc0
BACKPORTS=yes WIKIMEDIA=yes ARCH=amd64 DIST=stretch GIT_PBUILDER_AUTOCONF=no git-pbuilder

Running autests

There are a number of "gold tests" shipped with ATS. They're under tests/gold_tests and can be run as follows:

./tests/autest.sh --ats-bin /usr/bin/ --filter redirect

Cheatsheet

Rolling restart in codfw:

sudo cumin -b1 'A:cp-ats-codfw' 'ats-backend-restart ; sleep 30'

Show non-default configuration values:

sudo traffic_ctl config diff

Configuration reload:

sudo traffic_ctl config reload

Check if a reload/restart is needed:

sudo traffic_ctl config status

Start in debugging mode, dumping headers:

sudo traffic_server -T http_hdrs

Access metrics from the CLI:

traffic_ctl metric get proxy.process.http.cache_hit_fresh

Multiple metrics can be accessed with 'match':

traffic_ctl metric match proxy.process.ssl.*

Get metrics relevant to the TLS instance:

sudo traffic_ctl --run-root=/srv/trafficserver/tls metric match '.*http2.*'
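When scripting around the commands above, the output of traffic_ctl metric get and traffic_ctl metric match can be turned into a Python dictionary. A sketch, assuming each output line has the form `<metric name> <value>` (check this against the ATS version in use); parse_metrics and match_metrics are hypothetical helpers, and match_metrics needs the same privileges as the sudo commands above.

```python
import subprocess
from typing import Dict, Optional


def parse_metrics(output: str) -> Dict[str, float]:
    """Parse 'name value' lines into {name: value}, skipping anything else."""
    metrics = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or unexpected line
        name, value = parts
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # non-numeric record
    return metrics


def match_metrics(regex: str, run_root: Optional[str] = None) -> Dict[str, float]:
    """Run 'traffic_ctl metric match', optionally against a runroot."""
    cmd = ["traffic_ctl"]
    if run_root:
        cmd.append(f"--run-root={run_root}")
    cmd += ["metric", "match", regex]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_metrics(result.stdout)
```

For instance, match_metrics(".*http2.*", run_root="/srv/trafficserver/tls") mirrors the TLS instance command above.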

Set the value of a metric to zero:

traffic_ctl metric zero proxy.process.http.completed_requests

Show storage usage:

traffic_server -C check

Wipe storage. This needs to be done while trafficserver isn't running.

traffic_server -C clear_cache

Lua scripting

ATS plugins can be written in Lua. As an example, this is how to choose an origin server dynamically:

# /etc/trafficserver/remap.config
map http://127.0.0.1:3128/ http://$origin_server_ip/ @plugin=/usr/lib/trafficserver/modules/tslua.so @pparam=/var/tmp/ats-set-backend.lua
reverse_map http://$origin_server_ip/ http://127.0.0.1:3128/

Choosing origin server

Selecting the appropriate origin server for a given request can be done using ATS mapping rules. The same goal can be achieved in Lua:

-- /var/tmp/ats-set-backend.lua
function do_remap()
    local url = ts.client_request.get_url()
    if url:match("/api/rest_v1/") then
        ts.client_request.set_url_host('origin-server.eqiad.wmnet')
        ts.client_request.set_url_port(80)
        ts.client_request.set_url_scheme('http')
        return TS_LUA_REMAP_DID_REMAP
    end
end

Negative response caching

By default ATS caches negative responses such as 404, 503 and others only if the response defines a max-age via the Cache-Control header. This behavior can be changed by setting the configuration option proxy.config.http.negative_caching_enabled, which allows caching of negative responses that do NOT specify Cache-Control. If negative caching is enabled, the lifetime of negative responses without Cache-Control is defined by proxy.config.http.negative_caching_lifetime, in seconds, defaulting to 1800.

However, one might want to cache 404 responses that do not send Cache-Control without caching any 503 response. Since proxy.config.http.negative_caching_enabled enables the behavior for a whole set of negative responses, and ATS versions below 8.0.0 did not allow specifying which negative response status codes to cache, the goal can be achieved by setting Cache-Control in Lua only for certain status codes:

function read_response()
    local status_code = ts.server_response.get_status()
    local cache_control = ts.server_response.header['Cache-Control']

    -- Cache 404 responses without CC for 10s
    if status_code == 404 and not(cache_control) then
        ts.server_response.header['Cache-Control'] = 'max-age=10'
    end
end

function do_remap()
    ts.hook(TS_LUA_HOOK_READ_RESPONSE_HDR, read_response)
    return 0
end

Starting with ATS 8.0.0, the configuration option proxy.config.http.negative_caching_list can be used to specify the list of negative response status codes to cache.
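The decision described in this section can be modelled in a few lines of Python to see which negative responses end up cached, and for how long. This is a simplified sketch: it only encodes the rules stated above (an explicit max-age wins, otherwise negative_caching_lifetime applies if negative caching is enabled), lua_read_response mirrors the Lua read_response hook, and both function names are hypothetical.

```python
from typing import Optional


def lua_read_response(status: int, cache_control: Optional[str]) -> Optional[str]:
    """Mirror of the Lua read_response() hook: give 404 responses that
    lack Cache-Control a 10 second max-age, leave everything else alone."""
    if status == 404 and not cache_control:
        return "max-age=10"
    return cache_control


def negative_ttl(cache_control: Optional[str],
                 negative_caching_enabled: bool = False,
                 negative_caching_lifetime: int = 1800) -> Optional[int]:
    """Seconds a negative response would stay cached, or None if it is not
    cached at all, following the rules described above."""
    for directive in (cache_control or "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)  # an explicit max-age always wins
    if negative_caching_enabled:
        return negative_caching_lifetime
    return None


# Negative caching left disabled, Lua hook in place, origin sends no Cache-Control:
for status in (404, 503):
    print(status, negative_ttl(lua_read_response(status, None)))  # 404 10, then 503 None
```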

Setting X-Cache-Int

As another example, the following script takes care of setting the X-Cache-Int response header:

-- /var/tmp/ats-set-x-cache-int.lua
function cache_lookup()
     local cache_status = ts.http.get_cache_lookup_status()
     ts.ctx['cstatus'] = cache_status
end

function cache_status_to_string(status)
     if status == TS_LUA_CACHE_LOOKUP_MISS then
        return "miss"
     end

     if status == TS_LUA_CACHE_LOOKUP_HIT_FRESH then
        return "hit"
     end

     if status == TS_LUA_CACHE_LOOKUP_HIT_STALE then
        return "miss"
     end

     if status == TS_LUA_CACHE_LOOKUP_SKIPPED then
        return "pass"
     end

     return "bug"
end

function gen_x_cache_int()
     local hostname = "cp4242" -- from puppet
     local cache_status = cache_status_to_string(ts.ctx['cstatus'])

     local v = ts.client_response.header['X-Cache-Int']
     local mine = hostname .. " " .. cache_status

     if (v) then
        v = v .. ", " .. mine
     else
        v = mine
     end

     ts.client_response.header['X-Cache-Int'] = v
     ts.client_response.header['X-Cache-Status'] = cache_status
end

function do_remap()
     ts.hook(TS_LUA_HOOK_CACHE_LOOKUP_COMPLETE, cache_lookup)
     ts.hook(TS_LUA_HOOK_SEND_RESPONSE_HDR, gen_x_cache_int)
     return 0
end
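Because the script appends its own entry to whatever X-Cache-Int value the response already carries (presumably set by another cache layer), the header ends up holding one "hostname status" pair per layer. The following Python mirror of cache_status_to_string and gen_x_cache_int can help reason about the resulting values outside ATS. It is a sketch: the string constants stand in for the TS_LUA_CACHE_LOOKUP_* values, and cp1001 is a hypothetical hostname.

```python
from typing import Optional, Tuple

# Stand-ins for the TS_LUA_CACHE_LOOKUP_* constants used by the Lua script.
MISS, HIT_FRESH, HIT_STALE, SKIPPED = "MISS", "HIT_FRESH", "HIT_STALE", "SKIPPED"

STATUS_STRINGS = {
    MISS: "miss",
    HIT_FRESH: "hit",
    HIT_STALE: "miss",  # stale hits are reported as misses, as in the Lua code
    SKIPPED: "pass",
}


def cache_status_to_string(status: Optional[str]) -> str:
    return STATUS_STRINGS.get(status, "bug")


def gen_x_cache_int(hostname: str, lookup_status: Optional[str],
                    existing: Optional[str]) -> Tuple[str, str]:
    """Return the X-Cache-Int and X-Cache-Status values this layer would set,
    appending its own entry to any X-Cache-Int value already present."""
    cache_status = cache_status_to_string(lookup_status)
    mine = f"{hostname} {cache_status}"
    value = f"{existing}, {mine}" if existing else mine
    return value, cache_status


print(gen_x_cache_int("cp4242", HIT_FRESH, "cp1001 miss"))
# ('cp1001 miss, cp4242 hit', 'hit')
```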

Custom metrics

Ad-hoc metrics can be created, incremented and accessed in Lua. For example, to keep per-origin counters of origin server requests:

function do_global_send_request()
    local ip = ts.server_request.server_addr.get_ip()

    if ip == "0.0.0.0" then
        -- internal stuff, not an actual origin server request
        return 0
    end

    local counter_name = "origin_requests_" .. ip

    local counter = ts.stat_find(counter_name)

    if counter == nil then
        counter = ts.stat_create(counter_name,
                                 TS_LUA_RECORDDATATYPE_INT,
                                 TS_LUA_STAT_PERSISTENT,
                                 TS_LUA_STAT_SYNC_COUNT)
    end

    counter:increment(1)
end

Forcing a cache miss (similar to ban)

Sometimes it is desirable to ensure that certain cached responses are not returned to clients, and that instead the objects are fetched again from the origin server.

This can be done in Lua by overriding the proxy.config.http.cache.generation setting for one or more specific transactions. The value passed is combined with the cache key at cache lookup time, so the next lookup for an affected object misses. The object is then fetched again from the origin, and subsequent lookups made with the same generation value hit on the new object.

function do_global_read_request()
    if ts.client_request.header['Host'] == 'it.wikipedia.org' then
        ts.http.config_int_set(TS_LUA_CONFIG_HTTP_CACHE_GENERATION, 1593784707)
    end
end

The example uses the number of seconds since the epoch, but any integer other than -1 (the default) would do. Once it is certain that the old objects have expired, the change can be reverted: wait at least for the maximum TTL, 24 hours at the time of this writing, before reverting.
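The effect of changing the generation can be illustrated with a toy cache in Python. This is only a model: ATS's real cache key derivation is internal and is not reproduced here; the sketch just assumes, as described above, that a generation other than -1 is folded into the key. It also computes the earliest safe revert time from the 24-hour maximum TTL, assuming the generation value is the time the override was deployed, as in the example. All names are hypothetical.

```python
import hashlib

MAX_TTL = 24 * 60 * 60  # maximum object TTL at the time of writing, in seconds


def cache_key(url: str, generation: int = -1) -> str:
    """Toy cache key: -1 means "no generation", mirroring the setting's
    default; any other value is mixed into the key."""
    material = url if generation == -1 else f"{generation}:{url}"
    return hashlib.sha256(material.encode()).hexdigest()


class ToyCache:
    """In-memory stand-in for the ATS cache: a miss stores a new copy."""

    def __init__(self) -> None:
        self.store = {}

    def lookup(self, url: str, generation: int = -1) -> str:
        key = cache_key(url, generation)
        if key in self.store:
            return "hit"
        self.store[key] = f"copy of {url} fetched from the origin"
        return "miss"


def earliest_revert(override_start: int, max_ttl: int = MAX_TTL) -> int:
    """Epoch second after which every object cached under the old key has
    expired, so the override can safely be removed."""
    return override_start + max_ttl


cache = ToyCache()
url = "http://it.wikipedia.org/wiki/Pagina_principale"
print(cache.lookup(url), cache.lookup(url))                          # miss hit
print(cache.lookup(url, 1593784707), cache.lookup(url, 1593784707))  # miss hit
print(earliest_revert(1593784707))                                   # 1593871107
```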


Debugging

Debugging output can be produced from Lua with ts.debug("message"). The following configuration needs to be enabled to log debug output:

CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING ts_lua

To enable additional debugging tags, for example http_hdrs, list them separated by a pipe:

CONFIG proxy.config.diags.debug.tags STRING ts_lua|http_hdrs

See the documentation for more tags.

Unit testing

The busted framework can be used to test Lua scripts. It can be installed as follows:

apt install luarocks
luarocks install busted
luarocks install luacov

The following unit tests cover some of the functionality implemented by ats-set-x-cache-int.lua:

-- unit_test.lua
_G.ts = { client_response = {  header = {} }, ctx = {} }

describe("Busted unit testing framework", function()
  describe("script for ATS Lua Plugin", function()

    it("test - hook", function()
      stub(ts, "hook")

      require("ats-set-x-cache-int")
      local result = do_remap()
      assert.are.equals(0, result)
    end)

    it("test - gen_x_cache_hit", function()
      stub(ts, "hook")

      require("ats-set-x-cache-int")
      local result = gen_x_cache_int()

      assert.are.equals('miss', ts.client_response.header['X-Cache-Status'])
      assert.are.equals('cp4242 miss', ts.client_response.header['X-Cache-Int'])
    end)

  end)
end)

Run the tests and generate a coverage report with:

$ busted -c unit_test.lua 
●●
2 successes / 0 failures / 0 errors / 0 pending : 0.012771 seconds

$ luacov ; cat luacov.report.out

Storage

Information about permanent storage can be obtained by using the python3-superior-cache-analyzer Debian package:

from scan import span
s = span.Span("/dev/nvme0n1p1")
print(s)

External links