HTTP timeouts
This page documents the HTTP timeouts involved in a web request from users to a service behind the WMF traffic layers.
TLS
The entry point for a user is ats-tls; the specific node depends on the service and the user's IP address:
| TLS termination layer | SSL handshake timeout | connect timeout (origin server) | TTFB (origin server) | successive reads (origin server) | Keepalive timeout (client) |
|---|---|---|---|---|---|
| ats-tls | 60 seconds | 3 seconds | 180 seconds | 180 seconds | 120 seconds |
| nginx (deprecated) | 60 seconds (nginx default value) | 10 seconds (nginx default value) | 180 seconds | 180 seconds (same config parameter as TTFB) | 60 seconds |
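For reference, here is a sketch of the kind of Apache Traffic Server tunables (records.config syntax) that map onto the ats-tls columns above. Whether these are the exact settings and values used at WMF is an assumption; this is only illustrative:

```
# Illustrative only; not the actual WMF records.config.
# SSL handshake timeout (client side)
CONFIG proxy.config.ssl.handshake_timeout_in INT 60
# connect timeout towards the origin server
CONFIG proxy.config.http.connect_attempts_timeout INT 3
# no-activity timeout towards the origin: covers both TTFB and successive reads
CONFIG proxy.config.http.transaction_no_activity_timeout_out INT 180
# keepalive timeout on the client side
CONFIG proxy.config.http.keep_alive_no_activity_timeout_in INT 120
```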
Currently, a big difference between nginx and ats-tls is how they handle POST requests: nginx buffers the whole request body before relaying it to the origin (varnish-frontend), while ats-tls doesn't buffer it and relays the connection to varnish-frontend as soon as possible. On nginx, the timeout to fulfil the POST body is 60 seconds between read operations; this is the default value and it isn't explicitly configured.
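The nginx behaviour described above corresponds to these stock directives; the snippet only illustrates the defaults mentioned in the paragraph, it is not the actual WMF configuration:

```nginx
# Buffer the whole request body before relaying it to the origin
# (nginx default behaviour).
proxy_request_buffering on;

# Timeout between two successive read operations while receiving the
# request body (nginx default: 60s; not set explicitly).
client_body_timeout 60s;
```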
Caching
Our caching system is split into two layers (frontend and backend). There is one implementation of the frontend layer (varnish) and one implementation of the backend layer (ats-be).
| caching layer | connect timeout | TTFB | successive reads |
|---|---|---|---|
| varnish-frontend | 3 seconds (text) / 5 seconds (upload) | 65 seconds (text) / 35 seconds (upload) | 33 seconds (text) / 60 seconds (upload) |
| ats-backend | 10 seconds | 180 seconds | 180 seconds |
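As an illustration, the varnish-frontend timeouts above correspond to the standard per-backend VCL attributes. The backend name, host, and port below are hypothetical, and the values mirror the "text" column; this is not the actual WMF VCL:

```vcl
backend ats_be {
    .host = "127.0.0.1";            # hypothetical
    .port = "3128";                 # hypothetical
    .connect_timeout = 3s;          # connect timeout
    .first_byte_timeout = 65s;      # TTFB
    .between_bytes_timeout = 33s;   # successive reads
}
```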
App server
After leaving the backend caching layer, the request reaches the appserver. The timeouts that apply to the appserver and api clusters are described below:
| layer | request timeout |
|---|---|
| Nginx (TLS) | 180 seconds (appserver, api, parsoid) / 1200 seconds (jobrunner) / 86400 seconds (videoscaler) |
| Envoy (TLS/ats-be requests) | 1 second (connect timeout) / 65 seconds (route timeout) |
| Apache | 202 seconds (appserver, api, parsoid) / 1202 seconds (jobrunner) / 86402 seconds (videoscaler) |
| php-fpm | 201 seconds (appserver, api, parsoid) / 86400 seconds (jobrunner, videoscaler) |
| PHP | 210 seconds (appserver, api, parsoid) / 1200 seconds (jobrunner, videoscaler) |
| MediaWiki | 60 seconds (GET) / 200 seconds (POST) / 200 seconds (jobrunner) / 86400 seconds (videoscaler). Configured using php-excimer |
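For the php-fpm and PHP rows, the relevant stock settings are `request_terminate_timeout` (php-fpm pool configuration) and `max_execution_time` (php.ini). The snippet below is only a sketch using the appserver values from the table, not the actual WMF puppet configuration:

```ini
; php-fpm pool configuration (illustrative)
request_terminate_timeout = 201

; php.ini (illustrative; measures CPU time, excluding syscalls)
max_execution_time = 210
```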
Notes
The app server timeouts may be larger than those on the caching layer; this is mainly so that internal clients can be properly serviced.
- php-fpm
  - The `request_terminate_timeout` setting sets the maximum time php-fpm will spend processing a request before terminating the worker process. This exists as a last resort to kill PHP processes even if a long-running C function is not yielding to Excimer and/or if PHP raised `max_execution_time` at run-time.
- PHP
  - The `max_execution_time` setting in php.ini measures CPU time (not wall-clock time) and does not include syscalls.
  - Note that, unlike all other settings, for videoscalers this setting is far lower than the higher-level timeouts (20 minutes vs 24 hours). This is a compromise to prevent regular jobs from being able to spend 24 hours on the CPU, which would be very unexpected (as they share the same php-fpm configuration). Videoscaling jobs are expected to spend most of their time transcoding videos, which happens through syscalls, so this is fine.
- MediaWiki
  - This is controlled by the `ExcimerTimer` interval value, in wmf-config/set-time-limit. Upon reaching the timeout, php-excimer throws a `WMFTimeoutException` once the current syscall returns (see the sketch after this list).
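A minimal sketch of that mechanism, assuming only the stock Excimer API (`ExcimerTimer` with a one-shot interval and a callback); the exception class and limit below are stand-ins for illustration, not the actual wmf-config/set-time-limit code:

```php
<?php
// Hypothetical stand-in for WMFTimeoutException.
class RequestTimeoutException extends RuntimeException {}

$limit = 60; // e.g. the 60 second GET limit from the table above

$timer = new ExcimerTimer();
$timer->setInterval( $limit );   // one-shot wall-clock timer, in seconds
$timer->setCallback( function () use ( $limit ) {
    // Excimer invokes this through the engine's interrupt hook, so the
    // exception surfaces once the current syscall returns.
    throw new RequestTimeoutException(
        "The maximum execution time of $limit seconds was exceeded"
    );
} );
$timer->start();

// ... the request is handled here ...
```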