This page documents the timeouts involved in a request made by a user against a service behind the WMF caching layer.
The entry point for a user is either nginx or ats-tls, depending on the service and the cache node assigned to the user's IP address:
|TLS termination layer||SSL handshake timeout||TTFB (origin server)||successive reads (origin server)||Keepalive timeout (client)|
|nginx||60 seconds (nginx default value)||180 seconds||180 seconds (same config parameter as TTFB)||60 seconds|
|ats-tls||60 seconds||180 seconds||200 seconds||120 seconds|
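To illustrate the difference between the TTFB and successive-reads columns, here is a minimal Python sketch of a client-side read loop that applies one timeout while waiting for the first byte and another between later reads. The function name `read_with_timeouts` and its parameters are hypothetical and only serve the illustration; this is not how nginx or ats-tls implement their timers.

```python
import socket


def read_with_timeouts(sock, ttfb_timeout, between_reads_timeout, bufsize=4096):
    """Read from sock until EOF.

    ttfb_timeout bounds the wait for the first chunk; between_reads_timeout
    bounds the wait for every later chunk. Hypothetical helper for illustration.
    """
    chunks = []
    sock.settimeout(ttfb_timeout)  # timer for the first byte
    while True:
        data = sock.recv(bufsize)  # raises socket.timeout if nothing arrives in time
        if not data:  # peer closed the connection
            break
        chunks.append(data)
        sock.settimeout(between_reads_timeout)  # timer for successive reads
    return b"".join(chunks)
```

With the ats-tls values from the table, a caller would pass `ttfb_timeout=180` and `between_reads_timeout=200`.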
Currently, a big difference between nginx and ats-tls lies in how they handle POST requests. nginx buffers the whole request before relaying it to the origin (varnish-frontend), while ats-tls doesn't buffer it and relays the connection to varnish-frontend as soon as possible. On nginx, the timeout for receiving the POST body is 60 seconds between read operations; this is the default value and it isn't explicitly configured.
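The two relaying strategies can be contrasted with a small Python sketch. It is a conceptual model only: `relay_buffered`, `relay_streaming` and the `send` callback are hypothetical names, and neither function reflects the actual nginx or ats-tls code.

```python
def relay_buffered(chunks, send):
    """Collect the whole request body first, then forward it in one piece."""
    body = b"".join(chunks)  # nothing is sent until every chunk has arrived
    send(body)


def relay_streaming(chunks, send):
    """Forward each chunk to the origin as soon as it arrives."""
    for chunk in chunks:
        send(chunk)


buffered_out, streamed_out = [], []
relay_buffered(iter([b"part1", b"part2"]), buffered_out.append)
relay_streaming(iter([b"part1", b"part2"]), streamed_out.append)
```

In the buffered case the origin sees a single write once the client has finished uploading, so a slow client is bounded by the TLS terminator's body-read timeout. In the streaming case the origin connection is opened early and its own read timeouts apply while the client is still sending.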
Our caching system is split into two layers (frontend and backend). There is one implementation of the frontend layer (varnish) and two implementations of the backend layer (varnish-be and ats-be).
|caching layer||connect timeout||TTFB||successive reads|
|varnish-frontend||3 seconds (text) / 5 seconds (upload)||65 seconds (text) / 35 seconds (upload)||33 seconds (text) / 60 seconds (upload)|
|varnish-backend||3 seconds (text)||63 seconds (text)||31 seconds (text)|
|ats-backend||N/A (fused together with TTFB)||180 seconds (GET) / 180 seconds (POST, PUT)||200 seconds|
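Because a request crosses several layers, the first TTFB timer to expire is the one with the smallest value. The Python sketch below finds that layer for one assumed path on the text cluster (ats-tls, then varnish-frontend, then ats-backend). The values are copied from the tables on this page; the dictionary and the function name `tightest_timeout` are hypothetical.

```python
# TTFB timeouts in seconds, copied from the tables on this page.
# The path itself (ats-tls -> varnish-frontend -> ats-backend) is an assumption.
TEXT_PATH_TTFB = {
    "ats-tls": 180,
    "varnish-frontend (text)": 65,
    "ats-backend": 180,
}


def tightest_timeout(timeouts):
    """Return (layer, seconds) for the layer whose timeout expires first."""
    layer = min(timeouts, key=timeouts.get)
    return layer, timeouts[layer]


layer, seconds = tightest_timeout(TEXT_PATH_TTFB)
print(f"{layer} gives up first, after {seconds} seconds")
```

Under these assumptions, varnish-frontend is the limiting layer for time to first byte on text requests, at 65 seconds.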
After leaving the backend caching layer, the request reaches the appserver. The following timeouts apply to the appservers and api clusters:
|Nginx (TLS/ats-be requests)||N/A (same timeouts as the nginx used for TLS termination)|
|Envoy (TLS/ats-be requests)||1 second (connect timeout) / 65 seconds (route timeout)|
|PHP||201 seconds (appservers) / 201 seconds (api)|
|Excimer||60 seconds (GET) / 200 seconds (POST)|
Note: These timeouts might be larger than the ones on the caching layer, mainly to properly serve internal clients.
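As a rough check of this note, the Python sketch below lists the appserver-side limits that are strictly larger than every caching-layer TTFB on the text cluster. The values are copied from the tables on this page; the names `CACHE_TTFB`, `APPSERVER_LIMITS` and `limits_exceeding_cache` are hypothetical.

```python
# Values in seconds, copied from the tables on this page (text cluster).
CACHE_TTFB = {"varnish-frontend": 65, "varnish-backend": 63, "ats-backend": 180}
APPSERVER_LIMITS = {
    "Envoy route timeout": 65,
    "PHP": 201,
    "Excimer GET": 60,
    "Excimer POST": 200,
}


def limits_exceeding_cache(app_limits, cache_ttfb):
    """Return the appserver limits larger than every caching-layer TTFB."""
    ceiling = max(cache_ttfb.values())
    return sorted(name for name, secs in app_limits.items() if secs > ceiling)


print(limits_exceeding_cache(APPSERVER_LIMITS, CACHE_TTFB))
```

With these values, only the PHP limit and the Excimer POST limit outlast all caching-layer TTFB timeouts, so they only matter for clients that do not go through the caches.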