CDN
The Wikimedia content delivery network (CDN) handles traffic routing and HTTP caching for all Wikimedia projects. It is maintained by the SRE Traffic team. This page documents what the CDN exposes, so that downstream services can consume it predictably.
The main components of the CDN servers (collectively known as cache-proxy, or cp, servers) are:
- HAProxy: TLS termination, HTTP/2 termination, and rate limiting.
- Varnish: front-end (in-memory) caching.
- Apache Traffic Server: back-end (on-disk) caching.
Traffic is spread roughly evenly across the frontend layer, which is responsible for traffic capacity. Every server in this layer is effectively identical, and each is statistically very likely to hold an in-memory copy of the same HTTP responses as the others.
The backend layer is distributed by request hash (e.g. the URL and other metadata) and is responsible for content capacity. Each server is assigned a subset of URLs, so together the servers can hold a diverse, long-tail set of HTTP responses.
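Rendezvous (highest-random-weight) hashing is one way to picture a per-URL assignment in which every frontend independently picks the same backend. It is a stand-in chosen for illustration only. This page does not say which hash function or director the CDN actually uses. The backend names cp1 to cp8 below are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_backend(request_key: str, backends: Sequence[str]) -> str:
    """Return the backend that owns request_key under rendezvous hashing.

    Every backend gets a pseudo-random score for the key and the highest score
    wins. The result depends only on the key and the backend list, so all
    frontends agree on the owner of a URL without coordinating.
    """
    def score(backend: str) -> int:
        digest = hashlib.sha256(f"{backend}|{request_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=score)


if __name__ == "__main__":
    backends = [f"cp{n}" for n in range(1, 9)]  # hypothetical backend names
    urls = [f"/wiki/Page_{i}" for i in range(1000)]
    before = {u: pick_backend(u, backends) for u in urls}

    # Take one backend out of service: only the URLs it owned get a new owner.
    remaining = [b for b in backends if b != "cp3"]
    after = {u: pick_backend(u, remaining) for u in urls}
    moved = sum(before[u] != after[u] for u in urls)
    print(f"{moved} of {len(urls)} URLs moved")
```

This scheme has a useful property. When a backend leaves the pool, only the URLs it owned move to other servers, and the rest of the backend caches stay warm.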
A single server cannot handle all end-user traffic for a page at peak popularity, so the frontend layer plays an important role ahead of the backend. When a URL is absent from the cache, the frontend absorbs and coalesces concurrent requests for it. As a result, it places only minimal demand on the (one) backend assigned to that URL.
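Request coalescing can be sketched as a "single-flight" map of in-progress fetches. The first request for a URL starts the backend fetch, and later concurrent requests wait on the same result. This is a minimal asyncio model, not Varnish's implementation. The backend fetch function and its latency are hypothetical, and storing the response in the cache is left out.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class Coalescer:
    """Collapse concurrent fetches of the same URL into one backend request."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]]) -> None:
        self._fetch = fetch
        self._in_flight: Dict[str, "asyncio.Future[str]"] = {}

    async def get(self, url: str) -> str:
        future = self._in_flight.get(url)
        if future is None:
            # First miss for this URL: start exactly one backend fetch.
            future = asyncio.ensure_future(self._fetch(url))
            self._in_flight[url] = future
            # Forget the entry once the fetch finishes (successfully or not).
            future.add_done_callback(lambda _f, u=url: self._in_flight.pop(u, None))
        # shield() keeps one cancelled client from cancelling the shared fetch.
        return await asyncio.shield(future)


async def demo() -> None:
    backend_calls = []

    async def fetch_from_backend(url: str) -> str:  # hypothetical backend
        backend_calls.append(url)
        await asyncio.sleep(0.05)  # simulated backend latency
        return f"<html>{url}</html>"

    coalescer = Coalescer(fetch_from_backend)
    bodies = await asyncio.gather(*(coalescer.get("/wiki/Main_Page") for _ in range(500)))
    print(len(bodies), "responses,", len(backend_calls), "backend request(s)")


if __name__ == "__main__":
    asyncio.run(demo())
```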
Headers
HTTPS
TLS protocols
Older protocol versions are retired gradually. Clients that still use a deprecated protocol are served https://www.wikipedia.org/sec-warning, which explains why their browser will not be supported in the future.
Ciphers
TLS 1.2 ciphers, in order of preference, are:
- ECDHE-ECDSA-AES256-GCM-SHA384
- ECDHE-ECDSA-CHACHA20-POLY1305
- ECDHE-ECDSA-AES128-GCM-SHA256
TLS 1.3 cipher suites, in order of preference, are:
- TLS_AES_128_GCM_SHA256
- TLS_CHACHA20_POLY1305_SHA256
- TLS_AES_256_GCM_SHA384
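The sketch below makes "in order of preference" concrete. It models cipher selection under the assumption that the server enforces its own order (server-preference negotiation). It operates on cipher names in plain Python and is not an OpenSSL or HAProxy configuration. The client offers in the example are made up.

```python
from typing import Optional, Sequence

# Preference lists copied from the text above.
TLS12_CIPHERS = [
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-ECDSA-AES128-GCM-SHA256",
]
TLS13_CIPHER_SUITES = [
    "TLS_AES_128_GCM_SHA256",
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_256_GCM_SHA384",
]


def negotiate(server_preference: Sequence[str], client_offer: Sequence[str]) -> Optional[str]:
    """Pick the first server-preferred cipher that the client also offers.

    Assumes server-preference ordering. Returns None when there is no overlap,
    in which case the handshake would fail.
    """
    offered = set(client_offer)
    for cipher in server_preference:
        if cipher in offered:
            return cipher
    return None


if __name__ == "__main__":
    # The client lists ChaCha20 first, but the server's order wins.
    print(negotiate(TLS13_CIPHER_SUITES,
                    ["TLS_CHACHA20_POLY1305_SHA256", "TLS_AES_128_GCM_SHA256"]))
    # prints TLS_AES_128_GCM_SHA256
```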
Rate-limiting
Once an IP address exceeds 2000 concurrent requests, all traffic from that IP is dropped for 300 seconds (five minutes). Its connections/sockets are freed immediately to prevent any saturation-based outage. A useful side effect is that the attack appears to be succeeding from the attacker's side, because their requests simply hang (endless loading).
Requests that have already been passed to components further down the stack are not canceled.
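The rule can be modelled as a per-IP counter of in-flight requests plus a block list with expiry times. This is a simplified sketch of the behaviour described above and does not reproduce the production HAProxy configuration. The class and method names are hypothetical. The clock is injectable so that the five-minute window can be tested.

```python
import time
from collections import defaultdict
from typing import Callable, Dict

MAX_CONCURRENT = 2000  # requests in flight per IP before blocking
BLOCK_SECONDS = 300    # five minutes


class ConcurrencyLimiter:
    """Per-IP concurrent-request limit with a temporary block (simplified model)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._in_flight: Dict[str, int] = defaultdict(int)
        self._blocked_until: Dict[str, float] = {}

    def try_acquire(self, ip: str) -> bool:
        """Return True if a new request from ip may proceed."""
        now = self._clock()
        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False  # still blocked: drop silently
            del self._blocked_until[ip]
        if self._in_flight[ip] >= MAX_CONCURRENT:
            # Over the limit: block the IP and forget its slots, modelling
            # all of its connections being dropped at once.
            self._blocked_until[ip] = now + BLOCK_SECONDS
            self._in_flight.pop(ip, None)
            return False
        self._in_flight[ip] += 1
        return True

    def release(self, ip: str) -> None:
        """Call when a request that was admitted finishes."""
        if self._in_flight.get(ip, 0) > 0:
            self._in_flight[ip] -= 1
```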
Request Normalization
Query sorting
Query parameters are sorted alphabetically to improve the cache hit rate. Without sorting, /page?a=1&b=1 and /page?b=1&a=1 would be cached as separate objects despite being the same page, so a request for one would miss even when the other is cached. Alphabetical sorting makes URLs predictable.
Example:
/favicon.ico?vgutierrez=1&c=1&b=0&a=0
is sorted as
/favicon.ico?a=0&b=0&c=1&vgutierrez=1
The same sorting strategy is implemented in purged, the daemon responsible for fetching purge events from the application layer and injecting them into both the front-end and back-end caching layers.
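Here is a minimal sketch of the normalization. It assumes parameters are compared as raw "name=value" strings without percent-decoding, which means a repeated name is ordered by its value. The real VCL and purged implementations may behave differently in edge cases like that one. The function name sort_query is hypothetical.

```python
def sort_query(url: str) -> str:
    """Return url with its query parameters sorted alphabetically.

    Parameters are compared as raw "name=value" strings by code point,
    without percent-decoding; the path is left untouched.
    """
    path, sep, query = url.partition("?")
    if not sep or not query:
        return url  # nothing to sort
    params = query.split("&")
    return f"{path}?{'&'.join(sorted(params))}"


if __name__ == "__main__":
    print(sort_query("/favicon.ico?vgutierrez=1&c=1&b=0&a=0"))
    # prints /favicon.ico?a=0&b=0&c=1&vgutierrez=1
```

The cache and the purger must agree on the key, so the same function has to be applied on both paths.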
Path normalization
Pages with parentheses or certain other special characters in their titles have more than one correct URL. For example, both of the following URLs are correct:
- https://en.wikipedia.org/wiki/Steve_Fuller_%28sociologist%29
- https://en.wikipedia.org/wiki/Steve_Fuller_(sociologist)
A URL with literal parentheses, one with URL-encoded parentheses, and one that mixes the two are all valid. However, when a page changes, purges are sent only for the URL-encoded URL. If another form (for example the one with literal parentheses) is cached, it does not get purged.
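One way to map these variants onto a single cache key is to decode the path and then re-encode it with a fixed set of safe characters. The sketch below does this with Python's urllib.parse. It treats the URL-encoded form as canonical, following the purge behaviour described above. The safe-character set is an assumption, and the sketch does not reproduce the CDN's actual rules. Decoding also turns an encoded slash (%2F) into a literal one.

```python
from urllib.parse import quote, unquote

# Assumption: characters left unencoded in the canonical form. Parentheses are
# deliberately absent, so they always end up percent-encoded.
SAFE_CHARS = "/:"


def normalize_path(path: str) -> str:
    """Map every equivalent spelling of a path to one canonical encoding."""
    # Decode existing escapes first so that "(" and "%28" converge, then
    # re-encode everything outside the safe set (UTF-8, uppercase hex digits).
    return quote(unquote(path), safe=SAFE_CHARS)


if __name__ == "__main__":
    variants = [
        "/wiki/Steve_Fuller_%28sociologist%29",
        "/wiki/Steve_Fuller_(sociologist)",
        "/wiki/Steve_Fuller_(sociologist%29",
    ]
    print({normalize_path(v) for v in variants})
    # prints {'/wiki/Steve_Fuller_%28sociologist%29'}
```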
Caching
Current cache clusters in all data centers:
- cache_text: primary cluster for traffic to all wiki domains (MediaWiki) and miscellaneous web services (e.g. Gerrit, Phabricator).
- cache_upload: serves upload.wikimedia.org and maps.wikimedia.org exclusively (images, thumbnails, map tiles).
Any other cache clusters one might find in the wild are likely historical and decommissioned.
Text cluster
For cache lookup purposes, the front-end caching layer hides non-session cookies, i.e. those that do not match the pattern ([sS]ession|Token)=. After the cache lookup, the cookies are restored so that they reach upstream as expected.
This assumes that any upstream that needs a non-session cookie to work properly (such as the GeoIP cookie) returns a non-cacheable response.
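The hiding step amounts to splitting the Cookie header into the part used for lookup and the part set aside. The sketch below assumes the pattern is matched against each cookie's "name=" prefix, so a cookie value that happens to contain "Token=" does not count. The real VCL may instead apply the regular expression to the whole header. The function name and cookie values are hypothetical.

```python
import re

# Pattern from the text: a cookie counts as a session cookie if this matches.
SESSION_COOKIE = re.compile(r"([sS]ession|Token)=")


def split_cookies(cookie_header: str) -> tuple[str, str]:
    """Split a Cookie header into (lookup_cookies, hidden_cookies)."""
    kept, hidden = [], []
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name = pair.split("=", 1)[0]
        # Match against "name=" only, so cookie values cannot trigger a match.
        (kept if SESSION_COOKIE.search(name + "=") else hidden).append(pair)
    return "; ".join(kept), "; ".join(hidden)


if __name__ == "__main__":
    original = "enwikiSession=abc123; GeoIP=US; centralauth_Token=xyz"
    lookup, hidden = split_cookies(original)
    print(lookup)  # prints enwikiSession=abc123; centralauth_Token=xyz
    # The cache lookup would use `lookup`; the untouched `original` header is
    # what gets forwarded upstream on a miss.
```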
By default, Varnish does not cache requests that carry cookies. To cache responses to such requests, Varnish replaces session cookies with the fixed string Token=1 if, and only if, the response does not contain Vary: Cookie.
Logic
The backend caching layer avoids caching responses that meet any of the following conditions:
- Response contains a Set-Cookie header
- Response contains a Vary: Cookie header and an uncacheable cookie
- Content-Length is larger than 1GB
- Response status is higher than 499
- Request contains an Authorization header
Additionally, the backend caching layer skips cache lookup for any request that meets any of the following conditions:
- Request contains an Authorization header
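These rules amount to two predicates. The sketch below encodes them over plain header dictionaries. The function names are hypothetical. This page does not define an "uncacheable cookie", so the sketch takes it as a flag decided elsewhere. It also assumes the 1GB limit means 1 GiB.

```python
from typing import Mapping, Optional

# Assumption: "1GB" read as 1 GiB; the production threshold may be defined differently.
MAX_CACHEABLE_BYTES = 1024 ** 3


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def _varies_on_cookie(headers: Mapping[str, str]) -> bool:
    vary = _header(headers, "Vary") or ""
    return any(token.strip().lower() == "cookie" for token in vary.split(","))


def should_skip_lookup(req_headers: Mapping[str, str]) -> bool:
    """Requests carrying credentials bypass the cache lookup."""
    return _header(req_headers, "Authorization") is not None


def is_cacheable(
    req_headers: Mapping[str, str],
    status: int,
    resp_headers: Mapping[str, str],
    has_uncacheable_cookie: bool,  # hypothetical: decided elsewhere
) -> bool:
    """Apply the backend "do not cache" rules listed above."""
    if _header(resp_headers, "Set-Cookie") is not None:
        return False
    if _varies_on_cookie(resp_headers) and has_uncacheable_cookie:
        return False
    length = _header(resp_headers, "Content-Length")
    if length is not None and length.strip().isdigit() and int(length) > MAX_CACHEABLE_BYTES:
        return False
    if status > 499:
        return False
    if should_skip_lookup(req_headers):
        return False
    return True
```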
Retention
Web browsers first hit the LVS load balancers.
LVS distributes traffic to the edge frontend cluster. As of June 2022, the frontend cache TTL is capped at 1 day. A 7-day keep period lets expired objects still be revalidated with an HTTP 304 via If-Modified-Since/If-None-Match (wikimedia-frontend.vcl).
Misses from the frontend are hashed to the edge backend cluster. Since April 2020, the ATS backend TTL has been capped at 24 hours (T249627, trafficserver/backend.pp).
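Both caps act as a clamp on whatever lifetime the origin asks for. The sketch below assumes the lifetime comes from a standard Cache-Control header and that each layer takes the smaller of that value and its own cap. The real VCL and ATS logic handles more cases than this. The 14-day header value in the example is illustrative.

```python
from datetime import timedelta
from typing import Optional

# Caps quoted in the text above.
FRONTEND_TTL_CAP = timedelta(days=1)
FRONTEND_KEEP = timedelta(days=7)  # extra time kept only for IMS/INM revalidation
BACKEND_TTL_CAP = timedelta(hours=24)


def _directive(cache_control: str, name: str) -> Optional[int]:
    """Return the integer value of a Cache-Control directive, if present."""
    for part in cache_control.split(","):
        key, _, value = part.strip().partition("=")
        if key.strip().lower() == name and value.strip().isdigit():
            return int(value.strip())
    return None


def effective_ttl(cache_control: str, cap: timedelta) -> Optional[timedelta]:
    """Shared-cache lifetime from the origin, clamped to this layer's cap.

    s-maxage takes precedence over max-age for shared caches (RFC 9111).
    Returns None when the origin gave no explicit lifetime.
    """
    seconds = _directive(cache_control, "s-maxage")
    if seconds is None:
        seconds = _directive(cache_control, "max-age")
    if seconds is None:
        return None
    return min(timedelta(seconds=seconds), cap)


if __name__ == "__main__":
    cc = "s-maxage=1209600, must-revalidate, max-age=0"  # 14 days for shared caches
    frontend_ttl = effective_ttl(cc, FRONTEND_TTL_CAP)  # clamped to 1 day
    backend_ttl = effective_ttl(cc, BACKEND_TTL_CAP)    # clamped to 24 hours
    # The frontend keeps the object stored for ttl + keep so it can send IMS/INM.
    print(frontend_ttl, frontend_ttl + FRONTEND_KEEP, backend_ttl)
```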
Misses and HTTP-304 renewals from the ATS backend are routed to the MediaWiki app servers. Since July 2016, the max-age for page views has been 14 days (T124954, $wgCdnMaxAge). This controls how long an unmodified page may have its page view HTML renewed, possibly several times with another 24 hours each time. It therefore shapes the long tail for configuration changes, skin changes, and anything else that isn't tracked by the page edit timestamp or stored inside ParserOutput/ParserCache. Changes where only one reality is meant to be presented should generally pre-seed their state for 14 days to be fully resilient against this.
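The 14-day window can be turned into a deadline for dropping backward-compatibility code. In the worst case, a page is rendered just before a deployment and then renewed via HTTP 304 until its max-age runs out. The helper names are hypothetical. The calculation ignores any extra slack that individual cache layers might add on top.

```python
from datetime import datetime, timedelta, timezone

# $wgCdnMaxAge for page views, per the text above.
CDN_MAX_AGE = timedelta(days=14)


def compat_needed_until(deployed_at: datetime) -> datetime:
    """Time until which HTML rendered before deployed_at may still be served."""
    return deployed_at + CDN_MAX_AGE


def can_drop_compat(deployed_at: datetime, now: datetime) -> bool:
    """True once the old rendering can no longer be in the CDN."""
    return now >= compat_needed_until(deployed_at)


if __name__ == "__main__":
    deployed = datetime(2024, 3, 1, tzinfo=timezone.utc)
    print(compat_needed_until(deployed).date())  # prints 2024-03-15
```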
Since Dec 2023, the wikitext parser cache has retained entries for 30 days (T280604, $wgParserCacheExpireTime, wmf-config).
Invalidating content
For Varnish:
- When pages are edited, their canonical URL is proactively purged by MediaWiki (via Kafka and purged).
For ParserCache:
- Values in ParserCache are verifiable by revision ID and a "page touched" timestamp. An edit bumps the "page touched" timestamp of the edited page and of all pages that include it, which invalidates their cached entries. A TTL is also enforced on read, and older entries are treated as invalid.
- Invalid entries are not removed from the ParserCache by MediaWiki, although they can be overwritten. Removing "too old" entries is done by a daily maintenance script, scheduled via the Puppet class misc::maintenance::parsercachepurging.
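The read-time checks and the "page touched" propagation can be sketched as follows. The entry shape, function names, and inclusion graph are hypothetical. The propagation walks the inclusion graph transitively, as an assumed reading of "all pages which include it". MediaWiki's own link tables and update mechanism are not modelled here.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Mapping

# $wgParserCacheExpireTime: 30 days, per the text above.
PARSER_CACHE_EXPIRE = timedelta(days=30)


@dataclass
class ParserCacheEntry:  # hypothetical shape of a cached value
    rev_id: int
    cached_at: datetime


def is_valid(entry: ParserCacheEntry, current_rev_id: int,
             page_touched: datetime, now: datetime) -> bool:
    """Checks performed on read; failing any of them means a re-parse."""
    if entry.rev_id != current_rev_id:
        return False  # page has a newer revision
    if entry.cached_at < page_touched:
        return False  # page (or something it includes) was touched since
    return now - entry.cached_at <= PARSER_CACHE_EXPIRE  # TTL enforced on read


def bump_touched(edited: str, included_by: Mapping[str, Iterable[str]],
                 touched: Dict[str, datetime], now: datetime) -> None:
    """Set "page touched" on the edited page and every page that includes it."""
    queue, seen = deque([edited]), {edited}
    while queue:
        page = queue.popleft()
        touched[page] = now
        for includer in included_by.get(page, ()):
            if includer not in seen:
                seen.add(includer)
                queue.append(includer)
```

The sketch does not remove invalid entries. As described above, they are only overwritten or cleaned up later by the maintenance script.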
Optimizations
The backend caching layer strips all cookies (except MediaWiki/CentralAuth sessions) when performing cache lookups. It therefore assumes that every other cookie is either used only client-side (and safe to ignore for caching) or used by a low-traffic feature that explicitly opts out of caching. This significantly improves the hit rate and reduces cache writes (change 828002).
See also
- CDN/Hardware: An overview of the physical servers powering the CDN.
- SRE/Traffic: A full overview of CDN software components.
- CDN/History: The history of earlier iterations of the stack up until now.