Caching overview

{{Navigation Wikimedia infrastructure|expand=caching}}
{{See|"CDN" redirects here. For the Toolforge CDN, see [[Help:Toolforge/Web#External assets]].}}
{{note|In a nutshell, the cache hosts run:<br/><code>haproxy for TLS</code> → <code>Varnish cache frontend</code> → <code>ATS cache backend</code>.|type=nutshell}}
This page documents the traffic routing and '''caching infrastructure from a high level'''. It details which layers of the infrastructure exist in edge caching [[w:Point of presence|PoPs]], which exist in core data centers, and how traffic flows through them.


{{TOC|limit=2|clear=none}}
== Cache software ==
[[File:Wikipedia_webrequest_flow_2020.png|thumb|300px|Wikipedia request flow]]
{{See also|HAProxy|Varnish|Apache Traffic Server|w:TLS termination proxy|label 4="TLS termination proxy" on Wikipedia}}Since April 2022 ([[phab:T290005|T290005]]) we use [[HAProxy]] for TLS and HTTP/2 termination, [[Varnish]] for the in-memory cache ("frontend"), and [[Apache Traffic Server]] for on-disk persistent caching ("backend").
 
From July 2019 to March 2022 we ran an "ATS sandwich", with [[Apache Traffic Server|ATS]] as TLS terminator, [[Varnish]] frontend, and a second ATS layer as cache backend.
 
Prior to 2019, we used Nginx- for TLS termination, a Varnish frontend, and a second Varnish layer for the backend cache. In older documentation, "Varnish" may thus also refer to the cache backend.
 
== Retention ==
Web browsers first hit the '''LVS load balancers'''.
 
LVS distributes traffic to the '''edge''' '''frontend''' cluster powered by HAProxy and Varnish cache. As of June 2022, the frontend cache is capped to 1&nbsp;day with a 7&nbsp;day keep for the benefit of [[w:List_of_HTTP_status_codes#304|HTTP-304 via IMS/INM]] ([[git:operations/puppet/+/25949284b0170a947181ba0252777975c549fd5d/modules/varnish/templates/wikimedia-frontend.vcl.erb#881|wikimedia-frontend.vcl]]).
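As background for the "keep" window, the sketch below shows the client-side revalidation it serves (a hedged illustration; the URL and ETag values are hypothetical): a conditional request answered with 304 lets the kept object be reused without refetching the body.

<syntaxhighlight lang="python">
# Illustrative sketch of IMS/INM revalidation from a client's point of view.
# The URL and ETag are hypothetical placeholders.
import urllib.error
import urllib.request

def revalidate(url: str, etag: str) -> bool:
    """Return True if the server answers 304 Not Modified, i.e. the cached copy may be reused."""
    req = urllib.request.Request(url, headers={"If-None-Match": etag})
    try:
        urllib.request.urlopen(req)
        return False                      # 200: a fresh body was returned instead
    except urllib.error.HTTPError as err:
        return err.code == 304            # 304: keep using the cached copy

# Example (hypothetical ETag):
# revalidate("https://en.wikipedia.org/wiki/Example", '"abc123"')
</syntaxhighlight>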
 
Misses from the frontend are hashed to the '''edge''' '''backend''' cluster powered by ATS cache. Since April 2020, the ATS backend TTL is capped to 24&nbsp;hours ([[phab:T249627|T249627]], [[git:operations/puppet/+/25949284b0170a947181ba0252777975c549fd5d/modules/profile/manifests/trafficserver/backend.pp#111|trafficserver/backend.pp]]).
 
Misses from the backend are distributed to the [[MediaWiki at WMF|MediaWiki]] '''app servers'''.{{Anchor|MediaWiki}} Since July 2016, the Cache-Control max-age for page views is 14&nbsp;days ([[phab:T124954|T124954]], [[Mw:Manual:$wgCdnMaxAge|$wgCdnMaxAge]]). Since May 2021, the wikitext parser cache retains entries for 21&nbsp;days ([[gerrit:c/operations/mediawiki-config/+/685181|change 685181]], [[Mw:Manual:$wgParserCacheExpireTime|wgParserCacheExpireTime]],  [https://github.com/wikimedia/operations-mediawiki-config/blob/bf4ee52d1c1b9a6777168969908d6553bf820508/wmf-config/InitialiseSettings.php#L10714-L10716 wmf-config]).
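To make the interplay of these limits easier to read (a simplified model, not the actual VCL or ATS configuration; the layer names below are illustrative), each layer stores an object for at most the smaller of the origin's max-age and that layer's cap:

<syntaxhighlight lang="python">
# Simplified model of layered TTL caps; figures mirror the ones cited above.
ORIGIN_MAX_AGE = 14 * 24 * 3600          # MediaWiki Cache-Control max-age (14 days)
LAYER_CAPS = {
    "varnish-frontend": 1 * 24 * 3600,   # frontend cache capped to 1 day
    "ats-backend": 24 * 3600,            # backend cache capped to 24 hours
}

def effective_ttl(origin_max_age: int, layer_cap: int) -> int:
    """A layer keeps an object for at most min(origin max-age, layer cap)."""
    return min(origin_max_age, layer_cap)

for layer, cap in LAYER_CAPS.items():
    print(layer, effective_ttl(ORIGIN_MAX_AGE, cap) // 3600, "hours")
</syntaxhighlight>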
 
=== Invalidating content ===
For Varnish:
* When pages are edited, their canonical URL is proactively purged by MediaWiki (via Kafka and [[Purged]]); see the sketch after this list.
For ParserCache: 
 
* Values in ParserCache are verifiable by revision ID. Edits will naturally invalidate it.
* The TTL is enforced through a daily maintenance script, scheduled via the Puppet class <code>misc::maintenance::parsercachepurging</code>.
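As a rough illustration of the purge step (this is not the actual [[Purged]] implementation; the event shape, host, and port are assumptions made for the example), a consumer replays a URL-purge event against the local cache daemon as an HTTP <code>PURGE</code>:

<syntaxhighlight lang="python">
# Hypothetical sketch: turn a URL-purge event into a local HTTP PURGE request.
# The event format, target host, and port are illustrative assumptions.
import http.client
from urllib.parse import urlparse

def purge_locally(event: dict, cache_host: str = "127.0.0.1", cache_port: int = 3128) -> int:
    """Issue a PURGE for the URL named in a purge event against the local cache daemon."""
    url = urlparse(event["uri"])
    conn = http.client.HTTPConnection(cache_host, cache_port)
    # The Host header selects the right site for the canonical URL being purged.
    conn.request("PURGE", url.path or "/", headers={"Host": url.netloc})
    status = conn.getresponse().status
    conn.close()
    return status

# Example event for an edited page (only succeeds if a cache daemon is listening locally):
try:
    print(purge_locally({"uri": "https://en.wikipedia.org/wiki/Example"}))
except OSError as err:
    print("no local cache daemon:", err)
</syntaxhighlight>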
 
== Routing ==
{{Outdated-inline|note=We no longer run Nginx for frontend TLS. We also no longer expose port 80 directly on cache frontends.}}
 
When [[LVS]] balances traffic to ports :80 (varnish) and :443 (<s>nginx</s>), it uses a hash of the client IP, to help with TCP Fast Open and SSL session persistence respectively.
 
Within the caching layer (cp#xxx machines), the jump from <s>nginx</s>:443 to varnish:80 is direct on the local host.
 
However, the jump from varnish:80 (frontend) to varnish:3128 (backend) is different: for that hop we hash on the URL (and other request metadata) when balancing to the backends, dividing the cache space among all machines, so the request typically moves from one machine to another within the same cluster.
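A minimal sketch of that idea (not the actual director configuration; the hostnames are made up): hashing the request URL maps each object to one backend, so the cache key space is divided across the cluster.

<syntaxhighlight lang="python">
# Illustrative URL-hash backend selection; hostnames are hypothetical.
import hashlib

BACKENDS = ["cp1001.example", "cp1002.example", "cp1003.example"]

def pick_backend(url: str) -> str:
    """Map a request URL to one backend so each object lives on a single machine."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]

print(pick_backend("/wiki/Example"))        # the same URL always picks the same backend
print(pick_backend("/wiki/Another_page"))   # different URLs spread across the cluster
</syntaxhighlight>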
 
{{See also|Global traffic routing}}
 
[[File:WMF Inbound Text Traffic Diagram.svg|thumb|Diagram of "text" traffic flow through Wikimedia front edge LVS/nginx/Varnish infrastructure.]]
 
Legend:
 
* [[Eqiad cluster|eqiad]] is representative of whichever data center is currently primary ([[codfw]] is similar).
* [[esams]] is representative of all caching sites ([[ulsfo]] is similar).
* This diagram is for the "text" cache cluster (see [[#Cache clusters]]), but traffic for "[[upload.wikimedia.org|upload]]" operates similarly.
 
== {{anchor|Cache Clusters}} Cache clusters ==
 
Current cache clusters in all data centers:
* cache_text - Primary cluster for MediaWiki and various app/service (e.g. RESTBase, Phabricator) traffic
* cache_upload - Serves [[upload.wikimedia.org]] and [[maps.wikimedia.org]] exclusively (images, thumbnails, map tiles)

Former clusters (no longer exist):
* cache_bits - Used to exist just for static content and ResourceLoader, now decommissioned (traffic went to cache_text)
* cache_mobile - Was like cache_text but just for (m|zero)\. mobile hostnames, now decommissioned (traffic went to cache_text)
* cache_parsoid - Legacy entrypoint for Parsoid and related *oid services, now decommissioned (traffic goes via cache_text to RESTBase)
* cache_maps - Served [[maps.wikimedia.org]] exclusively, which is now serviced by cache_upload
* cache_misc - Miscellaneous lower-traffic / support services (e.g. Phabricator, metrics, Etherpad, Graphite). Now moved to cache_text.


== Headers ==
:''See also [[MediaWiki HTTP cache headers]]''


=== X-Cache ===
X-Cache is a comma-separated list of cache hostnames with information such as hit/miss status for each entry. The header is read right to left: the rightmost entry is the outermost cache, and entries to the left are progressively deeper towards the applayer. The rightmost cache is the in-memory cache; all others are disk caches.

In case of a cache hit, the number of times the object has been returned is also specified. Once "hit" is encountered while reading right to left, everything to the left of it describes how the cached object was originally fetched: whether those entries missed, passed, or hit when the object was first pulled into the hitting cache. For example:

 X-Cache: cp1066 hit/6, cp3043 hit/1, cp3040 hit/26603

An explanation of the possible information contained in X-Cache follows.

==== Not talking to other servers ====
* hit: a cache hit in cache storage. There was no need to query a deeper cache server (or the applayer, if already at the last cache server)
* int: locally-generated response from the cache, for example a 301 redirect. The cache did not use a cache object and did not need to contact another server

==== Talking to other servers ====
* miss: the object might be cacheable, but we don't have it
* pass: the object was uncacheable, talk to a deeper level

Some subtleties on pass: different caches (e.g. in-memory vs. on-disk) might disagree on whether the object is cacheable or not. A pass on the in-memory cache (for example, because the object is too big) could be a hit for an on-disk cache. Also, it is sometimes not clear that an object is uncacheable until the moment we fetch it. In that case, we cache for a short while the fact that the object is uncacheable. In Varnish terminology, this is a hit-for-pass.

If we don't know an object is uncacheable until after we fetch it, it is initially treated like a normal miss, which means coalescing: other requests for the same object will wait for the first response. But that first fetch returns an uncacheable object, which cannot answer the other requests that may have queued. Because of that they all get serialized, destroying the performance of hot (high-parallelism) objects that are uncacheable. Hit-for-pass is the answer to that problem: when we make that first request (with no prior knowledge) and get an uncacheable response, we create a special cache entry that says, in effect, "this object cannot be cached, remember that for 10 minutes", and all remaining requests for the next 10 minutes proceed in parallel without coalescing, because it is already known that the object is not cacheable.

The content of the X-Cache header is recorded for every request in the webrequest log table.
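As a small worked example of reading the header right to left (a simplified parser, for illustration only):

<syntaxhighlight lang="python">
# Sketch: split an X-Cache header into entries, outermost (rightmost) first.
def parse_x_cache(header: str):
    """Return (host, status, hit_count) tuples, outermost cache first."""
    entries = []
    for part in reversed([p.strip() for p in header.split(",")]):
        host, _, rest = part.partition(" ")
        status, _, count = rest.partition("/")
        entries.append((host, status, int(count) if count else None))
    return entries

for host, status, count in parse_x_cache("cp1066 hit/6, cp3043 hit/1, cp3040 hit/26603"):
    print(host, status, count)
# cp3040 (outermost, in-memory) is printed first, then the deeper disk caches.
</syntaxhighlight>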


== Functionalities provided by cache backends ==
The following functionalities are provided by all cache backends:
* Path normalization
* Pass everything which is not GET or HEAD
* Pass X-Wikimedia-Debug and X-Wikimedia-Security-Audit
* Pass Authorization
* Pass Set-Cookie responses
* Pass CC:private, no-cache, no-store
* Pass X-MISS2PASS
* Performance hack to assign a single Vary slot for HFP to logged-in users
* Provide custom error HTML if an error response has no body
* Set X-Cache-Int
* Compress compressible things if the origin didn't already
* Set various TTL caps
* Return 403 to client IPs not in wikimedia_trust
* Unset Accept-Encoding to avoid some corner cases (see [[phab:T125938|T125938]])
* Unset Public-Key-Pins and Public-Key-Pins-Report-Only

Specific to cache_text:
* Pass the beta variant of the mobile site
* Pass cxserver.wikimedia.org
* Request mangling for MediaWiki (keywords: Host, X-Dt-Host, X-Subdomain)
* Request mangling for RESTBase (/api/rest_v1/ -> /v1/)
* Request mangling for w.wiki (send to meta.wikimedia.org/wiki/Special:UrlRedirector)
* Vary slotting for PHP7 (X-Seven)
* Vary slotting for X-Forwarded-Proto on 301/302
* Reduce TTL to 60s for mobileaction= / useformat=

Specific to cache_upload:
* Storage binning to try to work around scalability issues of -sfile
* Request mangling for X-MediaWiki-Original
* Disable streaming if Content-Length is missing
* Pass small objects
* Pass objects >= ~1GB
* Pass 200 responses with CL:0 ([[phab:T144257|T144257]])

Specific to cache_misc:
* Pass objects >= ~1GB
* Disable streaming if Content-Length is missing
* Cache requests with Google Analytics cookies and our own global WMF-Last-Access, WMF-Last-Access-Global, GeoIP, and CP cookies
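A toy decision function capturing a few of the generic rules above (purely illustrative; the real logic lives in the cache configuration on the cp hosts and covers many more cases):

<syntaxhighlight lang="python">
# Toy sketch of a few of the generic pass/lookup rules listed above.
def cache_decision(method: str, request_headers: dict) -> str:
    """Return "pass" or "lookup" for a simplified subset of the rules above."""
    if method not in ("GET", "HEAD"):
        return "pass"                        # pass everything which is not GET or HEAD
    if "Authorization" in request_headers:
        return "pass"                        # pass Authorization
    if "X-Wikimedia-Debug" in request_headers:
        return "pass"                        # pass X-Wikimedia-Debug
    return "lookup"

print(cache_decision("POST", {}))                           # pass
print(cache_decision("GET", {"Authorization": "Basic x"}))  # pass
print(cache_decision("GET", {}))                            # lookup
</syntaxhighlight>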


== History ==
{{See also|LVS and Varnish}}
An overview of notable events and changes to our caching infrastructure.


=== 2022 ===
In April 2022, we replaced ATS with HAProxy for TLS termination and HTTP/2 ([[phab:T290005|T290005]]). This changed the stack to: HAProxy for TLS termination, Varnish frontend, and ATS backend.
 
In June 2022, the [[Purged]] service was introduced. MediaWiki no longer uses [[multicast HTCP purging]], but instead produces Kafka events for purging URLs, which local Purged instances on Varnish and ATS servers consume and apply by producing local PURGE requests.

=== 2020 ===
 
In April 2020, a year after switching from Varnish to ATS as cache backend, the TTL was re-enabled and lowered from the 7 days set in 2016, down to 24 hours ([[phab:T249627|T249627]]). With Varnish frontend also at 1 day and a grace-keep of 7 days, this means frontend objects may outlive backend ones.
 
=== 2019 ===
In 2019, we adopted the "ATS sandwich", featuring Apache Traffic Server (ATS) as both TLS terminator and backend cache, thus discontinuing Nginx- ("nginx minus") and the Varnish backend. This changed the stack to: ATS for TLS termination (<code>ats-tls</code>), Varnish frontend (<code>varnish-fe</code>), and ATS backend (<code>ats-be</code>). There was exploration into having the ATS-TLS layer one day subsume the responsibilities of the Varnish frontend.


Prior to 2019, the stack for many years involved Nginx- for TLS termination and HTTP/2, Varnish as frontend, and a second Varnish layer as cache backend. As such, in older documentation "Varnish" might sometimes also refer to the cache backend.


=== 2016 ===
In 2016, we decreased the max object TTL in Varnish from the long-standing 31 days down to 1 day for Varnish frontends, and 14 days for Varnish backends and MediaWiki ([[phab:T124954|T124954]]). The parser cache remained at 31 days.


In 2016, we deployed HTTP/2 support to the Wikimedia CDN, which at the time comprised Nginx- and Varnish ([[phab:T96848#1856035|T96848]]).


=== 2013 ===
Prevent white-washing of expired page-view HTML. Various static aspects of a page are not tracked or versioned; as such, once the max-age expires, a conditional request must not be answered with 304 Not Modified, even if the database entry of the wiki page was unchanged ([[phab:T46570|T46570]]).
[[Category:Caching]]
