You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org
MediaWiki at WMF
MediaWiki is the collaborative editing software that runs Wikipedia. This page documents its deployment at the Wikimedia Foundation.
- For more about the history, see "MediaWiki" on wikipedia.org.
- For how to use, install or contribute, see mediawiki.org.
A Wikipedia web request is processed in a series of steps outlined here (as of April 2020).
- DNS resolves hostnames like en.wikipedia.org, which ultimately point to an address like text-lb.*.wikimedia.org. The IP addresses for these names are service IPs handled by LVS, which acts as a direct-routing load balancer in front of our caching proxies.
» See also DNS, Global traffic routing, and LVS.
- The Wikimedia Foundation owns its content delivery network. The public load balancers and caching proxies are located in all data centres, in particular those whose sole role is to be an edge cache (also known as a "PoP").
» See also Clusters and PoPs.
- The caching proxies are servers consisting of three layers: TLS termination, frontend caching, and backend caching. Each cache proxy server hosts all three layers.
» See also Caching overview.
- TLS termination and HTTP/2 handling: handled by Apache Traffic Server (ATS), internally called ats-tls. Prior to 2020, we used Nginx here.
- Frontend caching: This is an in-memory HTTP cache (using Varnish, called "Varnish frontend" or varnish-fe). The LVS load balancers route each request to a random cache proxy server to maximise the amount of parallel traffic we can handle. Because of this, each frontend cache server likely holds the same set of responses in its cache, so the logical capacity of the frontend cache is equal to one server's RAM.
- Backend caching: On a frontend cache miss, the request is routed to the backend HTTP caches. Unlike the frontends, the backends are selected by a consistent hash, and they persist their cache on disk (instead of in memory). The backend caches scale horizontally and have a logical capacity equal to the total of all servers. Because of consistent hashing, the same backend cache is always consulted for the same URL; in case of a surge in traffic to a particular page, the frontends should each get a copy and serve it from there. We use request coalescing so that concurrent requests for the same URL are collapsed into a single request to the backend server. For the backend cache, we use a second layer of ATS (ats-be). Prior to 2020, WMF used a second layer of Varnish (varnish-be) for backend caching.
- After the cache proxies, we arrive at the application servers (that is, if the request was not fulfilled by a cache). The application servers are load-balanced via LVS. Connections between backend caches and app servers are encrypted with TLS, which is terminated on the app server by a local Envoy instance; Envoy in turn hands the request off to the local Apache. Prior to mid-2020, Nginx was used for TLS termination. Apache is in charge of handling redirects, rewrite rules, and determining the document root. It then uses php-fpm to invoke the MediaWiki software on the app servers. The application servers and all other backend services (such as Memcached and MariaDB) are located in "Core services" data centers, currently Eqiad and Codfw.
» See also Application servers for more about how Apache, PHP7 and php-fpm are configured.
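To illustrate why the frontend and backend tiers have such different logical capacities, here is a minimal Python sketch. It is not the actual ATS or Varnish configuration: it assumes a ring-based consistent hash with virtual nodes and uses MD5 as the ring hash purely for illustration, and the names HashRing, pick_frontend, and be1/be2/be3 are hypothetical. Random frontend selection means every frontend eventually caches the same popular objects, whereas the ring assigns each URL to exactly one backend, so backend capacity adds up across servers.

```python
import bisect
import hashlib
import random


def ring_hash(key: str) -> int:
    # Assumption: MD5 truncated to 64 bits, chosen only for an even spread.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping URLs to (hypothetical) backend caches."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring many times ("virtual nodes")
        # so that URLs are spread evenly across nodes.
        points = sorted((ring_hash(f"{node}#{i}"), node)
                        for node in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    def node_for(self, url: str) -> str:
        # Walk clockwise to the first ring point after the URL's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect(self._hashes, ring_hash(url)) % len(self._hashes)
        return self._nodes[idx]


def pick_frontend(frontends):
    # Frontends are chosen at random, so any of them may cache any URL.
    return random.choice(frontends)


backends = HashRing(["be1", "be2", "be3"])  # hypothetical backend names
url = "https://en.wikipedia.org/wiki/Main_Page"
print(backends.node_for(url) == backends.node_for(url))  # True: stable mapping
```

A useful property of the ring is that removing one backend only remaps the URLs that were assigned to it; every other URL keeps its backend, so most of the backend cache stays warm.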
The application servers are divided into the following groups:
|Description||Conftool cluster||Hiera cluster||Purpose|
|Main app servers||appserver||appserver||Public HTTP from ATS for wiki domains (except XWD, |
|Debug servers||testserver||appserver||Public HTTP from ATS for wiki domains with X-Wikimedia-Debug.|
|API app servers||api_appserver||api_appserver||Public HTTP from ATS for wiki domains with |
|Parsoid servers||parsoid||parsoid||Internal HTTP to parsoid-php.discovery.wmnet. Used by RESTBase via |
|Jobrunners||jobrunner||jobrunner||Internal HTTP to jobrunner.discovery.wmnet. Used by ChangeProp-JobQueue via |
|Videoscalers||videoscaler||jobrunner||Internal HTTP to videoscaler.discovery.wmnet. Used by ChangeProp-JobQueue via |
|Maintenance hosts||–||misc||Internal. Used for scheduled and ad-hoc maintenance scripts run from the command-line.|
|Snapshot hosts||–||dumps||Internal. Used for scheduled work from the command-line relating to XML dumps.|
For web requests using Apache, the "Hiera cluster" value is also exposed to PHP as $_SERVER['SERVERGROUP'].
In Grafana dashboards, Prometheus metrics, and Icinga alerts, the cluster field usually refers to the "Hiera cluster" value as well.
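Because two Conftool clusters share a Hiera cluster, metrics and alerts for debug servers and videoscalers will usually appear under appserver and jobrunner respectively. The small Python sketch below encodes the rows of the app server table that have both values as a dictionary and inverts it to show which Conftool clusters report under each Hiera cluster; the function name is hypothetical.

```python
from collections import defaultdict

# Conftool cluster -> Hiera cluster, taken from the table of app server groups.
# Maintenance and snapshot hosts have no Conftool cluster, so they are omitted.
CONFTOOL_TO_HIERA = {
    "appserver": "appserver",
    "testserver": "appserver",
    "api_appserver": "api_appserver",
    "parsoid": "parsoid",
    "jobrunner": "jobrunner",
    "videoscaler": "jobrunner",
}


def conftool_clusters_by_hiera(mapping):
    """Group Conftool clusters under the Hiera cluster they report as."""
    groups = defaultdict(list)
    for conftool, hiera in mapping.items():
        groups[hiera].append(conftool)
    return {hiera: sorted(names) for hiera, names in groups.items()}


for hiera, names in sorted(conftool_clusters_by_hiera(CONFTOOL_TO_HIERA).items()):
    print(f"{hiera}: {', '.join(names)}")
```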
For web requests not served by the cache, the request eventually arrives on an app server where Apache invokes PHP via php-fpm.
- Example request:
The document root for a wiki domain like "en.wikipedia.org" is the docroot/wikipedia.org directory. This directory is mostly empty, except for w/, which is symlinked to a wiki-agnostic directory that looks like a MediaWiki install (in that it has files like "index.php", "api.php", and "load.php") but actually contains small stubs that invoke "Multiversion".
Multiversion is a WMF-specific script (maintained in the operations/mediawiki-config repo) that inspects the hostname of the web request (e.g. "en.wikipedia.org") and finds the appropriate MediaWiki installation for that hostname. The weekly Deployment train creates a fresh branch from the latest master of MediaWiki (including any extensions we deploy) and clones it to the deployment server in a directory named like php-…
For example, if the English Wikipedia is running MediaWiki version 1.30.0-wmf.5, then "en.wikipedia.org/w/index.php" will effectively be mapped to /srv/mediawiki/php-1.30.0-wmf.5/index.php. For more about the "wikiversions" selector, see Heterogeneous deployment.
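The host-to-branch lookup can be sketched in Python as follows. This is a deliberately simplified, hypothetical model of Multiversion, not its real code: it assumes a wiki ID is formed from the language subdomain plus a project suffix (for example, en.wikipedia.org maps to enwiki), and it uses a hard-coded WIKIVERSIONS dictionary as a stand-in for the real "wikiversions" data.

```python
from pathlib import PurePosixPath

# Hypothetical, simplified mapping from project domain to wiki ID suffix.
# The real Multiversion code handles many more domains and special cases.
PROJECT_SUFFIX = {"wikipedia.org": "wiki", "wiktionary.org": "wiktionary"}

# Hypothetical stand-in for the "wikiversions" data: wiki ID -> branch directory.
WIKIVERSIONS = {"enwiki": "php-1.30.0-wmf.5"}

MEDIAWIKI_ROOT = PurePosixPath("/srv/mediawiki")


def wiki_id_for_host(host: str) -> str:
    """Derive a wiki ID such as "enwiki" from a hostname such as "en.wikipedia.org"."""
    lang, _, project = host.partition(".")
    return lang + PROJECT_SUFFIX[project]


def resolve_entry_point(host: str, script: str) -> PurePosixPath:
    """Map a request for /w/<script> on `host` to the file in the wiki's branch."""
    branch = WIKIVERSIONS[wiki_id_for_host(host)]
    return MEDIAWIKI_ROOT / branch / script


print(resolve_entry_point("en.wikipedia.org", "index.php"))
# prints /srv/mediawiki/php-1.30.0-wmf.5/index.php
```

Moving a wiki to a new branch then only requires changing its entry in the version data; the docroot and the w/ stubs stay the same for every wiki.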
The train also creates a stub LocalSettings.php file in this php-… directory. This stub LocalSettings.php file does nothing other than include wmf-config/CommonSettings.php (also in the operations/mediawiki-config repo).
The CommonSettings.php file is responsible for configuring MediaWiki. This includes database configuration (which DB server to connect to, etc.), loading and configuring MediaWiki extensions, and general site settings (the name of the wiki, its logo, etc.).
After CommonSettings.php is done, MediaWiki handles the rest of the request and responds accordingly.
To read more about how MediaWiki works in general, see:
- Manual:Code on mediawiki.org, about entry points and the directory structure of MediaWiki.
- Manual:Index.php on mediawiki.org, for what a typical MediaWiki entrypoint does.
There are broadly speaking four kinds of static assets served by Apache on MediaWiki application servers:
- Varnish: Strip cookies, fixed hostname.
- Apache: Rewrite to
- Caching: public, 1 year, hostname-agnostic (Varnish object is shared across wiki domains).
- Stats: Grafana: MediaWiki Static.
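The "strip cookies, fixed hostname" handling and the hostname-agnostic caching can be illustrated with a Python sketch of how a cache key might be normalized. This is not the actual Varnish VCL: the Request type, the FIXED_HOST placeholder, and the function names are hypothetical, and the real hostname the CDN substitutes is not stated here.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder; the real fixed hostname is not stated on this page.
FIXED_HOST = "static.example.invalid"


@dataclass
class Request:
    host: str
    path: str
    headers: dict = field(default_factory=dict)


def normalize(req: Request) -> Request:
    """Drop cookies and replace the wiki hostname with a fixed one."""
    headers = {k: v for k, v in req.headers.items() if k.lower() != "cookie"}
    return Request(host=FIXED_HOST, path=req.path, headers=headers)


def cache_key(req: Request) -> tuple:
    """Cache key built from the (normalized) hostname and path."""
    return (req.host, req.path)
```

Since the normalized key no longer depends on the wiki domain or on cookies, requests from en.wikipedia.org and de.wikipedia.org for the same file share one cached object.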
Versioned resources are the most common way we serve static files, and are generally how new code should reference assets. These URLs are produced by MediaWiki's ResourceLoader or OutputPage component. They work by mapping the URL to a file on disk, hashing the file, and appending that hash to the URL as a query string.
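A minimal Python sketch of this idea follows. The hash function (SHA-1) and the 5-character truncation are assumptions chosen for illustration; they are not necessarily what ResourceLoader or OutputPage use, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def versioned_url(url: str, contents: bytes, length: int = 5) -> str:
    """Append a short hash of a file's contents to its URL as a query string."""
    # Assumption: SHA-1 truncated to `length` hex characters, for illustration only.
    digest = hashlib.sha1(contents).hexdigest()[:length]
    return f"{url}?{digest}"


def versioned_url_for_file(url: str, file_path: Path) -> str:
    """Same as versioned_url(), reading the contents from the file on disk."""
    return versioned_url(url, file_path.read_bytes())
```

Because the query string changes whenever the file's contents change, a versioned URL can be cached for a long time: a new version of the file is simply requested under a new URL.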
On the backend, requests for versioned resources are rewritten to /w/static.php, which implements several important behaviours:
- If given a version hash, match the request with the right version of the file by checking the two currently active MW branches in production.
- If given a version hash and the requested version is not found, effectively disable caching by reducing it to 1 minute for clients and the CDN. This avoids non-recovering cache poisoning around deployments, which would otherwise be possible because end users, CDN servers, and backend servers are not moved to a new version as one atomic group. More background about this eventual consistency can be found in the source, and in T47877.
- Without a version hash, serve the current version as found in the latest MW branch, regardless of hostname.
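The three behaviours above can be summarised as a small decision function. This Python sketch is not the real static.php: the names are hypothetical, error handling for files missing from every branch is omitted, serving the latest branch when the hash is unknown is an assumption, and the 1-year lifetime for successful lookups is an assumption based on the "public, 1 year" caching listed for these files. Only the 1-minute fallback comes directly from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

ONE_MINUTE_S = 60
# Assumption: successful lookups keep the "public, 1 year" caching policy.
ONE_YEAR_S = 365 * 24 * 60 * 60


@dataclass
class StaticDecision:
    branch: str   # MW branch to serve the file from
    max_age: int  # cache lifetime in seconds for clients and the CDN


def resolve_static(
    path: str,
    version_hash: Optional[str],
    branches: Sequence[str],
    hash_of: Callable[[str, str], Optional[str]],
) -> StaticDecision:
    """Choose a branch and cache lifetime for a static file request.

    branches: the active MW branches, oldest first (the last one is the latest).
    hash_of(branch, path): the file's version hash in that branch, or None
    if the file does not exist there.
    """
    latest = branches[-1]
    if version_hash is None:
        # No version hash: serve the current version from the latest branch.
        return StaticDecision(latest, ONE_YEAR_S)
    for branch in branches:
        if hash_of(branch, path) == version_hash:
            # The requested version exists in one of the active branches.
            return StaticDecision(branch, ONE_YEAR_S)
    # Unknown version (e.g. mid-deployment): serve the latest branch, but
    # cache only briefly so a mismatched response cannot linger.
    return StaticDecision(latest, ONE_MINUTE_S)
```

The short lifetime on a mismatch is what makes the system self-healing: once all servers have the new branch, the next request after a minute gets the correct file.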
Example use cases where we can't reasonably use a version hash, and thus generically serve the current version:
- Gadgets and user scripts that augment core functionality and re-purpose some of our assets. For example, Wikipedia's Vector.css override references an SVG icon from MediaWiki. It isn't versioned because the editor would otherwise have to keep the reference in sync with our deployments.
- Debug mode from ResourceLoader, where we intentionally serve internal JS and CSS files directly without minification at their "current" version. Cache performance is not a concern in debug mode.
- A tail of random things in core and extensions that reference static files that aren't part of any UI code. Such as Special:Version linking the
- ULS fonts (T135806). Up to 2021, files like this were sometimes served from "/static/current/**", which was deprecated in favour of simply "/w/**" in T302465.
- Caching: public, 1 year, hostname-agnostic (Varnish object is shared across wikis).
These are custom assets, generally pointed to from settings in wmf-config.
The most prominent examples are our project logos and favicons. We want to serve these from stable URLs that can be exposed through APIs, shared with external organizations, and saved in databases, the ParserCache, the CDN, etc. These URLs present a consistent experience to any given user, regardless of when the page they are on was last edited or purged. Changes to "static" resources should be rare, as browsers are allowed to use their copy offline, without revalidation, for up to a year. This means that purging a file from the CDN does not guarantee that users will get the latest copy.
The /static directory is external to MediaWiki and is only used if and when explicitly configured in wmf-config. Remember that it does not consider the wiki's multiversion assignment, so it may serve a version that is a week ahead of or a week behind the wiki's MW branch and PHP code.
Generally speaking, the app servers allow up to 60 seconds for most web requests (e.g. page views, HTTP GET), and up to 200 seconds for write actions (e.g. edits, HTTP POST).
- » See HTTP timeouts#App server for a detailed breakdown of the various timeouts on app servers.
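As a rough illustration only, the rule of thumb above can be expressed as a lookup keyed on the HTTP method. Treating every POST as a write and every other method as a read is a simplifying assumption, and request_timeout is a hypothetical name; the real limits are enforced at several layers, as the linked breakdown describes.

```python
READ_TIMEOUT_S = 60    # most web requests, e.g. page views (GET)
WRITE_TIMEOUT_S = 200  # write actions, e.g. edits (POST)


def request_timeout(method: str) -> int:
    """Approximate app server time budget, in seconds, for an HTTP method."""
    # Simplification: POST is treated as a write, everything else as a read.
    return WRITE_TIMEOUT_S if method.upper() == "POST" else READ_TIMEOUT_S
```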
|Type||Wall clock time|
|Notes||This was added as a measure to prevent pileups from a single event, and to address the undesirable behaviour of terminated connections continuing to run even when there is no longer any socket open to report to. Implemented on MySQL's event scheduler for legacy reasons, but using |
Pages in the MediaWiki production category
- User:Quiddity/How does it all work, related notes and infographics