Global traffic routing
This page covers our mechanisms for routing user requests through our Traffic infrastructure layers. The routing can be modified through administrative actions to improve performance and/or reliability, and/or respond to network outage conditions.
Sites
There are currently seven total data centers (or "sites"). All locations can receive direct user traffic; however, eqiad and codfw also host Core application services, whereas ulsfo, esams, drmrs, eqsin, and magru are limited to Edge caching.
GeoDNS (User-to-Edge Routing)
The first point of entry is when the client performs a DNS request on one of our public hostnames. Our authoritative DNS servers perform GeoIP resolution and hand out one of several distinct IP addresses, sending users approximately to their nearest site. We can disable sending users directly to a particular site through DNS configuration updates. Our DNS TTLs are commonly 5 minutes long, and some rare user caches will violate specs and cache them longer. The bulk of the traffic should switch inside of 5 minutes, though, with a fairly linear progression over that window.
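To make the behavior above concrete, here is a minimal Python sketch of GeoDNS-style site selection. The region names and preference ordering are hypothetical illustrations, not the real GeoIP map, and administratively disabling a site is modeled as a simple set membership test:

```python
# Illustrative sketch (not the production DNS configuration): a GeoDNS-style
# resolver maps a client's region to the nearest pooled edge site, skipping
# any site administratively marked DOWN. Region names and orderings below
# are hypothetical.
SITE_PREFERENCE = {
    # region: edge sites in rough order of network proximity (hypothetical)
    "east-asia": ["eqsin", "ulsfo", "codfw", "eqiad"],
    "western-europe": ["esams", "drmrs", "eqiad", "codfw"],
    "north-america": ["ulsfo", "codfw", "eqiad", "esams"],
}

def resolve(region: str, down_sites: set[str]) -> str:
    """Return the first pooled site for this region, like a GeoIP map."""
    for site in SITE_PREFERENCE[region]:
        if site not in down_sites:
            return site
    raise RuntimeError("no pooled site available")

# Normally a western-European user lands on esams; with esams marked DOWN
# (as via an admin_state change), new lookups shift to the next site.
print(resolve("western-europe", set()))        # esams
print(resolve("western-europe", {"esams"}))    # drmrs
```

Note that, as the text explains, the shift in real traffic is not instantaneous: resolvers honoring the 5-minute TTL converge over that window.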
Edge routing
Internet traffic can enter through the front edge (traffic layer) of any of the data centers. Requests that are not cache hits are then routed on to eventually reach a backend service (application layer) in a core data center.
Ideally, all of our application-layer services operate in an active/active configuration, meaning they can directly and simultaneously accept web traffic in either of the application data centers. Some application services are active/passive, meaning they accept web traffic only in the primary application data center, not in the secondary at the same time. Active/active services might be temporarily configured to use only one of the application data centers for operational maintenance or outage reasons.
In the active/active case, global traffic is effectively split: users whose traffic enters at eqsin, ulsfo, or codfw reach the application service in codfw, and users whose traffic enters at drmrs, esams, or eqiad reach the application service in eqiad.
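The split just described can be restated as a small lookup table. This is a sketch for illustration only; the depool fallback shown is a simplification of what DNS/Discovery actually does:

```python
# Sketch of the active/active split described above: each edge site routes
# cache-miss traffic to its associated core (application) data center.
# This mapping restates the text; it is not a production config.
EDGE_TO_CORE = {
    "eqsin": "codfw", "ulsfo": "codfw", "codfw": "codfw",
    "drmrs": "eqiad", "esams": "eqiad", "eqiad": "eqiad",
}

def application_dc(entry_site: str, depooled: set[str] = frozenset()) -> str:
    """Pick the application DC; fall back to the other core DC if depooled."""
    core = EDGE_TO_CORE[entry_site]
    if core in depooled:
        core = "eqiad" if core == "codfw" else "codfw"
    return core

print(application_dc("esams"))              # eqiad
print(application_dc("eqsin", {"codfw"}))   # eqiad (codfw depooled)
```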
When LVS (the entry point of the CDN) balances traffic to port 443, it uses a hash of the client IP to help with TCP Fast Open and TLS session persistence.
Within the caching layer (cp servers), the jump from HAProxy:443 to Varnish:80 is direct on the local host. However, the jump from Varnish (frontend, port 80) to Apache Traffic Server (backend, port 3128) is different: for that jump, we hash on the URL (and other request metadata) when balancing to the backends in order to divide the cache space among all machines, so the request typically moves from one machine to another within the same cluster.
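A minimal sketch of that URL-hashing step follows, using rendezvous (highest-random-weight) hashing for brevity; the production setup uses consistent hashing, and the cp hostnames here are made up. The property both schemes share is the one that matters: each URL deterministically maps to one backend, and removing an unrelated backend does not move it:

```python
import hashlib

# Sketch: hash the request URL to pick a cache backend, dividing the cache
# space among the cluster. Rendezvous hashing stands in for the production
# consistent hashing; backend names are hypothetical.
BACKENDS = ["cp1001", "cp1002", "cp1003", "cp1004"]

def pick_backend(url: str, backends=BACKENDS) -> str:
    def weight(node: str) -> int:
        digest = hashlib.sha256(f"{node}|{url}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(backends, key=weight)

url = "http://en.wikipedia.org/wiki/Main_Page"
owner = pick_backend(url)

# Removing a backend other than the owner leaves this URL's mapping intact:
others = [b for b in BACKENDS if b != owner]
survivors = [b for b in BACKENDS if b != others[0]]
assert pick_backend(url, survivors) == owner
```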
Cache-to-application routing
Frontend routing
Upon entering a given data center, HTTP requests reach a cache frontend host running Varnish. At this layer, caching is controlled by either the cache::req_handling or the cache::alternate_domains hiera setting. The former is used by main sites like the wikis and upload.wikimedia.org, while the latter is used by miscellaneous sites such as phabricator.wikimedia.org and grafana.wikimedia.org. Which data structure to use depends on whether the site needs to be controlled by the regular or the misc VCL, most likely misc; additional services therefore almost always need to be added to cache::alternate_domains. If in doubt, contact the Traffic team. The format of both data structures is:
cache::alternate_domains:
  hostname1:
    caching: 'normal'
  hostname2:
    caching: 'pass'
In Puppet terms there is a data type for those structures: Profile::Cache::Sites. The caching attribute is particularly interesting; see its type definition.
A value of normal in the caching attribute means that Varnish will cache the responses for this site unless Cache-Control says otherwise. Conversely, pass means that objects for this site are never to be cached. It would be preferable to specify normal and ensure that the origin returns Cache-Control with appropriate values for responses that should not be cached, but where this is not possible pass can be used. For sites that need to support websockets, such as Phabricator and Etherpad, use websockets. A sample of the production values for cache::alternate_domains as of July 2020 follows.
cache::alternate_domains:
  15.wikipedia.org:
    caching: 'normal'
  analytics.wikimedia.org:
    caching: 'normal'
  annual.wikimedia.org:
    caching: 'normal'
  blubberoid.wikimedia.org:
    caching: 'pass'
  bienvenida.wikimedia.org:
    caching: 'normal'
  etherpad.wikimedia.org:
    caching: 'websockets'
Backend routing
If there is no cache hit at the frontend layer, requests are sent to a cache backend running ATS in the same DC. Backend selection is done by applying consistent hashing on the request URL. If there is also no cache hit at the backend layer, the final step is routing requests out the back edge of the Traffic caching infrastructure into the application layer. The application-layer services can exist at one or both of the two application data centers (eqiad and codfw) on a case-by-case basis. This is controlled by ATS remap rules mapping the Host header to a given origin server hostname.
The list of mappings and transformations (via Lua plugins) is controlled by the Hiera key profile::trafficserver::backend::mapping_rules. For production, these are managed in profile/trafficserver/backend.yaml.
For most services, whether the service is active/active or active/passive is configured via DNS/Discovery. The exception to this rule is services available in one primary DC only, such as pivot (eqiad-only) in the example below:
profile::trafficserver::backend::mapping_rules:
  - type: map
    target: http://15.wikipedia.org
    replacement: https://webserver-misc-apps.discovery.wmnet
  - type: map
    target: http://phabricator.wikimedia.org
    replacement: https://phabricator.discovery.wmnet
  - type: map
    target: http://pivot.wikimedia.org
    replacement: https://an-tool1007.eqiad.wmnet
Any administrative action, such as depooling an application DC for active/active services or moving an active/passive service from one application DC to the other, can be performed via DNS discovery updates.
When adding a new service to profile::trafficserver::backend::mapping_rules, ensure that the public hostname (i.e. the hostname component of target) is included in the Subject Alternative Name (SAN) list of the certificate served by replacement. This is needed to ensure successful TLS connection establishment between ATS and the origin server.
The following command shows how to verify that the hostname phabricator.wikimedia.org is included in the SAN of the certificate offered by phabricator.discovery.wmnet:
$ echo | openssl s_client -connect phabricator.discovery.wmnet:443 2>&1 | openssl x509 -noout -text | grep -q DNS:phabricator.wikimedia.org && echo OK || echo KO
OK
If the above command fails, you might have to update the origin server certificate to include the public hostname. See Cergen .
To further verify that HTTPS requests are served properly by the configured origin, and everything works including the TLS handshake:
# get the IP address of phabricator.discovery.wmnet
$ host phabricator.discovery.wmnet
phabricator.discovery.wmnet is an alias for phab1001.eqiad.wmnet.
phab1001.eqiad.wmnet has address 10.64.16.8
# test an HTTPS request
$ curl -I https://phabricator.wikimedia.org --resolve phabricator.wikimedia.org:443:10.64.16.8
HTTP/1.1 200 OK
[...]
Management
Site deployment ordering
Typically, changes are deployed from smallest-impact to largest-impact sites. More specifically, the order is:
- Low-impact Point-of-Presence datacenters (magru, ulsfo)
- High-impact Point-of-Presence datacenters (drmrs, esams, eqsin)
- Core datacenters (codfw, eqiad)
Disabling a Site
To disable a site as an edge destination for user traffic in GeoDNS:
Downtime the matching site alert (if there is one) in https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=traffic+drop
In the operations/dns repo, edit the file admin_state. There are instructions inside for complex changes, but for the basic operation of completely disabling a site, the line you need to add at the bottom (e.g. for disabling esams) is:
geoip/generic-map/esams => DOWN
... and then deploy the DNS change in the usual way: merge through Gerrit, ssh to any one of our authdns servers (in A:dns-rec), and execute authdns-update as root.
Hard enforcement of GeoDNS-disabled sites
In the case that we need to
guarantee
that zero requests are flowing into the user-facing edge of a disabled site for a given cache cluster (or all clusters), we can forcibly block all traffic at the front edge. This should only be done when strictly necessary, and only long after (e.g. 24H after) making the DNS switch above, to avoid impacting those with minor trailing DNS cache update issues. To lock traffic out of the cache frontends for a given cluster in a given site, you'll need to merge and deploy a puppet hieradata update which sets the key
cache::traffic_shutdown
to
true
for the applicable cluster/site combinations.
For example, to lock all traffic out of the text cluster in eqiad, add the following line to
hieradata/role/eqiad/cache/text.yaml
:
cache::traffic_shutdown: true
Once the change is merged and applied to the nodes with Puppet, all requests sent to eqiad will get an HTTP 403 response from the cache frontends instead of being served from cache or routed to the appropriate origin server.
External links
- Geomapping of the different datacenters, as seen in the DNS configuration