Ncredir
Ncredir is our Non-Canonical Redirect service. It handles traffic from the outside world to a list of domain names that we own but that are not the primary canonical domain names for our projects. A key example is wikipedia.com.
It is separate from our primary edge traffic clusters for the canonical domains, running on independent instances and public IPs. Under normal conditions, it gets little traffic.
Types of domains
Generally, ncredir domains fall under a few categories and have expected redirection strategies (per WMF's Legal team):
- Paid editing services: Redirect to WMF's article warning users not to pay for Wikipedia articles.
- Misspellings: Some are seized typosquatting domains, some are courtesy domains for common misspellings. Redirect to the closest official service. For example, dewikipedia.org would redirect to de.wikipedia.org.
- General redirection: Old domains that have new homes, shortener domains, etc.
Some example redirections:
- Language code redirects: wikipedia.br → br.wikipedia.org
- Scam domains:
# Project typo
funnel *wikiipedia.org https://www.wikipedia.org
# Project typo: "biz" means commercial, so redirect to wikimedia's .com instead of .org
funnel *wikimedia.biz https://wikimedia.com
# TLD variation
funnel *wikimedia.community https://www.wikimedia.org
# Country TLD → Project language code subdomain
rewrite *wikibooks.gr https://el.wikibooks.org
# Country TLD (Greece) → Language code (el) as query param
rewrite *mediawiki.gr https://www.mediawiki.org/?uselang=el
funnel *wikidata.ro https://www.wikidata.org?uselang=ro
# Scam domain
funnel scam-domain.example.com https://wikimediafoundation.org/news/2018/08/22/dont-pay-for-wikipedia-articles/
If in doubt, refer to prior art entries in the data file!
In short, redirect seized edit-for-pay domains to the WMF blog post and all other domains to their closest-sounding counterparts.
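To make the shape of these entries concrete, here is a minimal Python sketch that reads stanza lines like the examples above. It assumes each entry is three whitespace-separated fields (stanza type, source domain pattern, target URL) and that lines starting with # are comments. The names Rule, parse_line and parse are hypothetical and are not part of the Puppet code.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    kind: str    # stanza type: "funnel", "rewrite" or "override"
    source: str  # source domain pattern, e.g. "*wikibooks.gr"
    target: str  # target URL, e.g. "https://el.wikibooks.org"


# Stanza types named in this page; the parsing rules below are assumptions.
KINDS = {"funnel", "rewrite", "override"}


def parse_line(line: str) -> Optional[Rule]:
    """Return a Rule for a stanza line, or None for blank lines and comments."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    fields = text.split()
    if len(fields) != 3 or fields[0] not in KINDS:
        raise ValueError(f"unrecognised stanza: {line!r}")
    return Rule(*fields)


def parse(text: str) -> List[Rule]:
    """Parse a whole definition file, skipping comments and blank lines."""
    rules = []
    for line in text.splitlines():
        rule = parse_line(line)
        if rule is not None:
            rules.append(rule)
    return rules


SAMPLE = """\
# Project typo
funnel *wikiipedia.org https://www.wikipedia.org
# Country TLD -> Project language code subdomain
rewrite *wikibooks.gr https://el.wikibooks.org
"""

rules = parse(SAMPLE)
for rule in rules:
    print(rule.kind, rule.source, "->", rule.target)
```

A strict parser that rejects malformed lines is used here so that a typo in a new entry fails loudly instead of being silently dropped.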
Components
ncredir is implemented using a combination of acme-chief managed certificates and nginx. nginx redirects are created in the Puppet repository according to the rules laid out in the nc_redirects.dat file. nc_redirects.dat is used as input for the custom compile_redirects() function in Puppet, which outputs the redirects as nginx configuration.
Redirection logic
Nginx is fed with two maps containing the redirection logic. The first map populates a variable called $override, and the second a variable called $rewrite.
The $override map is generated from the override stanzas in the redirects definition file, while the $rewrite map is populated from the funnel and rewrite stanzas in the same file.
This mapping between the nc_redirects.dat file and nginx happens at Puppet compilation time, so the ncredir servers need only nginx and the acme-chief managed certificates to run the service.
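The split into two maps can be modelled in a few lines of Python. This is a simplified sketch, not the compile_redirects() implementation. It assumes that a leading * in a source pattern covers the bare domain and any subdomain, that a rewrite stanza keeps the request path, and that a funnel stanza discards it. The function names build_maps, host_matches and redirect_for are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Example rules taken from the entries shown earlier on this page.
RULES = [
    ("funnel", "*wikiipedia.org", "https://www.wikipedia.org"),
    ("rewrite", "*wikibooks.gr", "https://el.wikibooks.org"),
]


def build_maps(
    rules: Iterable[Tuple[str, str, str]],
) -> Tuple[Dict[str, str], Dict[str, Tuple[str, str]]]:
    """Split rules into an override table and a rewrite table."""
    override: Dict[str, str] = {}
    rewrite: Dict[str, Tuple[str, str]] = {}
    for kind, source, target in rules:
        if kind == "override":
            override[source] = target
        elif kind in ("funnel", "rewrite"):
            rewrite[source] = (kind, target)
        else:
            raise ValueError(f"unknown stanza type: {kind!r}")
    return override, rewrite


def host_matches(pattern: str, host: str) -> bool:
    # Assumption: a leading "*" covers the bare domain and any subdomain.
    if pattern.startswith("*"):
        base = pattern[1:]
        return host == base or host.endswith("." + base)
    return host == pattern


def redirect_for(
    host: str, path: str, rewrite: Dict[str, Tuple[str, str]]
) -> Optional[str]:
    """Return the redirect URL for a request, or None if no rule matches."""
    for pattern, (kind, target) in rewrite.items():
        if host_matches(pattern, host):
            # Assumption: "rewrite" keeps the request path, "funnel" drops it.
            return target + path if kind == "rewrite" else target
    return None


override_map, rewrite_map = build_maps(RULES)
print(redirect_for("www.wikibooks.gr", "/wiki/Foo", rewrite_map))
print(redirect_for("wikiipedia.org", "/wiki/Foo", rewrite_map))
```

How nginx combines the two variables at request time (for example, whether $override takes precedence over $rewrite) is not modelled here; check the generated nginx configuration for the actual behaviour.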
TLS
ncredir does not use the CDN clusters; it handles its own TLS termination. ncredir is exposed to live traffic through the high-traffic1 class in LVS. The service is geographically balanced via GeoDNS using the ncredir-lb.wikimedia.org record, which balances traffic across:
- ncredir-lb.codfw.wikimedia.org
- ncredir-lb.eqiad.wikimedia.org
Logging
ncredir sends access logs in syslog format to a local port so that Benthos can process them and report metrics. Access logs are not stored anywhere; to view them in real time, run:
tcpdump udp port 1221 -A -i lo
See also
- ncmonitor: Automatic updates to ncredir's redirection list
- fifo-log-demux repository: Used for nginx log reading