
This is a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Ncredir

From Wikitech

Ncredir is our Non-Canonical Redirect service. It handles external traffic to domain names that we own but that are not the canonical domains for our projects. A key example is wikipedia.com.

It is separate from our primary edge traffic clusters for the canonical domains, running on independent instances and public IPs. Under normal conditions, it gets little traffic.

Types of domains

Generally, ncredir domains fall into a few categories, each with an expected redirection strategy (per WMF's Legal team):

  1. Paid editing services: Redirect to WMF's article informing users not to pay for Wikipedia articles.
  2. Misspellings: Some are seized typosquatting domains; others are courtesy registrations of common misspellings. Redirect to the closest official service. For example, dewikipedia.org would redirect to de.wikipedia.org.
  3. General redirection: Old domains that have new homes, shortener domains, etc.

Some example redirections:

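  • Language code redirects: wikipedia.br → br.wikipedia.org
  • Scam domains: → WMF's article on not paying for Wikipedia articles

Example entries in the nc_redirects.dat format: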
# Project typo
funnel	*wikiipedia.org	https://www.wikipedia.org

# Project typo: "biz" means commercial, so redirect to wikimedia's .com instead of .org
funnel	*wikimedia.biz	https://wikimedia.com

# TLD variation
funnel	*wikimedia.community	https://www.wikimedia.org

# Country TLD → Project language code subdomain
rewrite	*wikibooks.gr	https://el.wikibooks.org

# Country TLD (Greece) → Language code (el) as query param
rewrite	*mediawiki.gr	https://www.mediawiki.org/?uselang=el
funnel	*wikidata.ro	https://www.wikidata.org?uselang=ro

# Scam domain
funnel	scam-domain.example.com	https://wikimediafoundation.org/news/2018/08/22/dont-pay-for-wikipedia-articles/

If in doubt, refer to prior art entries in the data file!

In short, redirect seized edit-for-pay domains to the WMF blog post and all other domains to their closest-sounding counterparts.
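The difference between funnel and rewrite entries can be illustrated with a small Python sketch. It is not how ncredir or compile_redirects() actually resolves requests; it assumes, from the examples above rather than from any specification, that a leading * matches a domain and all of its subdomains, that funnel sends every path to the target unchanged, and that rewrite appends the original request path to the target. parse_rules(), matches() and resolve() are hypothetical helpers, and override stanzas are not modelled.

```python
# Hypothetical sketch of ncredir-style rule resolution; not the real
# compile_redirects() logic. Assumed semantics:
#   leading "*"  -> match the domain itself and any subdomain
#   funnel       -> every path goes to the target URL unchanged
#   rewrite      -> the original request path is appended to the target
# Override stanzas are not modelled here.
from typing import List, Optional, Tuple

Rule = Tuple[str, str, str]  # (stanza type, domain pattern, target URL)


def parse_rules(text: str) -> List[Rule]:
    """Parse whitespace-separated data-file lines, skipping comments and blanks."""
    rules: List[Rule] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        kind, pattern, target = line.split()
        rules.append((kind, pattern, target))
    return rules


def matches(pattern: str, host: str) -> bool:
    """Assumed wildcard rule: "*example.org" covers example.org and *.example.org."""
    if pattern.startswith("*"):
        base = pattern[1:]
        return host == base or host.endswith("." + base)
    return host == pattern


def resolve(rules: List[Rule], host: str, path: str) -> Optional[str]:
    """Return the redirect target for host + path, or None if no rule applies."""
    host = host.lower()
    for kind, pattern, target in rules:
        if not matches(pattern, host):
            continue
        if kind == "funnel":
            return target
        if kind == "rewrite":
            return target.rstrip("/") + path
    return None


RULES = parse_rules("""
# Project typo
funnel   *wikiipedia.org   https://www.wikipedia.org
# Country TLD -> Project language code subdomain
rewrite  *wikibooks.gr     https://el.wikibooks.org
""")

print(resolve(RULES, "en.wikiipedia.org", "/wiki/Foo"))  # https://www.wikipedia.org
print(resolve(RULES, "wikibooks.gr", "/wiki/Foo"))       # https://el.wikibooks.org/wiki/Foo
```

Under these assumptions, a funnel discards the path entirely, while a rewrite keeps deep links working on the destination project.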

Components

ncredir is implemented with nginx and acme-chief-managed certificates. The nginx redirects are generated in the Puppet repository from the rules in the nc_redirects.dat file: nc_redirects.dat is the input to the custom compile_redirects() Puppet function, which outputs the redirects as nginx configuration.

Redirection logic

Nginx is fed two maps containing the redirection logic. The first populates a variable called $override, and the second a variable called $rewrite.

The $override map is generated from the override stanzas in the redirects definition file, while the $rewrite map is populated from its funnel and rewrite stanzas.
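As a hypothetical sketch of what that compilation step could produce, the Python below turns data-file stanzas into two nginx map blocks. The layout of the real compile_redirects() output is not documented here, so the keys and values are assumptions: wildcard domains become .example.org keys in a map using nginx's hostnames parameter (which matches the domain and its subdomains), rewrite targets get $request_uri appended so the path is preserved, and override entries are assumed to be written as domain/path and matched against $host$uri. host_key() and render_maps() are illustrative names.

```python
# Hypothetical sketch of turning data-file stanzas into the two nginx maps.
# The real compile_redirects() output may be laid out differently; the key
# and value formats below are assumptions made for illustration.
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (stanza type, source, target)


def host_key(source: str) -> str:
    # Assumed: "*example.org" means the domain plus subdomains, which a map
    # with the "hostnames" parameter expresses as ".example.org".
    return "." + source[1:] if source.startswith("*") else source


def render_maps(entries: List[Entry]) -> str:
    override_lines: List[str] = []
    rewrite_lines: List[str] = []
    for kind, source, target in entries:
        if kind == "override":
            # Assumed source form "domain/path", matched exactly on $host$uri.
            override_lines.append(f"    {source} {target};")
        elif kind == "funnel":
            # Every path on the domain goes to the same URL.
            rewrite_lines.append(f"    {host_key(source)} {target};")
        elif kind == "rewrite":
            # Preserve the original path and query string.
            rewrite_lines.append(f"    {host_key(source)} {target.rstrip('/')}$request_uri;")
        else:
            raise ValueError(f"unknown stanza type: {kind!r}")
    return "\n".join([
        "map $host$uri $override {",
        '    default "";',
        *override_lines,
        "}",
        "",
        "map $host $rewrite {",
        "    hostnames;",
        '    default "";',
        *rewrite_lines,
        "}",
    ])


print(render_maps([
    ("funnel", "*wikiipedia.org", "https://www.wikipedia.org"),
    ("rewrite", "*wikibooks.gr", "https://el.wikibooks.org"),
]))
```

A server block could then return 301 $override when it is non-empty and 301 $rewrite otherwise. Note that map values mixing text and variables, as in the rewrite case, need nginx 1.11.0 or later.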

This mapping from nc_redirects.dat to nginx configuration happens at Puppet compilation time, so the ncredir servers only need nginx and the acme-chief-managed certificates to run the service.

TLS

ncredir does not use the CDN clusters; it handles its own TLS termination. It is exposed to live traffic through the high-traffic1 class in LVS. The service is geographically load-balanced via GeoDNS using the ncredir-lb.wikimedia.org record, which spreads traffic across:

  • ncredir-lb.codfw.wikimedia.org
  • ncredir-lb.eqiad.wikimedia.org

Logging

ncredir sends its access logs in syslog format to a local UDP port, where Benthos processes them and reports metrics. Access logs are not stored anywhere, so to view them in real time, capture the traffic on the loopback interface:

tcpdump udp port 1221 -A -i lo
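With -A, tcpdump prints each packet's payload as ASCII, and that payload is a syslog message. As a sketch, assuming nginx's default syslog settings (facility local7 and severity info), each message starts with <190>, since PRI = facility * 8 + severity = 23 * 8 + 6. The Python below splits that prefix off a captured message; split_pri() is a hypothetical helper, the sample payload is made up, and ncredir's actual facility and severity may be configured differently in Puppet.

```python
# Sketch: split the RFC 3164 <PRI> prefix off a syslog message captured with
# tcpdump. Assumes nginx's syslog defaults (facility local7, severity info,
# i.e. "<190>"); ncredir's actual Puppet-managed settings may differ.
import re
from typing import Tuple

PRI_RE = re.compile(r"<(\d{1,3})>(.*)", re.DOTALL)
FACILITIES = {16 + i: f"local{i}" for i in range(8)}  # local0..local7 = 16..23
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def split_pri(message: str) -> Tuple[int, int, str]:
    """Return (facility, severity, remainder); PRI = facility * 8 + severity."""
    match = PRI_RE.match(message)
    if match is None or int(match.group(1)) > 191:
        raise ValueError("message does not start with a valid <PRI>")
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, severity, match.group(2)


# Illustrative payload only; the real access-log fields are not documented here.
facility, severity, rest = split_pri("<190>Jan  1 00:00:00 host nginx: example")
print(FACILITIES.get(facility, facility), SEVERITIES[severity])  # local7 info
```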

See also