You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

DNS/Discovery

From Wikitech-static
Revision as of 16:22, 6 April 2017 by imported>Volans (Add images)

DNS Discovery is a simple, dynamic service-discovery mechanism that returns the closest active endpoint of a given service running in multiple datacenters.

This solution is meant only for simple discovery entries. If more complex data needs to be driven dynamically, a Confd/etcd-managed configuration is required instead.

Active/active services

If a service runs in active/active mode, it can be contacted in any datacenter. In this case the entry service-name.discovery.wmnet returns the IP of the endpoint in the same datacenter as the host performing the resolution, provided that endpoint is pooled.

For example, with both datacenters pooled, a host in eqiad that resolves service-name.discovery.wmnet gets the IP of service-name.svc.eqiad.wmnet, while a host in codfw gets the IP of service-name.svc.codfw.wmnet.

If the codfw entry is depooled, a host in codfw gets the IP of the endpoint in eqiad, provided that one is pooled.

Figure: DNS Discovery for an active/active service (Dns-discovery active-active.png)
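The selection rule for active/active services can be sketched in a few lines of Python. This is a minimal, hypothetical model, not the code of the authoritative DNS servers: `resolve_active_active` and `ENDPOINTS` are illustrative names, the IPs are placeholders from the RFC 5737 documentation ranges, and "closest" is approximated by trying the client's own datacenter first and then any other pooled one in a fixed order.

```python
from typing import Dict, Optional

# Placeholder service IPs (RFC 5737 documentation ranges), standing in for
# service-name.svc.eqiad.wmnet and service-name.svc.codfw.wmnet.
ENDPOINTS: Dict[str, str] = {"eqiad": "192.0.2.10", "codfw": "198.51.100.10"}


def resolve_active_active(
    client_dc: str, pooled: Dict[str, bool], endpoints: Dict[str, str]
) -> Optional[str]:
    """Pick the answer for service-name.discovery.wmnet as seen from client_dc.

    Hypothetical helper: prefers the client's own datacenter when it is pooled,
    otherwise falls back to another pooled datacenter. Proximity between
    datacenters is not modelled; the fallback scans names in sorted order.
    Returns None when no datacenter is pooled.
    """
    if pooled.get(client_dc, False):
        return endpoints[client_dc]
    for dc in sorted(pooled):
        if pooled[dc]:  # the client's own DC is depooled here, so it is skipped
            return endpoints[dc]
    return None


if __name__ == "__main__":
    both_pooled = {"eqiad": True, "codfw": True}
    codfw_depooled = {"eqiad": True, "codfw": False}
    print(resolve_active_active("eqiad", both_pooled, ENDPOINTS))     # 192.0.2.10
    print(resolve_active_active("codfw", both_pooled, ENDPOINTS))     # 198.51.100.10
    print(resolve_active_active("codfw", codfw_depooled, ENDPOINTS))  # 192.0.2.10
```

The three calls reproduce the scenarios described above: each datacenter answers locally while both are pooled, and codfw clients are sent to eqiad once codfw is depooled.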

Active/passive services

If a service runs in active/passive mode, it can be contacted only in the primary datacenter, not in the passive one. In this case the entry service-name.discovery.wmnet always returns the IP of the endpoint in the primary datacenter, whichever datacenter the resolving host is in.

Figure: DNS Discovery for an active/passive service (Dns-discovery active-passive.png)
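For an active/passive service the client's location no longer matters. The sketch below assumes that "the primary" is simply the single datacenter marked as pooled, which matches the -rw entries in the Conftool examples on this page (eqiad pooled, codfw depooled). The helper name and the placeholder RFC 5737 IPs are hypothetical.

```python
from typing import Dict

# Placeholder IPs (RFC 5737); eqiad plays the primary, as for the *-rw
# entries in the Conftool examples on this page.
ENDPOINTS: Dict[str, str] = {"eqiad": "192.0.2.20", "codfw": "198.51.100.20"}


def resolve_active_passive(pooled: Dict[str, bool], endpoints: Dict[str, str]) -> str:
    """Return the primary endpoint; the client's datacenter plays no role.

    Hypothetical helper that treats "the primary" as the single pooled
    datacenter and refuses ambiguous states instead of guessing.
    """
    active = sorted(dc for dc, is_pooled in pooled.items() if is_pooled)
    if len(active) != 1:
        raise ValueError(f"expected exactly one pooled datacenter, got {active}")
    return endpoints[active[0]]


if __name__ == "__main__":
    state = {"eqiad": True, "codfw": False}
    # Same answer whether the resolving host sits in eqiad or codfw.
    print(resolve_active_passive(state, ENDPOINTS))  # 192.0.2.20
```

Raising on zero or two pooled datacenters is a design choice of this sketch, made so that a misconfigured record is noticed rather than silently answered.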

Read-only and read-write

If a service can handle reads in an active/active way but writes only in an active/passive way, two DNS Discovery records can be created, service-name-ro and service-name-rw, so that they are treated as two different services: one active/active and the other active/passive.
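A small data model makes the read/write split concrete: the same service is published as two records with different modes, and a host in codfw gets a local answer for reads but the primary for writes. `DiscoveryRecord`, `resolve`, and the placeholder RFC 5737 IPs are hypothetical, and both records share one set of endpoints purely for simplicity.

```python
from typing import Dict, NamedTuple

# Placeholder IPs (RFC 5737) shared by both records for simplicity.
ENDPOINTS: Dict[str, str] = {"eqiad": "192.0.2.30", "codfw": "198.51.100.30"}


class DiscoveryRecord(NamedTuple):
    mode: str                  # "active/active" or "active/passive"
    pooled: Dict[str, bool]    # per-datacenter pooled flag
    endpoints: Dict[str, str]  # per-datacenter endpoint IP


def resolve(record: DiscoveryRecord, client_dc: str) -> str:
    # Only active/active records prefer the client's own datacenter.
    if record.mode == "active/active" and record.pooled.get(client_dc, False):
        return record.endpoints[client_dc]
    # Otherwise use a pooled datacenter; for an active/passive record this is
    # the primary, since only the primary is expected to be pooled.
    for dc in sorted(record.pooled):
        if record.pooled[dc]:
            return record.endpoints[dc]
    raise LookupError("no datacenter is pooled for this record")


RECORDS = {
    "service-name-ro": DiscoveryRecord("active/active", {"eqiad": True, "codfw": True}, ENDPOINTS),
    "service-name-rw": DiscoveryRecord("active/passive", {"eqiad": True, "codfw": False}, ENDPOINTS),
}

if __name__ == "__main__":
    for name, record in RECORDS.items():
        print(name, resolve(record, "codfw"))
    # service-name-ro 198.51.100.30  (reads stay local)
    # service-name-rw 192.0.2.30     (writes go to the primary, eqiad)
```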

Failure scenario

To handle the failure case in which no datacenter is pooled for a given service, a failoid service was created that always closes the connection on any TCP port. This way DNS Discovery can use the failoid IPs as a fallback and always return an IP, avoiding negative DNS caching and similar issues. The failoid service is present in both the eqiad and codfw datacenters, and the IP of the local one is returned.
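The fallback order can be sketched as follows, assuming the active/active preference described earlier: local endpoint, then any other pooled endpoint, then the local failoid IP. The function name and all IPs (RFC 5737 placeholders) are hypothetical; the point is only that the resolver never produces an empty answer.

```python
from typing import Dict

# Placeholder IPs (RFC 5737) for the service and for the local failoid hosts.
ENDPOINTS: Dict[str, str] = {"eqiad": "192.0.2.40", "codfw": "198.51.100.40"}
FAILOID: Dict[str, str] = {"eqiad": "192.0.2.254", "codfw": "198.51.100.254"}


def resolve_with_failoid(
    client_dc: str,
    pooled: Dict[str, bool],
    endpoints: Dict[str, str],
    failoid: Dict[str, str],
) -> str:
    """Always return an IP: a pooled endpoint if any, else the local failoid."""
    if pooled.get(client_dc, False):
        return endpoints[client_dc]
    for dc in sorted(pooled):
        if pooled[dc]:
            return endpoints[dc]
    # No datacenter pooled: answer with the failoid IP of the client's own
    # datacenter rather than an empty answer, so resolvers have nothing
    # negative to cache; clients that connect just get the connection closed.
    return failoid[client_dc]


if __name__ == "__main__":
    nothing_pooled = {"eqiad": False, "codfw": False}
    print(resolve_with_failoid("codfw", nothing_pooled, ENDPOINTS, FAILOID))  # 198.51.100.254
```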

How to manage a DNS Discovery service

TODO: Add more details for the Puppet configuration

The DNS configuration is managed in Puppet, while the current pooled/depooled state and the TTL are stored in etcd and can be managed via Conftool, either from the CLI or by using it as a library. For example:

  • Get the current live state of the three main MediaWiki discovery entries:
$ confctl --quiet --object-type discovery select 'dnsdisc=(appservers|api|imagescaler)-rw' get
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=imagescaler-rw"}
{"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=imagescaler-rw"}
{"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=api-rw"}
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=api-rw"}
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=appservers-rw"}
{"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=appservers-rw"}
  • Get the current live state of the parsoid entry:
$ confctl --quiet --object-type discovery select 'dnsdisc=parsoid' get
{"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=parsoid"}
{"codfw": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=parsoid"}
  • Depool the codfw side of the imagescaler-ro entry:
$ confctl --object-type discovery select 'dnsdisc=imagescaler-ro,name=codfw' set/pooled=false
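The `get` output above is one JSON object per line, each holding a single datacenter's attributes plus a `tags` string. A script can therefore summarize it without touching Conftool internals. The sketch below only parses text captured from the CLI and builds (but does not run) the depool command shown on this page; it does not use the Conftool library API, and the helper names are hypothetical. The argument list could be passed to `subprocess.run` on a host where `confctl` is available.

```python
import json
from typing import Dict, List

# Lines taken from the "confctl ... get" examples on this page.
SAMPLE_OUTPUT = """\
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=imagescaler-rw"}
{"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=imagescaler-rw"}
{"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=api-rw"}
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=api-rw"}
{"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=parsoid"}
{"codfw": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=parsoid"}
"""


def parse_discovery_state(output: str) -> Dict[str, Dict[str, dict]]:
    """Map discovery record -> datacenter -> attributes (pooled, ttl, ...)."""
    state: Dict[str, Dict[str, dict]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        # "tags" looks like "dnsdisc=api-rw"; split on commas in case of several tags.
        tags = dict(tag.split("=", 1) for tag in obj.pop("tags").split(","))
        record = tags["dnsdisc"]
        for dc, attrs in obj.items():
            state.setdefault(record, {})[dc] = attrs
    return state


def pooled_datacenters(state: Dict[str, Dict[str, dict]], record: str) -> List[str]:
    return sorted(dc for dc, attrs in state[record].items() if attrs["pooled"])


def depool_argv(record: str, dc: str) -> List[str]:
    """Build (but do not run) the depool command used on this page."""
    return ["confctl", "--object-type", "discovery", "select",
            f"dnsdisc={record},name={dc}", "set/pooled=false"]


if __name__ == "__main__":
    state = parse_discovery_state(SAMPLE_OUTPUT)
    for record in sorted(state):
        print(record, pooled_datacenters(state, record))
    print(" ".join(depool_argv("imagescaler-ro", "codfw")))
```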