HTTP proxy

Revision as of 13:54, 20 April 2022 by imported>Alexandros Kosiaris

To allow HTTP requests to reach the outside world, we maintain a caching HTTP proxy in each datacenter. The proxies are exposed through service entries of the form webproxy.<datacenter>.wmnet.

How-to?

Service name            Server                      Port
webproxy.eqiad.wmnet    install1003.wikimedia.org   8080
webproxy.codfw.wmnet    install2003.wikimedia.org   8080
webproxy.esams.wmnet    install3001.wikimedia.org   8080
webproxy.ulsfo.wmnet    install4001.wikimedia.org   8080
webproxy.eqsin.wmnet    install5001.wikimedia.org   8080
webproxy.drmrs.wmnet    install6001.wikimedia.org   8080
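
The naming scheme in the table can be captured in a small helper. The following is a minimal sketch, not part of the original page: the function name webproxy_url is hypothetical, and the datacenter list is copied from the table above, so it may go stale.

```shell
# Hypothetical helper: print the proxy URL for a given datacenter.
# The datacenter names mirror the table above; every proxy listens on 8080.
webproxy_url() {
    case "$1" in
        eqiad|codfw|esams|ulsfo|eqsin|drmrs)
            printf 'http://webproxy.%s.wmnet:8080\n' "$1"
            ;;
        *)
            # Unknown names are rejected rather than guessed.
            printf 'unknown datacenter: %s\n' "$1" >&2
            return 1
            ;;
    esac
}
```

For example, webproxy_url codfw prints http://webproxy.codfw.wmnet:8080.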

You can set the http_proxy and https_proxy environment variables to make many command-line tools use the site-specific proxy automatically.

export http_proxy=http://webproxy:8080
export https_proxy=http://webproxy:8080
export HTTP_PROXY=http://webproxy:8080
export HTTPS_PROXY=http://webproxy:8080
export no_proxy=127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org,.wikibooks.org,.wikiquote.org,.wiktionary.org,.wikisource.org,.wikispecies.org,.wikiversity.org,.wikidata.org,.mediawiki.org,.wikinews.org,.wikivoyage.org
export NO_PROXY=127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org,.wikibooks.org,.wikiquote.org,.wiktionary.org,.wikisource.org,.wikispecies.org,.wikiversity.org,.wikidata.org,.mediawiki.org,.wikinews.org,.wikivoyage.org
  • "no_proxy" MUST be explicitly set
    • Prevents unnecessary load on the proxies (to fetch internal resources)
    • Prevents stale data cached on the proxies
    • Prevents unnecessary dependencies
  • HTTP proxies SHOULD NOT be configured by default, but only case by case, where there is an actual need
    • Preferably, set these variables for your current session only, by running the commands above at the terminal prompt
    • Services should use Puppet to configure proxies
    • If neither alternative is possible, add these lines to your ~/.profile file
  • These proxies MUST NOT be used from Cloud VPS instances (enforced by ACLs)
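
In line with the recommendation above not to configure the proxy by default, the variables can also be scoped to a single command rather than the whole session. This is a minimal sketch assuming a POSIX shell; the function name with_webproxy is hypothetical and does not exist on the hosts.

```shell
# Hypothetical wrapper: run one command with the proxy variables set,
# without exporting anything into the surrounding session.
# The body runs in a subshell "( ... )", so the helper variables do not leak.
with_webproxy() (
    proxy='http://webproxy:8080'
    # Same exclusion list as above, so internal traffic bypasses the proxy.
    noproxy='127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org,.wikibooks.org,.wikiquote.org,.wiktionary.org,.wikisource.org,.wikispecies.org,.wikiversity.org,.wikidata.org,.mediawiki.org,.wikinews.org,.wikivoyage.org'
    # env only runs external programs, not shell functions or builtins.
    env http_proxy="$proxy" https_proxy="$proxy" \
        HTTP_PROXY="$proxy" HTTPS_PROXY="$proxy" \
        no_proxy="$noproxy" NO_PROXY="$noproxy" \
        "$@"
)
```

A call such as with_webproxy curl http://www.google.com then uses the proxy for that one request only, and the session environment stays untouched.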

Internal endpoints

Prefer internal endpoints over public ones; a list of reasons is given in this comment.

API

Use e.g. https://api-ro.discovery.wmnet and set the HTTP Host header to the domain of the site you want to access, e.g. curl -H "Host: www.wikidata.org" https://api-ro.discovery.wmnet
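
The request above can be wrapped so that the site is a parameter. This is a sketch only: the function name internal_api_get is hypothetical, and the example path in the usage note is an assumption based on the standard MediaWiki Action API rather than something stated on this page.

```shell
# Hypothetical helper: query the read-only internal API endpoint for a site.
#   $1: public domain of the site, sent as the Host header
#       (e.g. www.wikidata.org)
#   $2: request path and query string (optional, defaults to /)
# The body runs in a subshell so "site" and "path" do not leak.
internal_api_get() (
    site="$1"
    path="${2:-/}"
    curl --silent --show-error \
        -H "Host: ${site}" \
        "https://api-ro.discovery.wmnet${path}"
)
```

An illustrative call would be internal_api_get www.wikidata.org '/w/api.php?action=query&meta=siteinfo&format=json'. Because .wmnet is in no_proxy, such a request should not go through the web proxy.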

ORES

Similar to above, but use https://ores.discovery.wmnet


A complete list of internal discovery endpoints is available at: https://config-master.wikimedia.org/discovery/discovery-basic.yaml

Example usage

curl

If you are using curl, you can use the --proxy flag:

curl --proxy http://webproxy.eqiad.wmnet:8080 http://www.google.com
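
When debugging, it can help to confirm that a request succeeds through the proxy without printing the response body. The following is a minimal sketch using standard curl options; the function name check_webproxy, the 10-second timeout and the default test URL are illustrative assumptions.

```shell
# Hypothetical check: fetch a URL through a given proxy and print only the
# HTTP status code of the response.
#   $1: proxy URL, e.g. http://webproxy.eqiad.wmnet:8080
#   $2: URL to fetch (optional)
check_webproxy() (
    proxy="$1"
    url="${2:-http://www.google.com}"
    # --output /dev/null discards the body; --write-out prints the status.
    curl --proxy "$proxy" --silent --output /dev/null \
        --max-time 10 --write-out '%{http_code}\n' "$url"
)
```

For example, check_webproxy http://webproxy.eqiad.wmnet:8080 prints the status code returned for the default URL. If the proxy cannot be reached, curl exits with a non-zero status.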

Monitoring

Access log dashboard: https://logstash.wikimedia.org/app/dashboards#/view/58c908a0-a394-11ec-bf8e-43f1807d5bc2

Requests: https://grafana.wikimedia.org/d/i5YA-BXWz/squid

Future/possible improvements

Reference

See also