You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

HTTP proxy


To allow HTTP requests to reach the outside world, we maintain a caching HTTP proxy in each datacenter. The proxies are exposed through service entries of the form webproxy.<datacenter>.wmnet.

How-to?

Service name            Server                      Port
webproxy.eqiad.wmnet    install1003.wikimedia.org   8080
webproxy.codfw.wmnet    install2003.wikimedia.org   8080
webproxy.esams.wmnet    install3001.wikimedia.org   8080
webproxy.ulsfo.wmnet    install4001.wikimedia.org   8080
webproxy.eqsin.wmnet    install5001.wikimedia.org   8080
webproxy.drmrs.wmnet    install6001.wikimedia.org   8080

You can set the http_proxy and https_proxy environment variables to make many command-line tools use the site-specific proxy automatically.

export http_proxy=http://webproxy:8080
export https_proxy=http://webproxy:8080
export no_proxy=127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org,.wikibooks.org,.wikiquote.org,.wiktionary.org,.wikisource.org,.wikispecies.org,.wikiversity.org,.wikidata.org,.mediawiki.org,.wikinews.org,.wikivoyage.org
export HTTP_PROXY=$http_proxy
export HTTPS_PROXY=$https_proxy
export NO_PROXY=$no_proxy
  • "no_proxy" MUST be explicitly set
    • Prevents unnecessary load on the proxies (to fetch internal resources)
    • Prevents stale data cached on the proxies
    • Prevents unnecessary dependencies
  • HTTP proxies SHOULD NOT be configured by default, but on a case-by-case basis, as needed
    • Prefer setting these variables for your current session only, by running the same commands at the terminal prompt
    • Services should use Puppet to configure proxies
    • If neither alternative is possible, add these lines to your ~/.profile file
  • These proxies MUST NOT be used from Cloud VPS instances (enforced by ACLs)
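
To keep the proxy settings limited to the current session, as recommended above, the exports can be wrapped in a pair of shell functions. This is only a sketch: the function names use_webproxy and unuse_webproxy are hypothetical, and the example assumes you want to name the datacenter explicitly rather than rely on the short name webproxy.

```shell
# Hypothetical helper: enable a datacenter's proxy for the current shell session only.
# Usage: use_webproxy <datacenter>   (e.g. use_webproxy eqiad)
use_webproxy() {
    if [ -z "$1" ]; then
        echo "usage: use_webproxy <datacenter>" >&2
        return 1
    fi
    http_proxy="http://webproxy.$1.wmnet:8080"
    https_proxy="$http_proxy"
    # no_proxy must always be set explicitly so internal traffic bypasses the proxy.
    no_proxy="127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org,.wikibooks.org,.wikiquote.org,.wiktionary.org,.wikisource.org,.wikispecies.org,.wikiversity.org,.wikidata.org,.mediawiki.org,.wikinews.org,.wikivoyage.org"
    HTTP_PROXY="$http_proxy"
    HTTPS_PROXY="$https_proxy"
    NO_PROXY="$no_proxy"
    export http_proxy https_proxy no_proxy HTTP_PROXY HTTPS_PROXY NO_PROXY
}

# Hypothetical counterpart: remove the proxy settings from the current session.
unuse_webproxy() {
    unset http_proxy https_proxy no_proxy HTTP_PROXY HTTPS_PROXY NO_PROXY
}
```

Paste the functions at the prompt, run use_webproxy eqiad before the commands that need outside access, and run unuse_webproxy afterwards.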

Internal endpoints

It is better to use internal endpoints instead of public ones; a list of reasons is given in this comment.

API

Use, for example, https://api-ro.discovery.wmnet and set the HTTP Host header to the domain of the site you want to access: curl -H "Host: www.wikidata.org" https://api-ro.discovery.wmnet

For examples in Python and R refer to these notes.
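
The Host header approach can be wrapped in a small shell function. This is a sketch under assumptions: the function name internal_api_get is hypothetical, and the /w/api.php path is assumed to be the Action API entry point of the target wiki.

```shell
# Hypothetical helper: query a wiki's API through the internal read-only endpoint.
# $1 = public hostname of the wiki (sent as the Host header)
# $2 = query string, e.g. 'action=query&meta=siteinfo&format=json'
internal_api_get() {
    # /w/api.php is an assumption about the API path; adjust if your wiki differs.
    curl --silent --show-error \
         --header "Host: $1" \
         "https://api-ro.discovery.wmnet/w/api.php?$2"
}
```

For example: internal_api_get www.wikidata.org 'action=query&meta=siteinfo&format=json'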

ORES

Similar to above, but use https://ores.discovery.wmnet


A complete list of internal endpoints is available at: https://config-master.wikimedia.org/discovery/discovery-basic.yaml

Example usage

curl

If you are using curl, you can use the --proxy flag:

curl --proxy http://webproxy.eqiad.wmnet:8080 http://www.google.com
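
To check quickly whether a request gets through a given proxy, you can ask curl to print only the HTTP status code. The function name proxy_status below is hypothetical; the curl options used (--proxy, --silent, --output, --write-out) are standard.

```shell
# Hypothetical helper: fetch a URL through a datacenter's proxy and print
# only the HTTP status code of the response.
# $1 = datacenter (e.g. eqiad), $2 = URL to fetch
proxy_status() {
    curl --proxy "http://webproxy.$1.wmnet:8080" \
         --silent --output /dev/null \
         --write-out '%{http_code}\n' \
         "$2"
}
```

For example: proxy_status eqiad http://www.google.com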

wget

wget has no --proxy flag; set the appropriate environment variable instead.

https_proxy=http://webproxy:8080 wget https://www.google.com
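
As an alternative to environment variables, wget accepts wgetrc commands on the command line via -e, which keeps the proxy setting confined to a single invocation. The sketch below assumes that approach suits your use case; the function name wget_via_proxy is hypothetical.

```shell
# Hypothetical helper: download a URL with wget through a datacenter's proxy,
# without changing the environment of the current shell.
# $1 = datacenter (e.g. eqiad), $2 = URL to download
wget_via_proxy() {
    wget -e use_proxy=on \
         -e "http_proxy=http://webproxy.$1.wmnet:8080" \
         -e "https_proxy=http://webproxy.$1.wmnet:8080" \
         "$2"
}
```

For example: wget_via_proxy eqiad https://www.google.com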

Maven proxy configuration example

You can reference the proxy in your Maven settings file, ~/.m2/settings.xml, so that Maven fetches packages through it at build time.

<settings>
  <proxies>
    <proxy>
      <id>http-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>webproxy.eqiad.wmnet</host>
      <port>8080</port>
    </proxy>
    <proxy>
      <id>https-proxy</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>webproxy.eqiad.wmnet</host>
      <port>8080</port>
    </proxy>
  </proxies>
</settings>

Monitoring

Access log dashboard: https://logstash.wikimedia.org/app/dashboards#/view/58c908a0-a394-11ec-bf8e-43f1807d5bc2

Requests: https://grafana.wikimedia.org/d/i5YA-BXWz/squid

Future/possible improvements

Reference

See also