HTTP proxy
To allow HTTP requests to reach the outside world, we maintain a caching HTTP proxy in each datacenter. They are exposed via service entries of the form webproxy.<datacenter>.wmnet.
How-to?
Service Name | Server | Port |
---|---|---|
webproxy.eqiad.wmnet | install1003.wikimedia.org | 8080 |
webproxy.codfw.wmnet | install2003.wikimedia.org | 8080 |
webproxy.esams.wmnet | install3001.wikimedia.org | 8080 |
webproxy.ulsfo.wmnet | install4001.wikimedia.org | 8080 |
webproxy.eqsin.wmnet | install5001.wikimedia.org | 8080 |
webproxy.drmrs.wmnet | install6001.wikimedia.org | 8080 |
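Each service name is a DNS alias for the datacenter-local proxy host listed above. A minimal way to check which host backs a given entry (assuming dig is available):

# Show which host the eqiad proxy service entry resolves to.
dig +short webproxy.eqiad.wmnet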
You can set the http_proxy and https_proxy environment variables to make many command-line scripts use the site-specific proxy automatically.
# "webproxy" resolves via the host's search domain to the datacenter-local proxy;
# both lower- and upper-case variants are set because tools differ in which they read.
export http_proxy=http://webproxy:8080
export https_proxy=http://webproxy:8080
export HTTP_PROXY=http://webproxy:8080
export HTTPS_PROXY=http://webproxy:8080
# no_proxy keeps internal (.wmnet) and Wikimedia-project traffic off the proxies.
export no_proxy=127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org,.wikibooks.org,.wikiquote.org,.wiktionary.org,.wikisource.org,.wikispecies.org,.wikiversity.org,.wikidata.org,.mediawiki.org,.wikinews.org,.wikivoyage.org
export NO_PROXY=127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org,.wikibooks.org,.wikiquote.org,.wiktionary.org,.wikisource.org,.wikispecies.org,.wikiversity.org,.wikidata.org,.mediawiki.org,.wikinews.org,.wikivoyage.org
- "no_proxy" MUST be explicitly set
- Prevents unnecessary load on the proxies (to fetch internal resources)
- Prevents stale data cached on the proxies
- Prevents unnecessary dependencies
- HTTP proxies SHOULD NOT be configured by default, but on a case by case (need) basis
- It's preferred to set these variables for your current session only by running the same commands at the terminal prompt
- services should leverage Puppet to configure proxies
- If the alternatives are not possible, add these lines to your
~/.profile
file
- These proxies MUST NOT be used from Cloud VPS instances (enforced by ACLs)
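For a single command you can go further and scope the variables to just that invocation, so nothing is exported into the wider session. A minimal sketch; the target URL and the shortened no_proxy value are only illustrations:

# Proxy just this one request; the variables do not persist afterwards.
https_proxy=http://webproxy.eqiad.wmnet:8080 no_proxy=127.0.0.1,::1,localhost,.wmnet \
  curl -sI https://www.debian.org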
Internal endpoints
It is better to use internal endpoints instead of public ones; a list of reasons is visible in this comment.
API
Use e.g. https://api-ro.discovery.wmnet and set the HTTP Host header to the domain of the site you want to access, e.g. curl -H "Host: www.wikidata.org" https://api-ro.discovery.wmnet
For examples in Python and R refer to these notes.
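As a fuller sketch, the Host-header trick combines with an ordinary Action API call; the api.php path and query parameters below assume a standard MediaWiki setup and are not taken from this page:

# Fetch site info from www.wikidata.org via the internal read-only endpoint.
curl -s -H "Host: www.wikidata.org" \
  "https://api-ro.discovery.wmnet/w/api.php?action=query&meta=siteinfo&format=json"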
ORES
Similar to above, but use https://ores.discovery.wmnet
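For instance, a scoring request could look like the following; the Host value and the v3 scores path are assumptions based on the public ORES API:

# Score one enwiki revision with the damaging model via the internal endpoint.
curl -s -H "Host: ores.wikimedia.org" \
  "https://ores.discovery.wmnet/v3/scores/enwiki/?models=damaging&revids=123456"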
A complete list exists at: https://config-master.wikimedia.org/discovery/discovery-basic.yaml
Example usage
curl
If you are using curl, you can use the --proxy flag:
curl --proxy http://webproxy.eqiad.wmnet:8080 http://www.google.com
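The same flag works for HTTPS URLs, which the proxy tunnels with the CONNECT method:

# HTTPS requests are tunnelled through the proxy rather than cached.
curl --proxy http://webproxy.eqiad.wmnet:8080 https://www.google.com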
Monitoring
Access log dashboard: https://logstash.wikimedia.org/app/dashboards#/view/58c908a0-a394-11ec-bf8e-43f1807d5bc2
Requests: https://grafana.wikimedia.org/d/i5YA-BXWz/squid
Future/possible improvements
- Helper script to correctly configure the proxies for the current user session - T278315 - global http_proxy setting
- Centrally managed global no_proxy settings - T278315 - global http_proxy setting
- Maybe restrict domains accessible by webproxy
- Improve proxy redundancy - T242715
- Merge url-downloader and the web-proxies?
See also
- url-downloader (another set of squid proxies for slightly different use cases)
- T254011: Why do we have 2 sets of squid proxies?