Incidents/2019-06-13 wdqs

Revision as of 17:47, 8 April 2022 by imported>Krinkle (Krinkle moved page Incident documentation/2019-06-13 wdqs to Incidents/2019-06-13 wdqs)


On June 13, from ~15:10 UTC to ~15:50 UTC, the public WDQS endpoint in eqiad was overloaded by a bot to the point where it was not serving user queries. There is no reason to think that this bot was malicious. As a mitigation, the python-requests user agent has been temporarily banned from accessing WDQS, consistent with our user agent policy.


The WDQS public endpoint in eqiad was unavailable from ~15:25 to ~15:45 UTC.

The python-requests user agent is still banned. We are waiting to implement a gentler solution before removing this ban.

The internal WDQS endpoint was not impacted.


The problem was detected by the Icinga LVS probe.


All times in UTC.

  • 15:10: load starts to increase on the public wdqs eqiad cluster
  • 15:31: Icinga LVS alert for wdqs.svc.eqiad.wmnet


  • identifying and throttling bots is a hard problem
  • we need to take more drastic action to protect the stability of the service (aggressively throttle generic user agents)
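As an illustration of what throttling generic user agents more aggressively could look like, here is a minimal token-bucket sketch. It is not the throttling logic WDQS actually runs: the class names, the list of "generic" prefixes, and the rate and burst numbers are all hypothetical, chosen only to show the idea of giving generic agents a much smaller budget than agents that identify themselves.

```python
import time

# Hypothetical list of user-agent prefixes treated as "generic".
GENERIC_AGENT_PREFIXES = ("python-requests", "curl", "java")


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.updated = now

    def allow(self, now):
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class UserAgentThrottle:
    """Keeps one bucket per user agent; generic agents get a smaller budget."""

    def __init__(self, generic_limits=(1.0, 5.0), default_limits=(10.0, 50.0)):
        # Each limits tuple is (rate per second, burst capacity); values are assumptions.
        self.generic_limits = generic_limits
        self.default_limits = default_limits
        self.buckets = {}

    @staticmethod
    def is_generic(user_agent):
        # A missing or empty user agent is treated as generic too.
        ua = (user_agent or "").strip().lower()
        return not ua or ua.startswith(GENERIC_AGENT_PREFIXES)

    def allow(self, user_agent, now=None):
        if now is None:
            now = time.monotonic()
        key = user_agent or ""
        bucket = self.buckets.get(key)
        if bucket is None:
            limits = self.generic_limits if self.is_generic(key) else self.default_limits
            bucket = TokenBucket(limits[0], limits[1], now)
            self.buckets[key] = bucket
        return bucket.allow(now)
```

With these assumed limits, a client using the default python-requests agent exhausts its burst of 5 requests and then gets one request per second, while a client with a descriptive agent keeps a burst of 50. Keying on the user agent alone is a simplification; a real deployment would likely also key on client IP.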

What went well?

  • problem was detected automatically in a timely manner
  • good collaboration and clear communication between

What went poorly?

  • while we do have logic to throttle abusive bots, this throttling was not sufficient to protect the service
  • we are still banning python-requests as a user agent, which affects a number of bots
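For bot authors affected by the ban, the usual way out is to stop sending a library default (the requests library identifies itself as python-requests/<version> unless told otherwise) and send a descriptive user agent instead. The sketch below builds such a request using only the standard library. The tool name, version, and contact details are placeholders, the helper names are hypothetical, and the exact format expected should be checked against the user agent policy itself.

```python
import urllib.parse
import urllib.request

# Public WDQS SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_user_agent(tool, version, contact):
    """Return a descriptive agent string: tool name, version, and contact info."""
    return f"{tool}/{version} ({contact})"


def build_query_request(sparql, user_agent):
    """Build (but do not send) a GET request for a SPARQL query."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "User-Agent": user_agent,
            "Accept": "application/sparql-results+json",
        },
    )


# Placeholder identity; replace with your own tool name and contact details.
agent = build_user_agent(
    "ExampleBot", "0.1", "https://example.org/examplebot; examplebot@example.org"
)
request = build_query_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1", agent)
# To actually send it: urllib.request.urlopen(request, timeout=60)
```

With the requests library, the same effect comes from passing headers={"User-Agent": agent} to the call or setting it on a Session.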

Where did we get lucky?

  • This happened during the SRE offsite, when most SREs were in the same timezone. Luckily this wasn't when all of them were sleeping!