Incidents/2019-06-03 eqiad-port-saturation
Summary
We received a LibreNMS alert for port utilization over 80% on interface xe-3/3/3 of cr2-eqiad.wikimedia.org.
After some investigation, we identified a single client running in AWS that was making a large number of requests for media content with the default User-Agent "python-requests".
Impact
No user-facing impact.
Detection
LibreNMS alert on #-operations
Timeline
(In UTC)
- 12:23: <librenms-wmf> Critical Alert for device cr2-eqiad.wikimedia.org - Primary outbound port utilisation over 80%
- 13:22: faidon sums response sizes per client IP in the sampled webrequest log to find the heaviest clients: faidon@weblog1001:/srv/log/webrequest$ head -n 2560000 sampled-1000.json | jq -r '.ip + " " + (.response_size | tostring)' | awk '{ sum[$1] += $2 } END { for (ip in sum) print sum[ip],ip }' | sort -nr | head -10
- 13:35: cdanis opens abuse report with AWS abuse team
- 13:50: crawler begins scraping larger objects again; outbound network usage climbs back to the port's maximum
- 14:00: ema merges https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/514005/ to block the crawler's User-Agent (also unavoidably blocking some legitimate traffic)
- 14:01: ema runs puppet on cp1084
- 14:05: outbound network returns to normal
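The per-IP aggregation run at 13:22 can also be sketched in Python, which is easier to extend (for example, to group by User-Agent as well). This is a minimal sketch, not the tooling used during the incident: the function names `top_talkers` and `_as_bytes` are hypothetical, and it assumes one JSON object per line with `ip` and `response_size` fields, as the jq expression implies. The file name suggests the log is sampled (likely 1 in 1000), so the totals indicate relative rather than absolute volume.

```python
import json
from collections import Counter
from itertools import islice


def _as_bytes(value):
    """Coerce a response_size value to int; non-numeric values count as 0 (as awk would)."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def top_talkers(lines, limit=10, max_lines=2_560_000):
    """Sum response_size per client IP over the first max_lines records.

    Returns a list of (ip, total_bytes) pairs, largest total first.
    """
    totals = Counter()
    for line in islice(lines, max_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        totals[record["ip"]] += _as_bytes(record.get("response_size"))
    return totals.most_common(limit)
```

To run it against a log file, pass an open file handle, for example `top_talkers(open("sampled-1000.json"))`, and print the returned pairs.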
Actionables
- phab:T224884 - Rate limit requests to cache_upload
- phab:T224888 - Network port saturation should page
- phab:T224891 - Return HTTP 403 to requests violating User-Agent policy
- Begin enforcing our existing meta:User-Agent policy, after notifying community (TODO: file task). Summary of the discussion:
  - Ratelimit/block 'default' UAs, like "curl", "python-requests", "python-urllib2", etc.
    - Probably allow only a limited number of requests per second from a given IP for these 'default' ones
    - First investigate how much legitimate traffic this would affect
  - wikitech-l@, mediawiki-api-announce@, and m:Tech/News are reasonable venues for announcements
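The idea of allowing only so many requests per second per IP for 'default' User-Agents could be modelled with a token bucket. The following is an illustrative sketch only, not the mechanism deployed on the caches (which would more plausibly live in the caching layer's own configuration). The class name `DefaultUaLimiter`, the prefix list, and the rate and burst values are all assumptions chosen for the example.

```python
import time

# Assumption: prefixes of library-default User-Agents, taken from the examples
# in the discussion summary. A real list would need review.
DEFAULT_UA_PREFIXES = ("curl", "python-requests", "python-urllib")


class DefaultUaLimiter:
    """Per-IP token bucket applied only to requests with a 'default' User-Agent."""

    def __init__(self, rate_per_sec=10.0, burst=20.0, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens refilled per second (hypothetical value)
        self.burst = burst         # bucket capacity (hypothetical value)
        self.clock = clock         # injectable clock, useful for testing
        self._buckets = {}         # ip -> (tokens, time of last update)

    def allow(self, ip, user_agent):
        """Return True if the request may proceed, False if it should be limited."""
        ua = (user_agent or "").lower()
        if not ua.startswith(DEFAULT_UA_PREFIXES):
            return True  # descriptive User-Agents are not limited in this sketch
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

A token bucket permits short bursts while bounding the sustained rate, which limits bulk scrapers without penalising occasional scripted requests. This sketch never evicts idle IPs and lets empty User-Agents through; both would need handling in practice.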