You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Incident documentation/2021-03-14 Mediawiki API

From Wikitech-static
Jump to navigation Jump to search

document status: draft

Summary

On March 14 2021 the MediaWiki API were overloaded and ran out of php-fpm processes. This caused an API outage on all API servers from 17:00 to 17:26 UTC. The root cause of the outage were queries against commons that caused database s4 on server db1144 to be overloaded. Db1144 also serves queries to contributions, recentchanges, watchlist and other MediaWiki features. Task: T277417

File:Php response time.pngFile:Db processlist on db1144.png

https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?viewPanel=46&orgId=1&from=1615734448378&to=1615746774986&var-datasource=eqiad%20prometheus%2Fops&var-cluster=api_appserver&var-method=GET&var-code=200

https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=37&orgId=1&var-server=db1144&var-port=13314&from=1615738747603&to=1615745074190

The queries against commons were analyzed for inefficiencies but seem to be well written and optimized SQL. See task: T277416

Actionables

  • Improve traceability of commons queries: T193050 (filed in 2018)