
Difference between revisions of "Incident documentation/2021-03-14 Mediawiki API"

From Wikitech-static
imported>Wolfgang Kandek
 
imported>Krinkle
 
Line 12: Line 12:


==Actionables==
==Actionables==
*Improve traceability of commons queries: [[phab:T277485|T277485]]
*Improve traceability of commons queries: [[phab:T193050|T193050]] (filed in 2018)

Latest revision as of 02:06, 18 September 2021

document status: draft

Summary

On March 14, 2021, the MediaWiki API servers were overloaded and ran out of php-fpm processes. This caused an API outage on all API servers from 17:00 to 17:26 UTC. The root cause of the outage was a set of queries against commons that overloaded database s4 on server db1144. This server also serves queries for contributions, recentchanges, watchlist and other MediaWiki features. Task: T277417

File:Php response time.png

File:Db processlist on db1144.png

https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?viewPanel=46&orgId=1&from=1615734448378&to=1615746774986&var-datasource=eqiad%20prometheus%2Fops&var-cluster=api_appserver&var-method=GET&var-code=200

https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=37&orgId=1&var-server=db1144&var-port=13314&from=1615738747603&to=1615745074190
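The time range each dashboard link covers can be checked against the outage window by decoding the `from` and `to` query parameters, which Grafana writes as epoch milliseconds for absolute ranges. The following is a minimal sketch; the helper name `dashboard_window` is hypothetical and is not part of any Wikimedia tooling.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs


def dashboard_window(url):
    """Return (start, end) of a Grafana link's time range as UTC datetimes."""
    params = parse_qs(urlparse(url).query)
    # Assumption: the link uses an absolute range, so from/to are epoch milliseconds.
    start_ms = int(params["from"][0])
    end_ms = int(params["to"][0])

    def to_utc(ms):
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

    return to_utc(start_ms), to_utc(end_ms)


# The MySQL dashboard link for db1144 quoted above.
MYSQL_URL = (
    "https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=37&orgId=1"
    "&var-server=db1144&var-port=13314&from=1615738747603&to=1615745074190"
)

start, end = dashboard_window(MYSQL_URL)
print(start.strftime("%Y-%m-%d %H:%M"), "to", end.strftime("%H:%M"), "UTC")
# prints: 2021-03-14 16:19 to 18:04 UTC
```

The decoded range brackets the 17:00 to 17:26 UTC outage reported in the summary.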

The queries against commons were analyzed for inefficiencies, but they appear to be well-written, optimized SQL. See task: T277416

Actionables

  • Improve traceability of commons queries: T193050 (filed in 2018)