
Incident documentation/2018-05-22 MediaWiki

imported>Krinkle
== Summary ==
MediaWiki issues caused by the Translate extension.
Symptoms initially looked like database or network issues, then like a SWAT patch had caused the problem, but neither explanation lined up well with the timing.
 
See https://phabricator.wikimedia.org/T195293#4224220
 
== Timeline ==
 
'''Patches Merged''' (as part of SWAT)
 
* 13:21 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert: Temp rate limit for arwiki due to mass vandalism T192668 (duration: 01m 18s)
* 13:25 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable $wgUseRCPatrol on azwiki T194389 (duration: 01m 20s)
 
'''meta.wikimedia.org actions'''
 
* 13:26, 22 May 2018 FuzzyBot (talk | contribs) changed the state of Russian translations of Privacy policy from Needs updating to In progress
* 13:28, 22 May 2018 Kaganer (talk | contribs) m . . (52,758 bytes) (+3) - https://meta.wikimedia.org/w/index.php?title=Privacy_policy&diff=18066512&oldid=18063747
* 13:29, 22 May 2018 Kaganer (talk | contribs) marked Privacy policy for translation
* 13:32, 22 May 2018 Kaganer (talk | contribs) changed the state of Russian translations of Privacy policy from In progress to Needs updating
 
'''Issue'''
 
* 13:34 paladox@#wikimedia-operations: hmm https://meta.wikimedia.org/wiki/Privacy_policy is not loading for me
* 13:35 NotASpy@#wikimedia-operations: yeah, en.wp is crawling along for me.
* 13:35 addshore@#wikimedia-operations: *looks around*
* 13:35 addshore@#wikimedia-operations: I can see a bunch of db errors
* 13:35 addshore@#wikimedia-operations: spike in lag or issue with replication
* 13:37 https://phabricator.wikimedia.org/T195293 - 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly)
* 13:40 addshore@#wikimedia-operations: started at 13:31
* <discussion>
* 13:43 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
 
'''Reverts'''
 
* 13:44 addshore - Started first revert
* 13:46 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: Revert Enable $wgUseRCPatrol on azwiki (duration: 01m 19s)
* 13:47 marostegui@#wikimedia-operations: all the connection errors I am seeing are on s7
* 13:48 addshore@#wikimedia-operations: s7 has ar wiki?
* 13:49 marostegui@#wikimedia-operations: addshore: actually yes
* 13:49 addshore - started second revert
* 13:51 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: Revert Revert Temp rate limit for arwiki due to mass vandalism (duration: 01m 18s)
 
'''Recovery'''
 
* 13:52 marostegui@#wikimedia-operations: connections are decreasing on db1094
* 13:52 __joe__@#wikimedia-operations: yes, queues on the appservers are vanishing
* 13:52 volans@#wikimedia-operations: 500s goind down
* 14:11 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
 
== Conclusions ==
TODO
 
== Actionables ==
 
TODO
 
{{#ifeq:{{SUBPAGENAME}}|Report Template||
[[Category:Incident documentation]]
}}

Latest revision as of 17:46, 8 April 2022