Incident documentation/20160610-ORES: Difference between revisions

#REDIRECT [[Incidents/20160610-ORES]]
== Summary ==
ORES was down for an unknown number of hours on 2016-06-10 due to a broken configuration file (<code>99-redis.yaml</code>).
== Timeline ==
* ??? -- the change was merged to the ''production'' branch of Wikimedia Puppet
''at least 6 hours pass''
* 2016-06-10 @ 1930 UTC -- 503 errors and timeouts were noted
* 2016-06-10 @ 2030 UTC -- the 99-redis.yaml files were deleted and the workers were restarted.  Service was restored.
== Conclusions ==
The change should not have been merged.  We need a better testing process around Puppet merges to make sure that they don't take down the service.  Unlike a deploy, there's no clear event at which Puppet is run.
Also, this downtime did not cause a paging event. 
== Actionables ==
* [[Phab:T137592]]
[[Category:Incident documentation]]

Latest revision as of 17:45, 8 April 2022