document status: final
The rollout of wmf.8 to group1 caused a production outage: group1 and group2 wikis became sluggish and eventually stopped working entirely.
Initial indicators of the issue were picked up in logstash and via logspam-watch on mwlog1001. A large number of Icinga alerts followed.
It seems likely that the primary issue was obscured during the initial deploy by a focus on Parsoid errors.
All times in UTC.
- 20:12 brennen: Train wmf.8 roll forwards from group0 to group1 as well (try 1)
- 20:12 Large amounts of logspam noticed, especially from Parsoid/PHP; Icinga issues many alerts.
- 20:28 brennen: Train wmf.8 rolled back to just group0 
[Fixes to exclude Parsoid/PHP]
- 23:30 brennen: Train wmf.8 roll forwards from group0 to group1 as well (try 2)
- 23:30 OUTAGE BEGINS
- 23:30 Large spike in database errors in logstash (T239877); shortly thereafter, a large number of Icinga alerts go off.
- 23:30+ Production group1 and group2 wikis become noticeably sluggish and eventually stop working entirely.
- 23:35 brennen: Attempted train wmf.8 roll back thwarted by canary failures 
- 23:38 brennen: Train wmf.8 rolled back to just group0, again 
- 23:38 OUTAGE ENDS