Incident documentation/20170222-www-portals: Difference between revisions

imported>Chad
m (Real bullet points)
 
imported>Legoktm
(→‎Conclusions: fix typo)
Line 2: Line 2:
At about 17:00 UTC Feb. 22 the www.wikipedia.org page was severely broken for about an hour.   
At about 17:00 UTC Feb. 22 the www.wikipedia.org page was severely broken for about an hour.   


The text on the page was invisible. This bug was caused by a javascript file being improperly cached and returning a 404.  
The text on the page was invisible. This bug was caused by a JavaScript file being improperly cached and returning a 404.  


== Timeline ==
== Timeline ==
Line 8: Line 8:
* We were made aware of this bug at about 17:40 UTC  
* We were made aware of this bug at about 17:40 UTC  
* at 18:15 UTC an attempt was made to rollback to the previous deploy. The deploy was visible on mwdebug1002 without error, but the error persisted in production.   
* at 18:15 UTC an attempt was made to rollback to the previous deploy. The deploy was visible on mwdebug1002 without error, but the error persisted in production.   
* at 18:20 UTC we purged the URL of the specific javascript file, fixing the issue.  
* at 18:20 UTC we purged the URL of the specific JavaScript file, fixing the issue.  


== Conclusions ==
== Conclusions ==
* The wikipedia.org portal depends on a specific order of syncing followed by purging urls, which is fragile and needs some rethinking.   
* The wikipedia.org portal depends on a specific order of syncing followed by purging urls, which is fragile and needs some rethinking.   
* Errors in javascipt should not make the page unusable.  
* Errors in JavaScript should not make the page unusable.  


== Actionables ==
== Actionables ==
<onlyinclude>
<onlyinclude>
* Adding an entire list of asset URLs to purge ({{PhabT|158810}})
* Adding an entire list of asset URLs to purge ({{PhabT|158810}})
* Preventing javascript from hiding page content indefinitetly ({{PhabT|158809}})
* Preventing JavaScript from hiding page content indefinitetly ({{PhabT|158809}})
* Use query params for cache-busting ({{PhabT|158808}})
* Use query params for cache-busting ({{PhabT|158808}})
</onlyinclude>
</onlyinclude>


[[Category:Incident documentation]]
[[Category:Incident documentation]]

Revision as of 00:13, 25 February 2017

Summary

At about 17:00 UTC on Feb. 22, the www.wikipedia.org page was severely broken for about an hour.

The text on the page was invisible. This bug was caused by a JavaScript file being improperly cached and returning a 404.

Timeline

  • A bug was filed at around 17:09 UTC on Feb. 22 noting that the text on www.wikipedia.org was invisible (task T158782).
  • We were made aware of this bug at about 17:40 UTC.
  • At 18:15 UTC an attempt was made to roll back to the previous deploy. The deploy was visible on mwdebug1002 without error, but the error persisted in production.
  • At 18:20 UTC we purged the URL of the specific JavaScript file, fixing the issue.

Conclusions

  • The wikipedia.org portal depends on a specific order of syncing followed by purging URLs, which is fragile and needs some rethinking.
  • Errors in JavaScript should not make the page unusable.
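
Why the order of syncing and purging matters can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of the production caching layer: the ToyCache class, the deploy helper, and the asset path are all hypothetical, and the cache is assumed to store 404 responses just like successful ones.

```python
class ToyCache:
    """Toy caching proxy that stores whatever the origin returns, 404s included."""

    def __init__(self, origin):
        self.origin = origin  # dict: path -> body (stands in for the app servers)
        self.store = {}       # dict: path -> (status, body)

    def get(self, path):
        # On a miss, fetch from the origin and cache the result, even a 404.
        if path not in self.store:
            if path in self.origin:
                self.store[path] = (200, self.origin[path])
            else:
                self.store[path] = (404, "")
        return self.store[path]

    def purge(self, path):
        # Drop the cached entry so the next request goes back to the origin.
        self.store.pop(path, None)


def deploy(steps):
    """Run the steps in order and return the status a visitor sees afterwards."""
    origin = {}
    cache = ToyCache(origin)
    path = "/portal/index.js"  # hypothetical asset path
    for step in steps:
        if step == "sync":
            origin[path] = "console.log('portal');"
        elif step == "purge":
            cache.purge(path)
        elif step == "request":
            cache.get(path)
    return cache.get(path)[0]


# Sync first, then purge: the stale 404 is dropped after the file exists.
good = deploy(["request", "sync", "purge", "request"])

# Purge first, then a request arrives before the sync: the 404 is cached again.
bad = deploy(["request", "purge", "request", "sync"])
```

In this model the second ordering leaves a 404 in the cache until someone purges the URL again, which matches the kind of fragility noted above.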

Actionables

  • Adding an entire list of asset URLs to purge (task T158810)
  • Preventing JavaScript from hiding page content indefinitely (task T158809)
  • Using query params for cache-busting (task T158808)
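
As a rough illustration of the cache-busting idea, the sketch below appends a short content hash to an asset URL as a query parameter, so that a changed file gets a URL the cache has never seen. The function name cache_busted_url, the parameter name "v", and the asset path are assumptions made for this example; the approach eventually taken in the linked task may differ.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted_url(url, content, param="v"):
    """Return url with a query parameter derived from the asset's content.

    content is the asset's bytes; the first 8 hex digits of its SHA-256
    digest are used as the version value.
    """
    digest = hashlib.sha256(content).hexdigest()[:8]
    parts = urlsplit(url)
    # Keep any existing query parameters, but replace an older version value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, digest))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical asset path on the portal.
base = "https://www.wikipedia.org/portal/index.js"
old_url = cache_busted_url(base, b"console.log('old');")
new_url = cache_busted_url(base, b"console.log('new');")
```

Because old_url and new_url differ, a page that references new_url cannot be served a stale cached response for the old file, and the deploy no longer relies on a purge happening at exactly the right moment.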