You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Deployments/Emergencies

Latest revision as of 15:37, 11 June 2021

Note: If you're looking for help with an emergency situation, please first try to contact Release Engineering & SRE on libera.chat in #wikimedia-operations. If that fails, it may be appropriate to use Klaxon.

Emergency deployments happen when something in production needs fixing right now, even though deployments aren't normally happening. For example, there may be a deployment freeze, it may be night, a weekend or a holiday, or the deployment train may be broken.

How to

To do an emergency deployment, join IRC and get positive confirmation from both Release Engineering and SRE (and, where relevant, from other specialist teams such as Security). Then carry out the deployment together with them.

🚨 Step-by-step

  • Join #wikimedia-operations on libera.chat.
  • Message Tyler (thcipriani) and/or Greg (greg) and/or the person assigned to this week's train (listed at https://train-blockers.toolforge.org), using the IRC message template below.
  • Include a link to the patch you'd like to deploy (and to the task, if appropriate).
  • We'll make sure that someone can help you deploy and that you're clear to deploy.

IRC message template

thcipriani: greg-g [FILL IN TRAIN CONDUCTOR] help! I'd like to do an emergency deploy for https://gerrit.wikimedia.org/r/1234 -- context is T1234
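
If you want to script the message, a hypothetical helper like the one below fills in the IRC template from a Gerrit change number and an optional Phabricator task ID. This is only a sketch. The function name, its parameters and the example nick `jdoe` are assumptions made for illustration and are not part of any Wikimedia tooling. The hard-coded nicks are copied from the template.

```python
from typing import Optional

# Change URL pattern taken from the template's example link.
GERRIT_CHANGE_URL = "https://gerrit.wikimedia.org/r/{change}"


def emergency_deploy_message(train_conductor: str, change: int, task: Optional[str] = None) -> str:
    """Fill in the emergency-deploy IRC template (hypothetical helper, not official tooling)."""
    if not train_conductor:
        # The template requires the train conductor's nick to be filled in.
        raise ValueError("train_conductor must be filled in")
    message = (
        f"thcipriani: greg-g {train_conductor} help! "
        f"I'd like to do an emergency deploy for {GERRIT_CHANGE_URL.format(change=change)}"
    )
    if task:
        # The task is optional ("and the task, if appropriate").
        message += f" -- context is {task}"
    return message


print(emergency_deploy_message("jdoe", 1234, "T1234"))
```

For example, `emergency_deploy_message("jdoe", 1234, "T1234")` reproduces the template with the train conductor's nick filled in. If you omit the task, the trailing "-- context is" clause is dropped.
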

Reasons for an emergency deploy

  • Address security issues
    For example, a mis-configuration once meant that a private wiki and all of its content was accidentally made public.
  • Avoid data loss / corruption
    For example, a coding error meant that newly rendered pages were being cached in a corrupted form; the longer it went on, the more of the site was wrong.
  • Maintain availability
    For example, a new feature proved much more popular than planned and the extra load it was causing was threatening to take down the site, so it was temporarily disabled over a holiday, until people were back at work.
  • Prevent abuse
    For example, a massive content-scraping run from a search engine wasn't respecting automated HTTP 429 (Too Many Requests) speed bumps, so it had to be blocked manually until the operator could adjust their code.
  • Major loss of functionality / appearance
    For example, a code efficiency change broke the visual appearance and usability of parts of the sites for a large number of logged-out users, so the change was reverted out of production until it could be fixed.

For deployers

  • Rollback first, fix later; maintaining an overall service to our users is the most important focus.
  • Prioritise general availability over that of new features; we have a billion readers and only a few users of your new tool, no matter how cool.
  • Make on-wiki edits rarely, and only when you really have to; each wiki's editing community expects autonomy.