You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Deployments/Holding the train: Difference between revisions
imported>Greg Grossmeier (TOC) |
imported>20after4 (logspam) |
Revision as of 17:42, 13 February 2017
This page is currently a draft. More information and discussion about changes to this draft on the talk page. |
The deployment train should be held only for serious security, performance, or functionality issues. Counter-intuitively, holding the train can create more problems than it solves, because the versions of MediaWiki and extensions deployed to the cluster diverge further from the primary development versions of the code.
Issues that hold the train
This is not an exhaustive list of things that would cause the train to pause or roll back. As always, it's up to the best judgment of operations and release engineering, but the following scenarios are indicative of what we'd take action on.
- Security issues
- Data loss
- Major feature regressions
  - Inability to log in, log out, or create an account for a large portion of users
  - Inability to edit for a large portion of users
- Performance regressions
  - Page load time
  - Page save/update time
- Major stylistic problems affecting all pages
- Large error-rate increases
  - Even INFO-level errors are subject to this as they make logs unusable
  - See Logspam
What happens in SWAT while the train is on hold?
Only simple config changes and emergency fixes are allowed during SWAT while we are reverted. This is to reduce the complexity during investigation.
Remember, while we are reverted people are diligently diagnosing and debugging issues; any seemingly unrelated change could in fact affect their investigations.
What happens next (modified train scheduled)?
- If a new wmf.XX version wasn't deployed due to blockers for the entire week then
  - The following week no new branch will be cut (target getting wmf.XX to all wikis) OR the following week a new branch will be cut (skipping last week's wmf.XX branch)
  - An incident report will be filed to address follow-up actions and process improvements
- If a blocker was found and addressed before 3pm Pacific then
  - The planned deploy/rollout can move forward at that time (deployment schedule permitting)
- If there are issues affecting performance discovered after the current version of MediaWiki and extensions has been deployed then
  - The current code version will remain on servers; we will not attempt to roll back to a version > 1 week old
  - The next release will remain at the Performance Team's discretion until XXX time, after which a new branch will be cut and rolled out
- IF...THEN
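The IF/THEN rules above can be read as a small decision function. The following is only an illustrative sketch, not a tool used by Release Engineering: the names `NextStep` and `next_step`, the parameters, and the order in which the rules are checked are all assumptions made for the example. The `STAY_ON_HOLD` fallback (for example, a blocker fixed after 3pm Pacific) is also an assumption, since the draft does not spell that case out.

```python
from enum import Enum
from typing import Optional

# "3pm Pacific" from the list above, as an hour on a 24-hour clock.
CUTOFF_HOUR_PACIFIC = 15


class NextStep(Enum):
    """Hypothetical labels for the outcomes described in the list above."""
    PROCEED_WITH_ROLLOUT = "proceed_with_rollout"
    STAY_ON_HOLD = "stay_on_hold"  # assumed fallback, not stated in the draft
    KEEP_CURRENT_VERSION = "keep_current_version"
    INCIDENT_REPORT_AND_REPLAN = "incident_report_and_replan"


def next_step(
    blocked_all_week: bool = False,
    perf_issue_after_deploy: bool = False,
    blocker_fixed_hour_pacific: Optional[int] = None,
) -> NextStep:
    """Map a (simplified) train state to the outcome the rules describe."""
    # Blocked for the entire week: file an incident report and decide
    # whether next week re-targets wmf.XX or skips it with a new branch.
    if blocked_all_week:
        return NextStep.INCIDENT_REPORT_AND_REPLAN
    # Performance issue found after deploy: keep the current version and
    # leave the next release to the Performance Team's discretion.
    if perf_issue_after_deploy:
        return NextStep.KEEP_CURRENT_VERSION
    # Blocker found and addressed before the cutoff: the planned rollout
    # can move forward (deployment schedule permitting).
    if (
        blocker_fixed_hour_pacific is not None
        and blocker_fixed_hour_pacific < CUTOFF_HOUR_PACIFIC
    ):
        return NextStep.PROCEED_WITH_ROLLOUT
    return NextStep.STAY_ON_HOLD


if __name__ == "__main__":
    print(next_step(blocker_fixed_hour_pacific=11).value)
    print(next_step(blocked_all_week=True).value)
```

Writing the rules this way mainly makes the gaps visible: the open "IF...THEN" entry and the "XXX time" placeholder would each need a decision before they could be added as branches.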