Difference between revisions of "Operating system upgrade policy"

From Wikitech
imported>BryanDavis
(→‎Policy proposal: update with Buster's official release date of 2019-07-06 (https://www.debian.org/News/2019/20190706))
imported>Lucas Werkmeister (WMDE)
m (→‎Debian release cadence: update Buster release date)
 

Latest revision as of 08:14, 12 September 2019

This document proposes a policy for Linux distribution updates for the Wikimedia production cluster and related infrastructure. This process is currently not clearly defined and streamlining it would reduce technical debt and allow the Wikimedia Foundation to benefit from technical innovations quicker.

Problem statement

There are several reasons why we need to upgrade our software stack:

  • To ensure the security of the site and our users’ data, we need to keep the infrastructure free of known security vulnerabilities. In contrast to scheduled maintenance updates, these updates cannot be planned ahead; they happen whenever security issues are found and disclosed. The majority of the security updates we apply are provided by the Linux distributions. The rest are updates from specific vendors whose software is not present in distributions, and internally prepared updates (usually for software we run modified from the upstream version or which is packaged internally).
  • For our software deployments we also want to benefit from ongoing development trends and provide new features for our users. As an example, a newer release of the OpenSSL crypto library could enhance the support for recent versions of the Transport Layer Security (TLS) standard or support the latest advancements in cryptographic ciphers.
  • Using current releases is also relevant for our hardware support. Eventually our hardware vendors stop shipping the server models we use. In case of a hardware refresh we can be faced with updated server components (e.g. network cards) which are only supported in a more recent version of the operating system.
  • For some services, our deployments are among the largest installations of a given piece of software, so our experience and feedback from operating it at scale are valuable to the upstream maintainers and their users. For meaningful feedback it's important that our stack doesn't lag too far behind current releases. This is of course always a balance: software from a Linux distribution, which stabilises towards a stable release, always lags somewhat behind the most recent version of a software component.

Supporting older distribution releases comes at a significant, albeit not very visible, internal cost resulting in technical debt. The more releases of a Linux distribution we need to support, the more effort is spent on making changes compatible with all supported releases. Much of that effort is independent of the number of systems still running a given release, because all core changes still need to be adapted for the few remaining servers. Two examples: services could not simply ship a systemd unit while some systems were not yet using systemd, and internally maintained software components, like our configuration management system (Puppet), need to be built and maintained for several distribution releases at once. Supporting fewer releases frees up engineering resources for improving our infrastructure elsewhere.

When this policy was published (March 2019) we were supporting four different distribution releases (Ubuntu 14.04, Debian 8, Debian 9 and work in progress for Debian 10).

That level of technical debt not only applies to our maintained packages, but also extends to our git repository which stores the Puppet configuration data for our systems. Some of these configuration settings also affect the Puppet code, so we need to retain backwards compatibility here.

Background

Debian release cadence

We use Debian as the operating system to run the Wikimedia production servers and the services comprising Wikimedia Cloud Services. In contrast to other distributions, Debian doesn’t set fixed release dates, but rather postpones a release until it’s considered “ready”. This may sound hard to plan for, but over the last decade Debian has mostly followed a two-year cadence with a variance of a few months:

  • Debian 4.0 ("etch"): Apr 2007
  • Debian 5.0 ("lenny"): Feb 2009
  • Debian 6.0 ("squeeze"): Feb 2011
  • Debian 7.0 ("wheezy"): May 2013
  • Debian 8.0 ("jessie"): Apr 2015
  • Debian 9.0 ("stretch"): Jun 2017
  • Debian 10.0 ("buster"): Jul 2019

As such, for the rest of the document it’s assumed that a new release happens every two years.
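
The two-year assumption can be checked against the release list above. The following is a minimal sketch: the day of each month is set to 1 because the list only gives month and year, and the helper names `months_between` and `release_gaps` are illustrative, not part of any existing tooling.

```python
from datetime import date

# Release months taken from the list above; the day is set to 1 because
# only month and year are given there.
RELEASES = [
    ("etch", date(2007, 4, 1)),
    ("lenny", date(2009, 2, 1)),
    ("squeeze", date(2011, 2, 1)),
    ("wheezy", date(2013, 5, 1)),
    ("jessie", date(2015, 4, 1)),
    ("stretch", date(2017, 6, 1)),
    ("buster", date(2019, 7, 1)),
]


def months_between(earlier, later):
    """Return the number of whole calendar months from earlier to later."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def release_gaps(releases):
    """Return (previous, next, months) for each pair of consecutive releases."""
    return [
        (prev_name, next_name, months_between(prev_date, next_date))
        for (prev_name, prev_date), (next_name, next_date)
        in zip(releases, releases[1:])
    ]


gaps = release_gaps(RELEASES)
for prev_name, next_name, months in gaps:
    print(f"{prev_name} -> {next_name}: {months} months")

average = sum(months for _, _, months in gaps) / len(gaps)
print(f"average gap: {average:.1f} months")
```

With the dates listed, the gaps range from 22 to 27 months and average 24.5 months, which is consistent with the two-year assumption.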

Historically, the infrastructure of the Wikimedia Foundation ran several releases of Ubuntu, but Wikimedia support for Ubuntu is now deprecated. The remaining Ubuntu hosts are to be migrated by April 2019. This document covers our new setup with only Debian installations.

Support stages for Debian releases

After a release has happened, Debian follows a pattern of support levels similar to other Linux distributions (e.g. Red Hat Enterprise Linux). For the first years of support a mix of functional and security fixes are backported, while at later stages only security fixes are shipped.

Once a release is out, Debian supports it in two stages, plus an optional third:

  • For the lifetime of a stable release plus one year after the release of the subsequent one (so effectively three years), there's security support provided by Debian itself. In addition to security updates there are also point releases every few months which collect bug fixes and ship minor security fixes which are not important enough for a regular security update. These point releases can also provide support for new hardware drivers.
  • After the three-year support period, the remaining time until five years after the initial release date (so around two years) is covered by security updates only. This support is provided by the Debian LTS project, where paid contributors provide security updates. Compared to the standard security support in the first three years, LTS usually covers fewer packages (but the omitted packages don't matter much for our server setups). There are no bugfix updates for LTS; support is limited to security fixes, with a few critical exceptions like time zone updates. LTS support is also inherently somewhat degraded compared to standard support, because some packages can no longer be fixed once upstream support has ended. For example, Oracle withholds vulnerability information for their products, so MySQL 5.5 can no longer be supported in Debian 8 LTS now that Oracle has stopped supporting 5.5. In addition, the backports suite is no longer supported during LTS. This suite provides updated packages that were not originally included in a stable release, and our software setups sometimes rely on components from it.
  • There's even a third stage with extended LTS support (extending the lifetime even longer than five years). It doesn't cover the complete archive, but only selected packages for some companies paying for the support.
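
The stage boundaries described above can be written down as date arithmetic. This is only a sketch of the rules as stated in this section (standard support until one year after the successor's release, LTS until five years after the initial release). The function names are hypothetical, and the end dates Debian actually announces for a given release may differ by a few months.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years (inputs here use day 1, so this is safe)."""
    return d.replace(year=d.year + years)


def support_stage(release, successor_release, today):
    """Classify a release into the support stages described in the text.

    If the successor has not been released yet (None), the two-year cadence
    assumed by this document is used to estimate its release date.
    """
    if successor_release is None:
        successor_release = add_years(release, 2)  # assumption: two-year cadence
    standard_end = add_years(successor_release, 1)
    lts_end = add_years(release, 5)
    if today < release:
        return "unreleased"
    if today < standard_end:
        return "standard security support"
    if today < lts_end:
        return "LTS"
    return "extended LTS or unsupported"


# Jessie (Apr 2015) with Stretch (Jun 2017) as its successor.
jessie = date(2015, 4, 1)
stretch = date(2017, 6, 1)
print(support_stage(jessie, stretch, date(2018, 1, 1)))
print(support_stage(jessie, stretch, date(2019, 3, 1)))
print(support_stage(jessie, stretch, date(2020, 5, 1)))
```

Under these rules Jessie would leave standard support in June 2018 and LTS in April 2020.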

Timeline of previous distribution deprecations

Historically, older distributions have been phased out very close to the support termination of the respective distribution releases (or even after the target date in one case):

  • For Ubuntu 10.04 Lucid, one system was not migrated in time
  • For Ubuntu 12.04 Precise, the last system was migrated three weeks before the end-of-life date
  • For Ubuntu 14.04 Trusty, the removal of the last systems will only happen shortly prior to the end-of-life date (per current planning/estimation)

Policy proposal

The proposal is to limit the use of a Debian release to four years, in other words to two Debian releases at a time.[1]

  • For the first three years after it becomes available, a distribution release can be deployed freely, but once a new release is available it is strongly advised to switch to the newer release early on.
  • Once that three year period passes, the migration of the remaining installed base is centrally coordinated. This provides stakeholders with a full year to migrate existing hosts and services:
    • For the servers in the Wikimedia production cluster, this work is coordinated by the SRE Infrastructure Foundations team. The actual migration would be owned by the respective service owners within the SRE teams.
    • For servers managed by other stakeholders (most notably the Wikimedia Cloud Services team for Wikimedia Cloud VPS and Toolforge, and anyone building container/Docker images based on the Wikimedia package repository) the migration is to be organized by their respective teams.
  • After four years, support for the old distribution release ends within the Wikimedia infrastructure, and the release is removed from Puppet trees, package repositories and related configuration settings.
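
The three stages in the list above can be sketched as a small lookup by date. The function name `policy_stage` and the stage labels are illustrative only and do not correspond to existing Wikimedia tooling.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years (inputs here use day 1, so this is safe)."""
    return d.replace(year=d.year + years)


def policy_stage(release, today):
    """Return the policy stage of a Debian release on a given day."""
    if today < release:
        return "not yet released"
    if today < add_years(release, 3):
        return "open for deployment"
    if today < add_years(release, 4):
        return "coordinated migration"
    return "removed"


# Stretch was released in June 2017 (day of month assumed to be 1).
stretch = date(2017, 6, 1)
print(add_years(stretch, 4))                      # end of use under the policy
print(policy_stage(stretch, date(2019, 9, 19)))
print(policy_stage(stretch, date(2020, 9, 1)))
print(policy_stage(stretch, date(2021, 6, 1)))
```

For Stretch, the four-year limit falls in June 2021, with the coordinated migration year starting in June 2020.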

The proposal is to enable this policy retroactively for Stretch, meaning it could be used until June 2021. The following chart displays the stages for future releases:

<timeline>

Define $begin = 01/01/2014
Define $now = 19/09/2019
Define $end = 19/09/2029
Define $width = 800
Define $warning = 570 # $width - 230
Define $height = 506
ImageSize = width:$width height:auto barincrement:30
PlotArea = right:10 left:50 bottom:100 top:10
DateFormat = dd/mm/yyyy
Period = from:$begin till:$end
TimeAxis = orientation:horizontal
Legend = orientation:vertical position:bottom columns:1

Colors =

    id:bg              value:white
    id:lightline       value:rgb(0.9, 0.9, 0.9)
    id:todayline       value:rgb(0.9, 0.2, 0.2)
    id:lighttext       value:rgb(0.5, 0.5, 0.5)
    id:Freeze          value:rgb(0.8, 0.8, 0.8)      Legend:Freeze
    id:Stable          value:rgb(0.5, 0.8, 0.5)      Legend:Stable
    id:Deprecate       value:rgb(1  , 0.6, 0.0)      Legend:Deprecate

BackgroundColors = canvas:bg
ScaleMajor = gridcolor:lightline unit:year increment:1 start:$begin

# start the text at the right (align) of the end of the bar (anchor).
# also 1px of margin in between (shift-x), and vertically centred (shift-y)

Define $texttoright = fontsize:M textcolor:black anchor:till align:left shift:(1,-5)

LineData=

  at:$now color:todayline width:0.1

PlotData=

 bar:11.0 width:20
   color:Freeze mark:(line,white)
   from:01/01/2021 till:01/06/2021
   color:Stable mark:(line,white)
   from:01/06/2021 till:01/06/2024
   color:Deprecate mark:(line,white)
   from:01/06/2024 till:01/06/2025 $texttoright text:"11.x Bullseye"
   
 bar:10.0 width:20
   color:Freeze mark:(line,white)
   from:21/01/2019 till:06/07/2019
   color:Stable mark:(line,white)
   from:06/07/2019 till:01/06/2022
   color:Deprecate mark:(line,white)
   from:01/06/2022 till:01/06/2023 $texttoright text:"10.x Buster"
 
 bar:9.0 width:20
   color:Stable mark:(line,white)
   from:01/06/2017 till:01/06/2020
   color:Deprecate mark:(line,white)
   from:01/06/2020 till:01/06/2021 $texttoright text:"9.x Stretch"
 bar:8.0 width:20
   color:Stable mark:(line,white)
   from:01/04/2015 till:01/04/2018
   color:Deprecate mark:(line,white)
   from:01/04/2018 till:01/04/2020 $texttoright text:"8.x Jessie"
 bar:14.04 width:20
   color:Stable mark:(line,white)
   from:01/04/2014 till:01/04/2018
   color:Deprecate mark:(line,white)
   from:01/04/2018 till:01/04/2019 $texttoright text:"14.04 Trusty"
   

TextData =

 fontsize:M
 textcolor:lighttext
 pos:($warning,50)
 text:Last refreshed 2019-09-19
</timeline>

For the phase-out of Debian Jessie, a date will be coordinated within the SRE teams (at this point fewer than 200 Jessie systems are running in production).

Footnotes

  1. There will be an overlap of a few months before a new stable release, during which the next distribution release is internally prepared and tested for our infrastructure, but this can be ignored for the purpose of this policy.