You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Backport windows: Difference between revisions

From Wikitech-static
imported>MarcoAurelio
(terminology)
 
imported>Aklapper
(s/Freenode/libera.chat/)
(8 intermediate revisions by 7 users not shown)
Line 1: Line 1:
[[File:Putting the prod in production. Production Drive Committee - NARA - 534919.jpg|thumbnail]]
[[File:Putting the prod in production. Production Drive Committee - NARA - 534919.jpg|thumbnail]]
'''Backport windows''' are set times for Wikimedia production [[deployments]], are done by a member(s) of the backport deploy team (see below). They were historically known as '''SWAT deploy windows'''.
{{Navigation MediaWiki deployment}}
'''Backport windows''' are set times for Wikimedia production [[deployments]], are done by a member(s) of the backport deploy team (see below). They were historically known as '''SWAT deploy windows'''. Despite the name, they are used for '''configuration changes to production wikis''' as well as '''backports'''.


The purpose is to provide a known window for people to get bug fixes deployed ahead of the normal cadence (currently, weekly) without having to beg people to do the deployment for them; the team is there to do the deployment part.
The purpose is to provide a known window for people to get bug fixes deployed ahead of the normal cadence (currently, weekly) without having to beg people to do the deployment for them; the team is there to do the deployment part.  Similarly, it is usually advantageous to make configuration changes separate from train deploys so that effects of the reconfiguration can be distinguished from regressions caused by the train; the backport deploy team provides the expertise to safely deploy configuration changes.


The words MUST, CAN, MAY, etc. are to be interpreted in comformity with RFC 2119.
The words MUST, CAN, MAY, etc. are to be interpreted in conformity with [[RFC:2119]].


==Guidelines==
==Guidelines==
Line 10: Line 11:
*There will be '''at least one''' backport window deployer available and active during the window.
*There will be '''at least one''' backport window deployer available and active during the window.
**If no backport team member is around and available to do the deploys the window will be skipped. Please reschedule your patches.
**If no backport team member is around and available to do the deploys the window will be skipped. Please reschedule your patches.
*All those submitting patches for deployment '''MUST''' be in '''{{irc|wikimedia-operations}}''' on Freenode to communicate with the backport team member.
*All those submitting patches for deployment '''MUST''' be in '''{{irc|wikimedia-operations}}''' on libera.chat to communicate with the backport team member.
**The backport team will ping the relevant developers at the start of the window and when theirs is up; they '''MUST''' be available. If they are not available the patch will NOT be deployed.
**The backport team will ping the relevant developers at the start of the window and when theirs is up; they '''MUST''' be available. If they are not available the patch will NOT be deployed.
*All communication '''MUST''' happen in '''{{irc|wikimedia-operations}}''' on Freenode (not in separate team or area-specific channels)
*All communication '''MUST''' happen in '''{{irc|wikimedia-operations}}''' on libera.chat (not in separate team or area-specific channels)
*'''Allowed types of patches''' ''Things not fitting these criteria should instead use the standard deploy window process''
*'''Allowed types of patches''' ''Things not fitting these criteria should instead use the standard deploy window process''
**For <code>mediawiki/core</code> and <code>mediawiki/extensions</code>:
**For <code>mediawiki/core</code> and <code>mediawiki/extensions</code>:
Line 18: Line 19:
*** Backports only (IOW: Everything should already be committed into master and tested on [[mw:Beta Cluster]] and then "backported" to the relevant release branch)
*** Backports only (IOW: Everything should already be committed into master and tested on [[mw:Beta Cluster]] and then "backported" to the relevant release branch)
** For <code>operations/mediawiki-config</code>
** For <code>operations/mediawiki-config</code>
***Config changes (including enabling/disabling features)
***Config changes (including enabling/disabling features, refactoring, etc)
***If it's complicated or risky, please request a special window or to synchronize with the [[Deployments/One_week|weekly train]].
***Some changes may require their own [[Deployments/Inclusion_criteria|Deployment Window]].


*'''Forbidden types of patches'''
*'''Forbidden types of patches'''
Line 37: Line 38:
##Commit the fix in master first
##Commit the fix in master first
##Test that the issue is truly fixed on the [[mw:Beta Cluster]] (if possible)
##Test that the issue is truly fixed on the [[mw:Beta Cluster]] (if possible)
##[https://tools.wmflabs.org/versions/ Identify what branch is rolling out] (the leftmost branch).
##[https://versions.toolforge.org/ Identify what branch is rolling out] (the leftmost branch).
##Prepare patches in Gerrit against the current live branches (named wmf.''NN'') (or a subset if the bug is limited).  
##Prepare patches in Gerrit against the current live branches (named wmf.''NN'') (or a subset if the bug is limited).  
###For example, if the branch is <code>1.35.0-wmf.11</code> and the patch's Git hash is <code>ec56a606</code>, execute <code>git fetch origin wmf/1.35.0-wmf.11 && git checkout wmf/1.35.0-wmf.11 && git cherry-pick ec56a606</code>. The latter cherry-pick command can also be found in the Gerrit old UI under the download menu. E.g., <code>git fetch "https://gerrit.wikimedia.org/r/mediawiki/skins/MinervaNeue" refs/changes/23/559223/1 && git cherry-pick FETCH_HEAD</code>.
###For example, if the branch is <code>1.35.0-wmf.11</code> and the patch's Git hash is <code>ec56a606</code>, execute <code>git fetch origin wmf/1.35.0-wmf.11 && git checkout wmf/1.35.0-wmf.11 && git cherry-pick ec56a606</code>. The latter cherry-pick command can also be found in the Gerrit old UI under the download menu. E.g., <code>git fetch "https://gerrit.wikimedia.org/r/mediawiki/skins/MinervaNeue" refs/changes/23/559223/1 && git cherry-pick FETCH_HEAD</code>.
Line 45: Line 46:
# If it is a config change
# If it is a config change
##Prepare patches in Gerrit against the master branch of the mediawiki-config repo
##Prepare patches in Gerrit against the master branch of the mediawiki-config repo
#Add the gerrit URL and your IRC name to [[Deployments]] calendar page in the correct ̈'''backport''' deploy slot. Editing this page is error-prone. If you have JavaScript enabled, it'll at least tell you how long the wait it is. Double check the time, that the slot says "backport", your IRC handle, and the patch number.
#Add the gerrit URL and your IRC name to [[Deployments]] calendar page in the correct '''backport deploy''' slot. Editing this page is error-prone. If you have JavaScript enabled, it'll at least tell you how long the wait it is. Double check the time, that the slot says "backport", your IRC handle, and the patch number.
#Double check that your [https://phabricator.wikimedia.org/phame/live/7/post/183/wikimediadebug_v2_is_here/ WikimediaDebug] extension is installed. You'll need this to verify the change after its been deployed.
#Double check that your [https://techblog.wikimedia.org/2019/12/16/wikimediadebug-v2-is-here/ WikimediaDebug] extension is installed. You'll need this to verify the change after its been deployed.
#Be sure that the person whose name appears on the [[Deployments]] calendar page will be present on the {{irc|wikimedia-operations}} IRC channel for the deployment and able to test the patch, especially if it is a different person from the author of the patch. Now is a good time to double check your IRC nick and connection. If you've done everything right, jouncebot will ping you when your backport window opens. This is a good time to ping the scheduled deployers directly.
#Be sure that the person whose name appears on the [[Deployments]] calendar page will be present on the {{irc|wikimedia-operations}} IRC channel for the deployment and able to test the patch, especially if it is a different person from the author of the patch. Now is a good time to double check your IRC nick and connection. If you've done everything right, jouncebot will ping you when your backport window opens. This is a good time to ping the scheduled deployers directly.


Line 70: Line 71:
#The backport team member reviews the patches and picks the ordering.
#The backport team member reviews the patches and picks the ordering.
#The backport team member identifies the patch to merge, asks if the submitter is ready to test, and merges the backport(s)
#The backport team member identifies the patch to merge, asks if the submitter is ready to test, and merges the backport(s)
# After merge, the backport team member fetches the patch(es) to <code>deploy1001.eqiad.wmnet</code> and then runs <code>scap pull</code> on a [[Debug servers|mwdebug host]] (typically <code>mwdebug#002</code>)
# After merge, the backport team member fetches the patch(es) to <code>deploy1002.eqiad.wmnet</code> and then runs <code>scap pull</code> on a [[Debug servers|mwdebug host]] (typically <code>mwdebug#002</code>)
#The submitter tests the change by using the instructions at [[X-Wikimedia-Debug#Staging_changes]] AND the backport team member checks the error logs
#The submitter tests the change by using the instructions at [[X-Wikimedia-Debug#Staging_changes]] AND the backport team member checks the error logs
#If there are no errors and the fix seems to work (if testable in that manner), then the backport team member deploys the patch to the entire fleet
#If there are no errors and the fix seems to work (if testable in that manner), then the backport team member deploys the patch to the entire fleet

Revision as of 18:26, 20 May 2021

Backport windows are set times for Wikimedia production deployments, carried out by one or more members of the backport deploy team (see below). They were historically known as SWAT deploy windows. Despite the name, they are used for configuration changes to production wikis as well as backports.

The purpose is to provide a known window for people to get bug fixes deployed ahead of the normal cadence (currently, weekly) without having to beg people to do the deployment for them; the team is there to do the deployment part. Similarly, it is usually advantageous to make configuration changes separate from train deploys so that effects of the reconfiguration can be distinguished from regressions caused by the train; the backport deploy team provides the expertise to safely deploy configuration changes.

The words MUST, CAN, MAY, etc. are to be interpreted in conformity with RFC 2119.

Guidelines

  • There will be at least one backport window deployer available and active during the window.
    • If no backport team member is around and available to do the deploys, the window will be skipped. Please reschedule your patches.
  • All those submitting patches for deployment MUST be in #wikimedia-operations on libera.chat to communicate with the backport team member.
    • The backport team will ping the relevant developers at the start of the window and again when their patch is up; they MUST be available. If they are not available, the patch will NOT be deployed.
  • All communication MUST happen in #wikimedia-operations on libera.chat (not in separate team or area-specific channels)
  • Allowed types of patches Things not fitting these criteria should instead use the standard deploy window process
    • For mediawiki/core and mediawiki/extensions:
      • Fixes of regressions
      • Backports only (in other words: everything should already be committed into master and tested on mw:Beta Cluster, and then "backported" to the relevant release branch)
    • For operations/mediawiki-config
      • Config changes (including enabling/disabling features, refactoring, etc)
      • Some changes may require their own Deployment Window.
  • Forbidden types of patches
    • Single patches that require more than one sync - in other words, changes to multiple files which depend on each other.
      • Instead, please break up the patches into multiple safe patches that can be deployed by themselves. See: task T187761
    • No new extensions
    • Nothing that still needs prior public communication with affected wikis (this is subjective at times, and the backport team reserves the right to not deploy if they feel uncomfortable)
  • The backport team may ask questions regarding the patches to understand the implications and assess risk. The relevant developers should ideally be on IRC in the hour prior to the backport window.
  • The backport team MUST be comfortable with the patch going out and CAN veto any proposed patch they are not comfortable with for ANY reason.
  • Our windows have a limit of 6 patches.
    • NOTE: Cherry-picking a patch to both release branches counts as 2 as they will be separate deployments.
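
The slot arithmetic in the last guideline can be sketched as a small shell helper. This is only an illustration: `slots_needed` and `fits_in_window` are hypothetical names, not Wikimedia tools, and the sketch assumes each argument is the number of branches one change is cherry-picked to (a config change counts as 1).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of any Wikimedia tooling): count how many
# of the window's slots a set of changes would consume.
WINDOW_LIMIT=6   # limit stated in the guidelines

# Each argument is the number of separate deployments one change needs,
# e.g. 2 for a fix cherry-picked to both release branches.
slots_needed() {
    local total=0 n
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

# Succeeds (exit status 0) when the changes fit within the window limit.
fits_in_window() {
    [ "$(slots_needed "$@")" -le "$WINDOW_LIMIT" ]
}

# Two fixes backported to both branches plus one config change: 2 + 2 + 1
slots_needed 2 2 1   # prints 5
```

Under these assumptions, a third two-branch backport would bring the total to 7 and no longer fit in a single window.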

How to submit a patch for backport

  1. If it is a backport to a branch
    1. Commit the fix in master first
    2. Test that the issue is truly fixed on the mw:Beta Cluster (if possible)
    3. Identify what branch is rolling out (the leftmost branch).
    4. Prepare patches in Gerrit against the current live branches (named wmf.NN) (or a subset if the bug is limited).
      1. For example, if the branch is 1.35.0-wmf.11 and the patch's Git hash is ec56a606, execute git fetch origin wmf/1.35.0-wmf.11 && git checkout wmf/1.35.0-wmf.11 && git cherry-pick ec56a606. The latter cherry-pick command can also be found in the Gerrit old UI under the download menu. E.g., git fetch "https://gerrit.wikimedia.org/r/mediawiki/skins/MinervaNeue" refs/changes/23/559223/1 && git cherry-pick FETCH_HEAD.
      2. Now push the patch up. Assuming the branch is 1.35.0-wmf.11, git push origin HEAD:refs/for/wmf/1.35.0-wmf.11. If Git wants a password, it's your HTTP password found in the Gerrit settings.
      3. Depending on the patch, positive reviews beforehand are necessary (the backport team is not responsible for code review)
      4. Note the number in the Gerrit URL. E.g., in `https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/559226/`, the number is 559226. This number will be used to fill out the backport template below.
  2. If it is a config change
    1. Prepare patches in Gerrit against the master branch of the mediawiki-config repo
  3. Add the Gerrit URL and your IRC name to the Deployments calendar page in the correct backport deploy slot. Editing this page is error-prone. If you have JavaScript enabled, it will at least tell you how long the wait is. Double-check the time, that the slot says "backport", your IRC handle, and the patch number.
  4. Double-check that your WikimediaDebug extension is installed. You'll need this to verify the change after it's been deployed.
  5. Be sure that the person whose name appears on the Deployments calendar page will be present on the #wikimedia-operations IRC channel for the deployment and able to test the patch, especially if it is a different person from the author of the patch. Now is a good time to double-check your IRC nick and connection. If you've done everything right, jouncebot will ping you when your backport window opens. This is a good time to ping the scheduled deployers directly.
Example entry:
{{ircnick|legoktm}}
* [wmf.8] {{gerrit|297697}} Make LocalRename jobs run sequentially
* [wmf.9] {{gerrit|297698}} Make LocalRename jobs run sequentially
* [config] {{gerrit|431759}} Remove unused PopupsAnonsExperimentalGroupSize config variable
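
The submission steps above can be sketched as three small shell helpers. This is a minimal dry-run sketch, not an official tool: the function names are hypothetical, `backport_commands` only prints the git commands quoted in the steps rather than running them, and `gerrit_change_number` assumes a change URL containing `/+/` followed by the change number, as in the example URL above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for preparing a backport submission (illustrative only).

# Print (do not run) the git commands for backporting one commit to a branch.
backport_commands() {
    local branch="$1" commit="$2"
    printf 'git fetch origin wmf/%s\n' "$branch"
    printf 'git checkout wmf/%s\n' "$branch"
    printf 'git cherry-pick %s\n' "$commit"
    printf 'git push origin HEAD:refs/for/wmf/%s\n' "$branch"
}

# Extract the change number that follows "/+/" in a Gerrit change URL.
# Returns non-zero if the URL does not have that shape.
gerrit_change_number() {
    local url="$1" rest number
    rest="${url#*/+/}"       # text after the first "/+/"
    number="${rest%%/*}"     # up to the next slash, if any
    case "$number" in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely numeric
        *) echo "$number" ;;
    esac
}

# Format one calendar line in the style of the example entry.
calendar_line() {
    local label="$1" change="$2" summary="$3"
    printf '* [%s] {{gerrit|%s}} %s\n' "$label" "$change" "$summary"
}

backport_commands 1.35.0-wmf.11 ec56a606
gerrit_change_number 'https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/559226/'
calendar_line wmf.8 297697 'Make LocalRename jobs run sequentially'
```

Printing the commands instead of executing them keeps the sketch safe to run outside a MediaWiki checkout; the printed lines can be reviewed and pasted once the branch and commit hash are confirmed.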

Doing the deploy

Generally:

  • The backport team coordinates the merging and deploying of the patches. The order to deploy the patches is decided by them.
  • The relevant developers should have their test cases ready to run as soon as their patches are deployed.

The process:

  1. The deploy IRC bot (jouncebot) will announce the start of the window and ping the backport team along with anyone who has submitted a patch to deploy.
  2. The backport team member reviews the patches and picks the ordering.
  3. The backport team member identifies the patch to merge, asks if the submitter is ready to test, and merges the backport(s)
  4. After merge, the backport team member fetches the patch(es) to deploy1002.eqiad.wmnet and then runs scap pull on a mwdebug host (typically mwdebug#002)
  5. The submitter tests the change by using the instructions at X-Wikimedia-Debug#Staging_changes AND the backport team member checks the error logs
  6. If there are no errors and the fix seems to work (if testable in that manner), then the backport team member deploys the patch to the entire fleet
  7. The submitter tests again (without X-Wikimedia-Debug) AND the backport team member checks the logs again.
  8. If everything is good, the next patch is selected and the process starts again.
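
The per-patch loop above can be outlined as a dry-run script that only prints the steps in order. `deploy_steps` is a hypothetical helper; the only real command it mentions is `scap pull`, which the process names, and the fleet-wide step is left as a description because this page does not name the command for it.

```shell
#!/usr/bin/env bash
# Dry-run outline of the backport window loop; nothing is merged or deployed.
DEPLOY_HOST="deploy1002.eqiad.wmnet"   # host named in step 4

# Hypothetical helper: print the checklist for each change number given.
deploy_steps() {
    local change
    for change in "$@"; do
        echo "[$change] ask submitter if ready to test, then merge"
        echo "[$change] fetch to $DEPLOY_HOST"
        echo "[$change] on mwdebug host: scap pull"
        echo "[$change] submitter tests via X-Wikimedia-Debug; deployer checks error logs"
        echo "[$change] if clean: deploy to entire fleet, then test and check logs again"
    done
}

deploy_steps 297697 297698
```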

Backport Team members' roles, responsibilities, and tips

Trust

  • Being a member of the backport team places a large amount of trust in the person. In some ways it is more trust than simply having access to deploy on the Wikimedia cluster, as others are encouraged to ask you to deploy things on their behalf and you must be willing to say "No" when you are uncomfortable. To make mistakes is human, but not learning from them will cause backport deployers to lose their deployment access.

Knowledge

  • Backport deployers need not be experts in all parts of our infrastructure, but they must be comfortable with assessing the general risk of a given patch. If needed, they should ask probing questions to the developer submitting the patch to learn more.
    • Experience with MediaWiki and MediaWiki config changes is a plus, as those make up the vast majority of changes that come through the backport process.
  • Some unintuitive situations include:
    • a "simple" config change causing a load spike in a dependent system that the deployer or developer is not familiar with
    • a "simple" config change going against the wishes of "the community", the Wikimedia Foundation, or both
      • controversial changes can easily be skipped and referred to the WMF Release Manager for next steps; there is no need to rush these
  • If a backport deployer is uncomfortable with a certain area of the code-base they are free to skip that backport at their own discretion (or have another backport deployer review and/or deploy it).

Decisiveness

  • Backport deployers should not feel obligated to help a developer debug a situation, especially if there is a user-facing issue or outage.
  • When in doubt: Revert and ask questions later

New Backport Team member check-list

  • Read and be comfortable with the above roles, responsibilities and tips
  • Shell and deploy access in production, see
  • Access to merge changes in wmf deploy branches by being added to the wmf-deployments gerrit group
    • Ask any existing wmf-deployments group member to do this.
  • Join (and read) the operations mailing list (ops@lists.wikimedia.org)
    • This is because announcements that could impact how and/or when to deploy things are primarily sent there.
  • Read the docs: