
Heterogeneous deployment/Train deploys: Difference between revisions

imported>Zfilipin
(→‎Incident documentation: Use Create report form to create a new page)
imported>Dduvall
(→‎Places to Watch for Breakage: Added link to Frontend Responses NGINX vs Varnish dashboard)
** [https://logstash.wikimedia.org/app/kibana#/dashboard/group1 group1]
* [https://grafana.wikimedia.org/dashboard/db/varnish-http-errors?refresh=5m&orgId=1 Grafana Varnish error-rate dashboard] (HTTP 5XX % should have 3+ 0s after the decimal point, e.g. 0.0001%)
* [https://grafana.wikimedia.org/d/000000612/frontend-responses-nginx-vs-varnish?orgId=1&from=now-15m&to=now Grafana Frontend Responses NGINX vs Varnish]



Revision as of 20:38, 9 January 2019

Bring new code in a fast, safe, and efficient way!
Deployments

Breakage

There will be times when this process does not go smoothly. There are guidelines for what to do when that happens.

In general, if an unexplained error occurs within 1 hour of a train deployment, always roll back the train. Rolling back rules the train out as the cause, which is especially important when several other possible causes are in play at the same time.
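
The one-hour rule can be written as a small check. This is a minimal sketch, not part of any deployment tooling: the function name within_rollback_window is hypothetical, and both arguments are assumed to be Unix timestamps in seconds.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an error was seen within one hour after a
# train deployment. Both arguments are Unix timestamps in seconds.
within_rollback_window() {
  deploy_epoch=$1
  error_epoch=$2
  elapsed=$(( error_epoch - deploy_epoch ))
  # 3600 seconds is the one-hour window from the guideline above.
  [ "$elapsed" -ge 0 ] && [ "$elapsed" -le 3600 ]
}

# Example: deploy at 19:00:00 UTC, error at 19:40:00 UTC on the same day.
if within_rollback_window 1547060400 1547062800; then
  echo "roll back the train"
else
  echo "outside the one-hour window"
fi
```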

Rollback

Rolling back a wikiversions change should be quick. Roll back production before you send patches to Gerrit, since waiting on Jenkins may take a while:

you@deploy1001:/srv/mediawiki-staging$ git revert $(git log -1 --format=%H -- wikiversions.json)
you@deploy1001:/srv/mediawiki-staging$ scap sync-wikiversions 'Revert "group[0|1] wikis to [VERSION]"'
you@deploy1001:/srv/mediawiki-staging$ # Now that the revert is synced, push the patch to Gerrit. Run git commit --amend first to get a Change-Id.
you@deploy1001:/srv/mediawiki-staging$ git commit --amend
you@deploy1001:/srv/mediawiki-staging$ git push origin HEAD:refs/for/master/[VERSION]

Example:

you@deploy1001:/srv/mediawiki-staging$ git push origin HEAD:refs/for/master/1.33.0-wmf.0
  • Review and +2 the patch in Gerrit.
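
To see what the revert command does before running it in production, it can be tried in a throwaway repository. The sketch below is for illustration only: the temporary directory stands in for /srv/mediawiki-staging, and the identity, file contents, and commit messages are placeholders.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for /srv/mediawiki-staging (illustration only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Deployer"        # placeholder identity
git config user.email "deployer@example.org"

printf '{"testwiki": "php-OLD"}\n' > wikiversions.json
git add wikiversions.json
git commit -q -m "Initial wikiversions"

printf '{"testwiki": "php-NEW"}\n' > wikiversions.json
git commit -q -am "group0 wikis to NEW"

# Same revert as in the rollback steps: find the last commit that touched
# wikiversions.json and revert it.
git revert --no-edit "$(git log -1 --format=%H -- wikiversions.json)" >/dev/null

cat wikiversions.json    # shows the php-OLD content again
git log -1 --format=%s   # subject of the new revert commit
```

The revert adds a new commit on top of the history instead of rewriting it, which is why the result can then be pushed to Gerrit for review.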

Places to Watch for Breakage

Train deployers are effectively the first line of defense for train deploys, so they should check for breakage while rolling out the train. Some of the places to watch for breakage:

If the train is blocked

  • A task will be assigned to you, for example T191059 (1.32.0-wmf.13 deployment blockers)
  • Any open subtasks block the train from moving forward. This means no further deployments until the blockers are resolved.

Checklist

If there are blocking tasks, please do the following:

  • Make sure all tasks blocking the train are set to UBN! priority in Phabricator
  • Comment on the task asking for an ETA, or whether the issue can be resolved by reverting a recent commit.
  • Send e-mail to:
    • engineering@lists.wikimedia.org
    • ops@lists.wikimedia.org
    • wikitech-ambassadors@lists.wikimedia.org
    • wikitech-l@lists.wikimedia.org
    • Subject: [Train] {version} status update
    • Body
      The {version} version of MediaWiki is blocked[0].
      
      The new version is deployed to {group(s){0,1,2}}[1], but can proceed no
      further until these issues are resolved:
      
      * {Phab task name} - {phab task link}
      
      Once these issues are resolved train can resume. If these issues are
      resolved on a Friday the train will resume Monday.
      
      Thank you for your help resolving these issues!
      
      -- Your humble train toiler
      
      [0]. <{link to phab task for train}>
      [1]. <https://tools.wmflabs.org/versions/>
      
  • Add relevant people (see Developers/Maintainers) to the blocking task
  • Ping relevant people in IRC
  • Once the train is unblocked, be sure to thank the folks who helped unblock it
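
The status mail from the checklist can be filled in with a small template function. This is a sketch under assumptions: the function name train_blocked_mail and its argument order are hypothetical, the task links in the example call are placeholders, and the function only prints the text and does not send anything.

```shell
#!/bin/sh
# Hypothetical helper: print the "[Train] status update" mail from the
# checklist, filled in from its arguments. Sending the mail is left to you.
train_blocked_mail() {
  version=$1
  deployed_groups=$2
  train_task=$3
  blocker_name=$4
  blocker_link=$5
  cat <<EOF
Subject: [Train] ${version} status update

The ${version} version of MediaWiki is blocked[0].

The new version is deployed to ${deployed_groups}[1], but can proceed no
further until these issues are resolved:

* ${blocker_name} - ${blocker_link}

Once these issues are resolved train can resume. If these issues are
resolved on a Friday the train will resume Monday.

Thank you for your help resolving these issues!

-- Your humble train toiler

[0]. <${train_task}>
[1]. <https://tools.wmflabs.org/versions/>
EOF
}

# All values below are placeholders, not real tasks.
train_blocked_mail "1.33.0-wmf.0" "group0" \
  "https://phabricator.example/T000000" \
  "Example blocker" "https://phabricator.example/T000001"
```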

Tuesday: New branch creation and deploy

The new branch can be created in Gerrit from anywhere. It is often faster to do this step on a host in the cluster to minimize the time needed to clone from Gerrit.

Before the deploy window

Depending on how practiced you are and where you choose to run commands (full clones of mediawiki-core from outside the cluster can take a while) the steps will typically take 45 to 90 minutes.

Setup

Run the script as your regular user, which must be a member of the wikidev group (as of Feb 16th 2016).

Configure Git.

you@laptop:~$ ssh deployment.eqiad.wmnet

you@deploy1001:~$ git config --global user.name "[FIRST-NAME] [LAST-NAME]"
you@deploy1001:~$ git config --global user.email "[USERNAME]@[DOMAIN]"

Create a .netrc file in your home directory with the following content.

you@deploy1001:~$ vim .netrc
machine gerrit.wikimedia.org login [USERNAME] password [PASSWORD]

Username and password can be obtained from Gerrit:

  • In the new UI, go to HTTP Credentials, copy the Username, and click Generate new password.
  • In the old UI, go to HTTP Password, copy the Username, and click Generate Password.

In both cases, the generated password is different from your Gerrit password.

Make sure the .netrc file is readable only by you.

you@deploy1001:~$ chmod go-rwx .netrc
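
To confirm the result of the chmod, the permission string printed by ls can be inspected. A minimal sketch: the function name is_private_file is hypothetical, and the demonstration runs on a scratch file rather than on your real .netrc.

```shell
#!/bin/sh
# Hypothetical check: succeed only if a file grants no permissions to group
# or others. It inspects the mode string printed by ls -l (e.g. "-rw-------").
is_private_file() {
  mode=$(ls -ld -- "$1" | cut -c5-10)   # characters 5-10 are the group and other bits
  [ "$mode" = "------" ]
}

# Demonstration on a scratch file rather than your real ~/.netrc.
scratch=$(mktemp)
chmod 644 "$scratch"
is_private_file "$scratch" || echo "too open: $scratch"
chmod go-rwx "$scratch"
is_private_file "$scratch" && echo "private: $scratch"
rm -f "$scratch"
```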

Clone or update mediawiki/tools/release.

you@deploy1001:~$ git clone https://gerrit.wikimedia.org/r/mediawiki/tools/release
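
The command above covers the first-time clone. If a checkout already exists, updating it is enough. The sketch below combines both cases; the function name clone_or_update is hypothetical, and the use of a fast-forward-only pull is an assumption about how you want to update.

```shell
#!/bin/sh
# Hypothetical helper: clone a repository if the target directory has no
# checkout yet, otherwise fast-forward the existing checkout.
clone_or_update() {
  url=$1
  dir=$2
  if [ -d "$dir/.git" ]; then
    git -C "$dir" pull --ff-only
  else
    git clone "$url" "$dir"
  fi
}

# Usage on the deployment host (shown as a comment, nothing is run here):
# clone_or_update https://gerrit.wikimedia.org/r/mediawiki/tools/release "$HOME/release"
```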

tmux or screen

Some scripts run for 10-60 minutes so consider using tmux or screen.

If you prefer tmux:

you@deploy1001:~$ tmux new -s train
...
you@deploy1001:~$ exit

If you need to leave in the middle you can do ctrl-b d to detach and tmux a -t train to attach.

If you prefer screen:

you@deploy1001:~$ screen -D -RR train
...
you@deploy1001:~$ exit

If you need to leave in the middle you can do ctrl-a d to detach and screen -r train to attach.

Create the new branch in Gerrit

you@deploy1001:~/release/make-wmf-branch$ ./make-wmf-branch -n [VERSION] -o master

Example:

you@deploy1001:~/release/make-wmf-branch$ ./make-wmf-branch -n 1.33.0-wmf.0 -o master

🐌 Note: the script will run for about 15 minutes.
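
Because the script runs for a while, it can help to check the version string before starting. This is a sketch only: the function name is_wmf_version is hypothetical, and the pattern is inferred from the examples on this page (such as 1.33.0-wmf.0), not taken from make-wmf-branch itself.

```shell
#!/bin/sh
# Hypothetical pre-flight check: accept only versions shaped like the examples
# on this page (e.g. 1.33.0-wmf.0). The pattern is inferred from those examples.
is_wmf_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-wmf\.[0-9]+$'
}

version="1.33.0-wmf.0"
if is_wmf_version "$version"; then
  echo "./make-wmf-branch -n $version -o master"   # printed only, not executed
else
  echo "unexpected version format: $version" >&2
fi
```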

Clone new branch

This command will create a new /srv/mediawiki-staging/php-[VERSION] directory:

you@deploy1001:/srv/mediawiki-staging$ scap prep [VERSION]

Example:

you@deploy1001:/srv/mediawiki-staging$ scap prep 1.33.0-wmf.0

Apply security patches

  • Patches should be named sequentially in the order that they will cleanly apply (e.g. 01-T[NUMBER].patch, 02-T[NUMBER].patch)
  • Check and apply each patch in both /srv/patches/[VERSION]/core and /srv/patches/[VERSION]/extensions/[NAME] to the new core checkout and extensions, respectively.
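
The naming convention can be checked before applying anything. A minimal sketch, assuming the two-digit prefix shown in the examples: the function name list_patches_in_order is hypothetical, and it only lists and validates file names without touching any checkout.

```shell
#!/bin/sh
# Hypothetical check: list the patches in a directory in apply order and flag
# any file name that does not follow the NN-T<number>.patch convention.
list_patches_in_order() {
  dir=$1
  status=0
  for patch in "$dir"/*.patch; do      # glob expansion is sorted by name
    [ -e "$patch" ] || continue        # the directory has no patches
    name=$(basename "$patch")
    case $name in
      [0-9][0-9]-T[0-9]*.patch) echo "$name" ;;
      *) echo "unexpected name: $name" >&2; status=1 ;;
    esac
  done
  return "$status"
}

# Usage (shown as a comment, with the placeholder kept as-is):
# list_patches_in_order /srv/patches/[VERSION]/core
```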

Check existing patches:

you@deploy1001:~$ tree /srv/patches/[VERSION]
/srv/patches/[VERSION]
├── core
│   ├── 01-T[NUMBER].patch
│   └── 02-T[NUMBER].patch
└── extensions
    └── [EXTENSION]
  • You can check a core patch to see if it will apply cleanly with
you@deploy1001:/srv/mediawiki-staging/php-[VERSION]$ git apply --check --3way /srv/patches/[VERSION]/core/[NUMBER]-T[NUMBER].patch
  • If the patch checks out, apply and commit it with
you@deploy1001:/srv/mediawiki-staging/php-[VERSION]$ git am --3way /srv/patches/[VERSION]/core/[NUMBER]-T[NUMBER].patch
  • If the patch fails to apply, investigate whether it's due to a conflict (git status) or the patch having been merged since the new branch cut (search git log for the commit, etc.). If it turns out to be the latter, remove the patch file from the /srv/patches/[VERSION] directory.
  • If you need extra help, contact Security Team (Wikimedia Foundation, MediaWiki, Office Wiki), currently Brian (bawolff) and Sam (Reedy) in IRC.

Create patches to update wikiversions.json

Create group0 to [VERSION] patch:

you@deploy1001:/srv/mediawiki-staging/$ scap update-wikiversions group0 [VERSION]
you@deploy1001:/srv/mediawiki-staging/$ git add wikiversions.json
you@deploy1001:/srv/mediawiki-staging/$ git commit -m "Group0 to [VERSION]"

Example:

you@deploy1001:/srv/mediawiki-staging/$ scap update-wikiversions group0 1.33.0-wmf.0
you@deploy1001:/srv/mediawiki-staging/$ git add wikiversions.json
you@deploy1001:/srv/mediawiki-staging/$ git commit -m "Group0 to 1.33.0-wmf.0"

Send staged patches to Gerrit for review

you@deploy1001:/srv/mediawiki-staging/$ git push origin HEAD:refs/for/master/[VERSION]

Example:

you@deploy1001:/srv/mediawiki-staging/$ git push origin HEAD:refs/for/master/1.33.0-wmf.0

Discard changes to working directory and index

you@deploy1001:/srv/mediawiki-staging/$ git reset --hard origin/master

Clean up old stuff

mw:MediaWiki 1.33/Roadmap is a good place to find when a branch was created.

List all branches:

you@deploy1001:/srv/mediawiki-staging/$ find . -maxdepth 1 -type d -name 'php-*' -print

Find old branches, more than 30 days old:

you@deploy1001:/srv/mediawiki-staging/$ find . -mindepth 2 -maxdepth 2 -type f -path './php-*/README' -ctime +30 -exec dirname {} \;

For all branches more than 30 days old, drop everything.

you@deploy1001:/srv/mediawiki-staging/$ scap clean --delete [VERSION]

Example:

you@deploy1001:/srv/mediawiki-staging/$ scap clean --delete 1.33.0-wmf.0
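
When several old branches turn up, the directory names printed by find can be turned into the matching commands for review. This is a dry-run sketch: the function name print_clean_commands is hypothetical, and it only prints commands and never runs scap.

```shell
#!/bin/sh
# Hypothetical dry run: read ./php-[VERSION] directory names on stdin (as
# printed by the find commands above) and print the matching
# "scap clean --delete" command for each. Nothing is executed.
print_clean_commands() {
  while IFS= read -r dir; do
    version=${dir##*/php-}                 # strip everything up to and including "/php-"
    [ "$version" != "$dir" ] || continue   # skip lines that are not php-* paths
    echo "scap clean --delete $version"
  done
}

printf '%s\n' ./php-1.33.0-wmf.0 | print_clean_commands
# prints: scap clean --delete 1.33.0-wmf.0
```

Review the printed list against the active branches before running any of the commands by hand.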

For all branches older than the currently active branch(es) and prior one, prune everything that's not a static asset (we need those for cached CSS/JS/etc). Active branches are visible at Wikimedia MediaWiki versions page.

you@deploy1001:/srv/mediawiki-staging/$ scap clean [VERSION]

Example:

you@deploy1001:/srv/mediawiki-staging/$ scap clean 1.33.0-wmf.0

Sync to cluster and verify on testwiki

  • Edit /srv/mediawiki-staging/wikiversions.json and set testwiki to php-[VERSION]
  • Do not commit and push to Gerrit, only make this change locally on the deployment server
you@deploy1001:/srv/mediawiki-staging/$ vim wikiversions.json
  • Run scap to (re)build localization caches and sync changes across the cluster.
  • 🐌 Note: this step will run for about 20 minutes.
you@deploy1001:/srv/mediawiki-staging/$ scap sync "testwiki to php-[VERSION] and rebuild l10n cache"

Example:

you@deploy1001:/srv/mediawiki-staging/$ scap sync "testwiki to php-1.33.0-wmf.0 and rebuild l10n cache"
  • Revert local changes
you@deploy1001:/srv/mediawiki-staging/$ git checkout -- wikiversions.json
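
Since the testwiki change must never be committed, it is worth confirming that the local edit is really gone. A minimal sketch: the function name wikiversions_is_clean is hypothetical, and it relies only on the exit status of git diff --quiet.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if wikiversions.json in the given
# repository (default: current directory) has no uncommitted changes.
wikiversions_is_clean() {
  git -C "${1:-.}" diff --quiet HEAD -- wikiversions.json
}

# Usage on the deployment host (shown as a comment, nothing is run here):
# wikiversions_is_clean /srv/mediawiki-staging && echo "local change reverted"
```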

Update deploy notes

  • Create deploy notes
you@deploy1001:~$ ./release/make-deploy-notes/makedeploynotes.py [PREVIOUS-VERSION] [VERSION] | tee deploy-notes-[VERSION]

Example:

you@deploy1001:~$ ./release/make-deploy-notes/makedeploynotes.py 1.33.0-wmf.0 1.33.0-wmf.1 | tee deploy-notes-1.33.0-wmf.1

Wait for deploy window

All of the changes above can be done at any time prior to the actual deployment window.

During the deploy window

Switch group0 wikis to [VERSION]

  • Review and submit group0 to [VERSION] patch in Gerrit
  • Wait for Gerrit/Zuul/Jenkins to merge the patch(es)
  • Pull patch(es) to deployment server
you@deploy1001:/srv/mediawiki-staging$ git fetch
  • Check diff to ensure it is what you expect
you@deploy1001:/srv/mediawiki-staging$ git diff HEAD..origin/master
  • Apply changes
you@deploy1001:/srv/mediawiki-staging$ git rebase origin/master
  • Sync the change across the cluster
you@deploy1001:/srv/mediawiki-staging$ scap sync-wikiversions "group0 to [VERSION]"

Example:

you@deploy1001:/srv/mediawiki-staging$ scap sync-wikiversions "group0 to 1.33.0-wmf.0"
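
The "check diff" step can be backed by a quick guard that the fetched commits touch only wikiversions.json. This is a sketch, not a replacement for reading the diff: the function name only_wikiversions_changed is hypothetical, and it assumes git fetch has already been run.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the commits fetched from origin/master
# touch wikiversions.json and nothing else. Run it after "git fetch".
only_wikiversions_changed() {
  repo=${1:-.}
  changed=$(git -C "$repo" diff --name-only HEAD..origin/master)
  [ "$changed" = "wikiversions.json" ]
}

# Usage on the deployment host (shown as a comment, nothing is run here):
# only_wikiversions_changed /srv/mediawiki-staging || echo "unexpected files in diff"
```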

Update roadmap

  • Change the Deployed to group (if you're using VisualEditor) or the 3rd parameter of the WMFReleaseTableRow template (if you're using the wikitext editor) to 0 (deployed to group0) at mw:MediaWiki 1.33/Roadmap.

For wikitext editor, change

{{WMFReleaseTableHead}}
{{WMFReleaseTableRow|[VERSION]|[DATE]|}}
...
{{WMFReleaseTableFooter}}

to

{{WMFReleaseTableHead}}
{{WMFReleaseTableRow|[VERSION]|[DATE]|0}}
...
{{WMFReleaseTableFooter}}

Example:

{{WMFReleaseTableHead}}
{{WMFReleaseTableRow|12|2018-07-10|0}}
...
{{WMFReleaseTableFooter}}
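
The wikitext change is a one-field edit, so it can be sketched as a filter over the page text. The function name set_release_group is hypothetical, and the sketch assumes each WMFReleaseTableRow template sits on its own line with the group as its last parameter. The same filter covers the later changes to 1 and 2.

```shell
#!/bin/sh
# Hypothetical filter: read roadmap wikitext on stdin and set the last
# parameter of the WMFReleaseTableRow line for one version.
set_release_group() {
  awk -v prefix="{{WMFReleaseTableRow|$1|" -v group="$2" '
    index($0, prefix) == 1 {
      # Replace everything after the last "|" with the group number and "}}".
      sub(/[|][^|]*[}][}]$/, "|" group "}}")
    }
    { print }
  '
}

printf '%s\n' '{{WMFReleaseTableRow|12|2018-07-10|}}' | set_release_group 12 0
# prints: {{WMFReleaseTableRow|12|2018-07-10|0}}
```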

Wednesday: group0 to group1 deploy

Switch group1 wikis to [VERSION]

Use the release/bin/deploy-promote script to update wikiversions.json

you@deploy1001:~$ ./release/bin/deploy-promote
Promote group1 from [PREVIOUS-VERSION] to [VERSION] [y/N]

The script automatically sets Code-Review +2 on the patch in Gerrit. Once CI has merged it, press Enter at the second prompt:

Now wait for jenkins to merge the patch, then press enter to continue with git pull && scap sync-wikiversions

After the script run is complete, group1 wikis should be running [VERSION].

Update roadmap

  • Change the Deployed to group (if you're using VisualEditor) or the 3rd parameter of the WMFReleaseTableRow template (if you're using the wikitext editor) to 1 (deployed to group1) at mw:MediaWiki 1.33/Roadmap.

For wikitext editor, change

{{WMFReleaseTableRow|[VERSION]|[DATE]|0}}

to

{{WMFReleaseTableRow|[VERSION]|[DATE]|1}}

Example:

{{WMFReleaseTableRow|12|2018-07-10|1}}

Thursday: group{0,1} to all deploy

Switch all wikis to [VERSION]

The Thursday deploy is very similar to the Wednesday deploy. The only procedural difference is the target group.

Use the release/bin/deploy-promote all script to update wikiversions.json

you@deploy1001:~$ ./release/bin/deploy-promote all
Promote all from [PREVIOUS-VERSION] to [VERSION] [y/N]

The script automatically sets Code-Review +2 on the patch in Gerrit. Once CI has merged it, press Enter at the second prompt:

Now wait for jenkins to merge the patch, then press enter to continue with git pull && scap sync-wikiversions

After the script run is complete, all wikis should be running [VERSION].

Update roadmap

  • Change the Deployed to group (if you're using VisualEditor) or the 3rd parameter of the WMFReleaseTableRow template (if you're using the wikitext editor) to 2 (deployed to all wikis) at mw:MediaWiki 1.33/Roadmap.

For wikitext editor, change

{{WMFReleaseTableRow|[VERSION]|[DATE]|1}}

to

{{WMFReleaseTableRow|[VERSION]|[DATE]|2}}

Example:

{{WMFReleaseTableRow|12|2018-07-10|2}}

Incident documentation