Heterogeneous deployment/Train deploys: Difference between revisions
imported>Thcipriani (→If the train is blocked: no such list!) |
imported>Zfilipin (→Apply security patches: for extension) |
||
Line 1: | Line 1: | ||
[[File: | [[File:MTR_CSR_Sifang_EMU_in_Shek_Kong_Stabling_Sidings_201710.jpg|thumb|500x500px|Bring new code in a fast, safe and efficient way!]] | ||
{{Navigation MediaWiki deployment}} | {{Navigation MediaWiki deployment}} | ||
== Breakage == | == Breakage== | ||
There will be times when this process does not go smoothly. There are [[Deployments/Holding_the_train|guidelines]] for what to do when that happens. | There will be times when this process does not go smoothly. There are [[Deployments/Holding_the_train|guidelines]] for what to do when that happens. | ||
Line 7: | Line 7: | ||
In general, '''if there is an unexplained error that occurs within 1 hour of a train deployment — always roll back the train'''. Rolling back the train to eliminate it as the cause of unexplained breakage can be especially important if there are many ongoing possible causes for issues as this helps to eliminate one of those causes as the source of problems. | In general, '''if there is an unexplained error that occurs within 1 hour of a train deployment — always roll back the train'''. Rolling back the train to eliminate it as the cause of unexplained breakage can be especially important if there are many ongoing possible causes for issues as this helps to eliminate one of those causes as the source of problems. | ||
=== Rollback === | ===Rollback=== | ||
To rollback a wikiversion change, it should be pretty quick. Go ahead and rollback production before you send patches up to gerrit since waiting on Jenkins may take a while: | To rollback a wikiversion change, it should be pretty quick. Go ahead and rollback production before you send patches up to gerrit since waiting on Jenkins may take a while: | ||
Line 25: | Line 25: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
* Wait for the patch to merge, then fetch it back down to the deployment server | *Wait for the patch to merge, then fetch it back down to the deployment server | ||
* [[#Update roadmap]]. | *[[#Update roadmap]]. | ||
=== Places to Watch for Breakage === | === Places to Watch for Breakage=== | ||
Train deployers should check for breakage as they are rolling out train as they are effectively the first line of defense for train deploys. Some of the places to watch for breakage: | Train deployers should check for breakage as they are rolling out train as they are effectively the first line of defense for train deploys. Some of the places to watch for breakage: | ||
* IRC | *IRC | ||
** primary channel is {{irc|wikimedia-operations}} | **primary channel is {{irc|wikimedia-operations}} | ||
** useful channels are {{irc|mediawiki-core}} {{irc|wikimedia-dev}} | **useful channels are {{irc|mediawiki-core}} {{irc|wikimedia-dev}} | ||
** for more channels see [https://www.mediawiki.org/wiki/MediaWiki_on_IRC MediaWiki on IRC] and [https://meta.wikimedia.org/wiki/IRC/Channels IRC/Channels] | **for more channels see [https://www.mediawiki.org/wiki/MediaWiki_on_IRC MediaWiki on IRC] and [https://meta.wikimedia.org/wiki/IRC/Channels IRC/Channels] | ||
* [https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor Logstash Fatal Monitor] | * [https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor Logstash Fatal Monitor] | ||
* [https://logstash.wikimedia.org/app/kibana#/dashboard/mediawiki-errors Logstash MediaWiki Errors] | *[https://logstash.wikimedia.org/app/kibana#/dashboard/mediawiki-errors Logstash MediaWiki Errors] | ||
* Logstash "mediawiki-new-errors" dashboard (linked from logstash front page | *Logstash "mediawiki-new-errors" dashboard (linked from logstash front page) | ||
**[https://logstash.wikimedia.org/app/kibana#/dashboard/dfcf7b70-1aaa-11e9-b4bc-db12fe15ab31 Showing only timeout errors] (see T204871) | **[https://logstash.wikimedia.org/app/kibana#/dashboard/dfcf7b70-1aaa-11e9-b4bc-db12fe15ab31 Showing only timeout errors] (see T204871) | ||
* Group-specific Logstash Dashboards: | * Group-specific Logstash Dashboards: | ||
** [https://logstash.wikimedia.org/app/kibana#/dashboard/group0 group0] | **[https://logstash.wikimedia.org/app/kibana#/dashboard/group0 group0] | ||
** [https://logstash.wikimedia.org/app/kibana#/dashboard/group1 group1] | **[https://logstash.wikimedia.org/app/kibana#/dashboard/group1 group1] | ||
* [https://grafana.wikimedia.org/dashboard/db/varnish-http-errors?refresh=5m&orgId=1 Grafana Varnish error-rate dashboard] (HTTP 5XX % should have 3+ 0s after the decimal point, e.g. 0.0001%) | *[https://grafana.wikimedia.org/dashboard/db/varnish-http-errors?refresh=5m&orgId=1 Grafana Varnish error-rate dashboard] (HTTP 5XX % should have 3+ 0s after the decimal point, e.g. 0.0001%) | ||
* [https://grafana.wikimedia.org/d/000000612/frontend-responses-nginx-vs-varnish?orgId=1&from=now-15m&to=now Grafana Frontend Responses NGINX vs Varnish] | * [https://grafana.wikimedia.org/d/000000612/frontend-responses-nginx-vs-varnish?orgId=1&from=now-15m&to=now Grafana Frontend Responses NGINX vs Varnish] | ||
* [https://grafana.wikimedia.org/d/000000102/production-logging Grafana Production Logging] | *[https://grafana.wikimedia.org/d/000000102/production-logging Grafana Production Logging] | ||
* [https://grafana.wikimedia.org/d/000000566/overview?panelId=15&fullscreen&orgId=1&from=now-7d&to=now Minerva Client Errors] - Browser JS errors count (only wikipedias on mobile) | *[https://grafana.wikimedia.org/d/000000566/overview?panelId=15&fullscreen&orgId=1&from=now-7d&to=now Minerva Client Errors] - Browser JS errors count (only wikipedias on mobile) | ||
=== If the train is blocked === | === If the train is blocked=== | ||
* A task will be assigned to you, for example [https://phabricator.wikimedia.org/T191059 T191059] (1.32.0-wmf.13 deployment blockers) | *A task will be assigned to you, for example [https://phabricator.wikimedia.org/T191059 T191059] (1.32.0-wmf.13 deployment blockers) | ||
* Any open subtasks block the train from moving forward. This means no further deployments until the blockers are resolved. | *Any open subtasks block the train from moving forward. This means no further deployments until the blockers are resolved. | ||
'''Checklist''' | '''Checklist''' | ||
Line 59: | Line 58: | ||
If there are blocking tasks, please do the following: | If there are blocking tasks, please do the following: | ||
* Make sure all tasks blocking train are set to <code>UBN!</code> priority in phabricator | *Make sure all tasks blocking train are set to <code>UBN!</code> priority in phabricator | ||
* Comment on the task asking for an ETA or if this can be solved by reverting a recent commit. | *Comment on the task asking for an ETA or if this can be solved by reverting a recent commit. | ||
* Send e-mail to: | *Send e-mail to: | ||
** [https://lists.wikimedia.org/mailman/listinfo/ops ops@lists.wikimedia.org] | **[https://lists.wikimedia.org/mailman/listinfo/ops ops@lists.wikimedia.org] | ||
** [https://lists.wikimedia.org/mailman/listinfo/wikitech-ambassadors wikitech-ambassadors@lists.wikimedia.org] | **[https://lists.wikimedia.org/mailman/listinfo/wikitech-ambassadors wikitech-ambassadors@lists.wikimedia.org] | ||
** [https://lists.wikimedia.org/mailman/listinfo/wikitech-l wikitech-l@lists.wikimedia.org] | **[https://lists.wikimedia.org/mailman/listinfo/wikitech-l wikitech-l@lists.wikimedia.org] | ||
** Subject: <code>[Train] {version} status update</code> | **Subject: <code>[Train] {version} status update</code> | ||
** Body<syntaxhighlight lang="text">The {version} version of MediaWiki is blocked[0]. | **Body<syntaxhighlight lang="text">The {version} version of MediaWiki is blocked[0]. | ||
The new version is deployed to {group(s){0,1,2}}[1], but can proceed no | The new version is deployed to {group(s){0,1,2}}[1], but can proceed no | ||
Line 82: | Line 81: | ||
[0]. <{link to phab task for train}> | [0]. <{link to phab task for train}> | ||
[1]. <https://tools.wmflabs.org/versions/></syntaxhighlight> | [1]. <https://tools.wmflabs.org/versions/></syntaxhighlight> | ||
* Add relevant people (see [https://www.mediawiki.org/wiki/Developers/Maintainers Developers/Maintainers]) to the blocking task | *Add relevant people (see [https://www.mediawiki.org/wiki/Developers/Maintainers Developers/Maintainers]) to the blocking task | ||
* Ping relevant people in IRC | * Ping relevant people in IRC | ||
* Once train is unblocked be sure to thank the folks who helped unblock it | * Once train is unblocked be sure to thank the folks who helped unblock it | ||
== Tuesday: New branch creation and deploy == | ==Tuesday: New branch creation and deploy== | ||
The new branch can be created in Gerrit from anywhere. It is often faster to do this step on a host in the cluster to minimize the time needed to clone from Gerrit. | The new branch can be created in Gerrit from anywhere. It is often faster to do this step on a host in the cluster to minimize the time needed to clone from Gerrit. | ||
=== Before the deploy window === | ===Before the deploy window=== | ||
Depending on how practiced you are and where you choose to run commands (full clones of mediawiki-core from outside the cluster can take a while) the steps will typically take 45 to 90 minutes. | Depending on how practiced you are and where you choose to run commands (full clones of mediawiki-core from outside the cluster can take a while) the steps will typically take 45 to 90 minutes. | ||
==== Setup ==== | ====Setup==== | ||
The script is run as your regular user member of the <code>wikidev</code> group (as of Feb 16th 2016). | The script is run as your regular user member of the <code>wikidev</code> group (as of Feb 16th 2016). | ||
Line 120: | Line 119: | ||
Add your new <code>~/.ssh/id_ed25519.pub</code> key to Gerrit: | Add your new <code>~/.ssh/id_ed25519.pub</code> key to Gerrit: | ||
* In the new UI go to [https://gerrit.wikimedia.org/r/settings/#SSHKeys SSH keys], copy the contents of your <code>~/.ssh/id_ed25519.pub</code> file into the ui. | |||
* In the old UI, go to [https://gerrit.wikimedia.org/r/settings/ssh-keys#SSHKeys SSH keys], copy the contents of your <code>~/.ssh/id_ed25519.pub</code> file into the ui. | *In the new UI go to [https://gerrit.wikimedia.org/r/settings/#SSHKeys SSH keys], copy the contents of your <code>~/.ssh/id_ed25519.pub</code> file into the ui. | ||
*In the old UI, go to [https://gerrit.wikimedia.org/r/settings/ssh-keys#SSHKeys SSH keys], copy the contents of your <code>~/.ssh/id_ed25519.pub</code> file into the ui. | |||
Create a new ssh-agent so that your key remains unlocked during branch cutting: | Create a new ssh-agent so that your key remains unlocked during branch cutting: | ||
Line 165: | Line 165: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
==== tmux or screen ==== | ====tmux or screen==== | ||
Some scripts run for 10-60 minutes so consider using tmux or screen. | Some scripts run for 10-60 minutes so consider using tmux or screen. | ||
Line 189: | Line 189: | ||
If you need to leave in the middle you can do <code>ctrl-a d</code> to detach and <code>screen -r train</code> to attach. | If you need to leave in the middle you can do <code>ctrl-a d</code> to detach and <code>screen -r train</code> to attach. | ||
==== Create the new branch in Gerrit ==== | ====Create the new branch in Gerrit==== | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 203: | Line 203: | ||
🐌 Note: the script will run for about 30 minutes. | 🐌 Note: the script will run for about 30 minutes. | ||
==== Clone new branch ==== | If the process is interrupted (by rebooting Gerrit, for example), continue from where it stopped: | ||
<syntaxhighlight lang="shell-session"> | |||
USERNAME@deploy1001:~/release/make-wmf-branch$ ./make-wmf-branch -n [VERSION] -o master -c extensions/[EXTENSION] | |||
</syntaxhighlight> | |||
====Clone new branch==== | |||
This command will create a new <code>/srv/mediawiki-staging/php-[VERSION]</code> directory: | This command will create a new <code>/srv/mediawiki-staging/php-[VERSION]</code> directory: | ||
Line 218: | Line 224: | ||
This should only take a couple of minutes. | This should only take a couple of minutes. | ||
==== Apply security patches ==== | ====Apply security patches==== | ||
* Patches should be named sequentially in the order that they will cleanly apply (e.g. <code>01-T[NUMBER].patch</code>, <code>02-T[NUMBER].patch</code>) | |||
* Check and apply each patch in both <code>/srv/patches/[VERSION]/core</code> and <code>/srv/patches/[VERSION]/extensions/[NAME]</code> to the new core checkout and extensions, respectively. | *Patches should be named sequentially in the order that they will cleanly apply (e.g. <code>01-T[NUMBER].patch</code>, <code>02-T[NUMBER].patch</code>) | ||
*Check and apply each patch in both <code>/srv/patches/[VERSION]/core</code> and <code>/srv/patches/[VERSION]/extensions/[NAME]</code> to the new core checkout and extensions, respectively. | |||
Check existing patches: | Check existing patches: | ||
Line 232: | Line 239: | ||
└── extensions | └── extensions | ||
└── [EXTENSION] | └── [EXTENSION] | ||
├── 01-T[NUMBER].patch | |||
└── 02-T[NUMBER].patch | |||
</syntaxhighlight> | </syntaxhighlight> | ||
* You can check a core patch to see if it will apply cleanly with | *You can check a core patch to see if it will apply cleanly with | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 240: | Line 249: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
* If the patch checks out, apply and commit it with | *If the patch checks out, apply and commit it with | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 246: | Line 255: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
* If the patch fails to apply, investigate whether it's due to a conflict (<code>git status</code>) or the patch having been merged since the new branch cut (search <code>git log</code> for the commit, etc.). If it turns out to be the latter, remove the patch file from the <code>/srv/patches/[VERSION]</code> directory. | For an extension: | ||
* If you need extra help, contact Security Team ([https://wikimediafoundation.org/role/staff-contractors/ Wikimedia Foundation], [https://www.mediawiki.org/wiki/Wikimedia_Security_Team MediaWiki], [https://office.wikimedia.org/wiki/Contact_list#Security Office Wiki]), currently {{ircnick|bawolff|Brian}} and {{ircnick|Reedy|Sam}} in IRC. | |||
<syntaxhighlight lang="shell-session"> | |||
USERNAME@deploy1001:/srv/mediawiki-staging/php-[VERSION]/extensions/[EXTENSION]$ git apply --check --3way /srv/patches/[VERSION]/extensions/[EXTENSION]/[NUMBER]-T[NUMBER].patch | |||
USERNAME@deploy1001:/srv/mediawiki-staging/php-[VERSION]/extensions/[EXTENSION]$ git am --3way /srv/patches/[VERSION]/extensions/[EXTENSION]/[NUMBER]-T[NUMBER].patch | |||
</syntaxhighlight> | |||
*If the patch fails to apply, investigate whether it's due to a conflict (<code>git status</code>) or the patch having been merged since the new branch cut (search <code>git log</code> for the commit, etc.). If it turns out to be the latter, remove the patch file from the <code>/srv/patches/[VERSION]</code> directory. | |||
*If you need extra help, contact Security Team ([https://wikimediafoundation.org/role/staff-contractors/ Wikimedia Foundation], [https://www.mediawiki.org/wiki/Wikimedia_Security_Team MediaWiki], [https://office.wikimedia.org/wiki/Contact_list#Security Office Wiki]), currently {{ircnick|bawolff|Brian}} and {{ircnick|Reedy|Sam}} in IRC. | |||
==== Create patches to update wikiversions.json ==== | ====Create patches to update wikiversions.json==== | ||
Create group0 to [VERSION] patch: | Create group0 to [VERSION] patch: | ||
Line 267: | Line 284: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
==== Send staged patches to Gerrit for review ==== | ====Send staged patches to Gerrit for review==== | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 279: | Line 296: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
==== Discard changes to working directory and index ==== | ====Discard changes to working directory and index==== | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 285: | Line 302: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
==== Clean up old stuff ==== | ====Clean up old stuff==== | ||
[[:mw:MediaWiki 1.34/Roadmap]] is a good place to find when a branch was created. | [[:mw:MediaWiki 1.34/Roadmap]] is a good place to find when a branch was created. | ||
Line 327: | Line 344: | ||
Deleting a branch can take 10-15 minutes each. | Deleting a branch can take 10-15 minutes each. | ||
==== Sync to cluster and verify on testwiki ==== | ====Sync to cluster and verify on testwiki==== | ||
* Edit <code>/srv/mediawiki-staging/wikiversions.json</code> and set <code>testwiki</code> to <code>php-[VERSION]</code> | * Edit <code>/srv/mediawiki-staging/wikiversions.json</code> and set <code>testwiki</code> to <code>php-[VERSION]</code> | ||
* Do not commit and push to Gerrit, only make this change locally on the deployment server | *Do not commit and push to Gerrit, only make this change locally on the deployment server | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 335: | Line 353: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
* Run [[scap]] to (re)build localization caches and sync changes across the cluster. | *Run [[scap]] to (re)build localization caches and sync changes across the cluster. | ||
* 🐌 Note: this step will run for about 20 minutes. | *🐌 Note: this step will run for about 20 minutes. | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 348: | Line 366: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
* Verify version change on [https://test.wikipedia.org/wiki/Special:Version testwiki] (Installed software, Product: MediaWiki, Version: [VERSION]) and l10n cache ([https://test.wikipedia.org/wiki/Special:Version Special:Version] should not look like [https://test.wikipedia.org/wiki/Special:Version?uselang=qqx Special:Version?uselang=qqx]) | *Verify version change on [https://test.wikipedia.org/wiki/Special:Version testwiki] (Installed software, Product: MediaWiki, Version: [VERSION]) and l10n cache ([https://test.wikipedia.org/wiki/Special:Version Special:Version] should not look like [https://test.wikipedia.org/wiki/Special:Version?uselang=qqx Special:Version?uselang=qqx]) | ||
This can take half an hour. Opening or reloading the version page on testwiki after the scap sync command can take a minute or two. | This can take half an hour. Opening or reloading the version page on testwiki after the scap sync command can take a minute or two. | ||
* Revert local changes | *Revert local changes | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 358: | Line 376: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
==== Update deploy notes ==== | ====Update deploy notes==== | ||
* Deploy notes are automatically generated by the [https://integration.wikimedia.org/ci/job/train-deploy-notes Train Deploy Notes] Jenkins job after you cut the branch | *Deploy notes are automatically generated by the [https://integration.wikimedia.org/ci/job/train-deploy-notes Train Deploy Notes] Jenkins job after you cut the branch | ||
* Be sure to check that the appropriate Changelog was created at <code><nowiki>https://www.mediawiki.org/wiki/MediaWiki_[VERSION]/Changelog</nowiki></code>. Example: [https://www.mediawiki.org/wiki/MediaWiki_1.34/wmf.4/Changelog MediaWiki 1.34/wmf.4/Changelog] | *Be sure to check that the appropriate Changelog was created at <code><nowiki>https://www.mediawiki.org/wiki/MediaWiki_[VERSION]/Changelog</nowiki></code>. Example: [https://www.mediawiki.org/wiki/MediaWiki_1.34/wmf.4/Changelog MediaWiki 1.34/wmf.4/Changelog] | ||
==== Wait for deploy window ==== | ====Wait for deploy window==== | ||
All of the changes above can be done at any time prior to the actual deployment window. | All of the changes above can be done at any time prior to the actual deployment window. | ||
=== During the deploy window === | ===During the deploy window=== | ||
====Switch group0 wikis to [VERSION]==== | |||
*CR+2 <code>group0 to [VERSION]</code> patch in Gerrit that you submitted earlier | |||
* CR+2 <code>group0 to [VERSION]</code> patch in Gerrit that you submitted earlier | *Wait for Gerrit/Zuul/Jenkins to merge the patch(es) | ||
* Wait for Gerrit/Zuul/Jenkins to merge the patch(es) | |||
* Pull patch(es) to deployment server | * Pull patch(es) to deployment server | ||
Line 377: | Line 396: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
* Check diff to ensure it is what you expect (this should show a bunch of version changes in wikiversions.json for group0 wikis) | *Check diff to ensure it is what you expect (this should show a bunch of version changes in wikiversions.json for group0 wikis) | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 383: | Line 402: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
* Apply changes | *Apply changes | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 389: | Line 408: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
* Sync the change across the cluster | *Sync the change across the cluster | ||
<syntaxhighlight lang="shell-session"> | <syntaxhighlight lang="shell-session"> | ||
Line 401: | Line 420: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
* Verify that [[:mw:Special:Version|mediawikiwiki]] switched to the new version (Installed software, Product: MediaWiki, Version: VERSION) | *Verify that [[:mw:Special:Version|mediawikiwiki]] switched to the new version (Installed software, Product: MediaWiki, Version: VERSION) | ||
* Monitor irc and [[Logstash|logstash]] and/or [[fatalmonitor]] for problems, see [[#Places to Watch for Breakage]] | *Monitor irc and [[Logstash|logstash]] and/or [[fatalmonitor]] for problems, see [[#Places to Watch for Breakage]] | ||
====Update roadmap==== | |||
*Change the <code>Deployed to group</code> (if you're using VisualEditor) or the 3rd parameter of the <code>WMFReleaseTableRow</code> template (if you're using the wikitext editor) to <code>0</code> (deployed to group0) at [[:mw:MediaWiki 1.34/Roadmap]]. | |||
* Change the <code>Deployed to group</code> (if you're using VisualEditor) or the 3rd parameter of the <code>WMFReleaseTableRow</code> template (if you're using the wikitext editor) to <code>0</code> (deployed to group0) at [[:mw:MediaWiki 1.34/Roadmap]]. | |||
For wikitext editor, change | For wikitext editor, change | ||
Line 434: | Line 454: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
== Wednesday: group0 to group1 deploy == | ==Wednesday: group0 to group1 deploy== | ||
==== Switch group1 wikis to [VERSION] ==== | ==== Switch group1 wikis to [VERSION]==== | ||
Use the <code>release/bin/deploy-promote</code> script to update <code>wikiversions.json</code> | Use the <code>release/bin/deploy-promote</code> script to update <code>wikiversions.json</code> | ||
Line 455: | Line 475: | ||
The above should take about five minutes, including the waiting time for Gerrit/CI. | The above should take about five minutes, including the waiting time for Gerrit/CI. | ||
==== Update roadmap ==== | ====Update roadmap==== | ||
* Change the <code>Deployed to group</code> (if you're using VisualEditor) or the 3rd parameter of the <code>WMFReleaseTableRow</code> template (if you're using the wikitext editor) to <code>1</code> (deployed to group1) at [[:mw:MediaWiki 1.34/Roadmap]]. | |||
*Change the <code>Deployed to group</code> (if you're using VisualEditor) or the 3rd parameter of the <code>WMFReleaseTableRow</code> template (if you're using the wikitext editor) to <code>1</code> (deployed to group1) at [[:mw:MediaWiki 1.34/Roadmap]]. | |||
For wikitext editor, change | For wikitext editor, change | ||
Line 476: | Line 497: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
== Thursday: group{0,1} to all deploy == | ==Thursday: group{0,1} to all deploy== | ||
==== Switch all wikis to [VERSION] ==== | ==== Switch all wikis to [VERSION]==== | ||
Thursday deploy is very similar to the Wednesday deploy, the only difference in terms of procedure is the target group | Thursday deploy is very similar to the Wednesday deploy, the only difference in terms of procedure is the target group | ||
Line 497: | Line 518: | ||
After the script run is complete, '''all wikis''' should be running [VERSION]. | After the script run is complete, '''all wikis''' should be running [VERSION]. | ||
==== Update roadmap ==== | ====Update roadmap==== | ||
* Change the <code>Deployed to group</code> (if you're using VisualEditor) or the 3rd parameter of the <code>WMFReleaseTableRow</code> template (if you're using the wikitext editor) to <code>2</code> (deployed to all wikis) at [[:mw:MediaWiki 1.34/Roadmap]]. | |||
*Change the <code>Deployed to group</code> (if you're using VisualEditor) or the 3rd parameter of the <code>WMFReleaseTableRow</code> template (if you're using the wikitext editor) to <code>2</code> (deployed to all wikis) at [[:mw:MediaWiki 1.34/Roadmap]]. | |||
For wikitext editor, change | For wikitext editor, change | ||
Line 518: | Line 540: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
== Incident documentation == | ==Incident documentation== | ||
* If there were problems during the train, follow instructions at [[Incident documentation]] on incident reports and post-mortem review. | *If there were problems during the train, follow instructions at [[Incident documentation]] on incident reports and post-mortem review. | ||
* Use <code>Create report</code> form to create a new page, <code>train-[VERSION]</code>. Example: [[Incident documentation/20181212-Train-1.33.0-wmf.8]]. | *Use <code>Create report</code> form to create a new page, <code>train-[VERSION]</code>. Example: [[Incident documentation/20181212-Train-1.33.0-wmf.8]]. | ||
* For Timeline section, events from [https://tools.wmflabs.org/sal/production SAL] and Phabricator task are a good start. | *For Timeline section, events from [https://tools.wmflabs.org/sal/production SAL] and Phabricator task are a good start. | ||
[[Category:How-To]] | [[Category:How-To]] | ||
[[Category:Deployment]] | [[Category:Deployment]] |
Revision as of 13:53, 20 August 2019
Breakage
There will be times when this process does not go smoothly. There are guidelines for what to do when that happens.
In general, if there is an unexplained error that occurs within 1 hour of a train deployment, always roll back the train. Rolling back is especially important when several possible causes are in play at once, because it eliminates the train as one of them.
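The one-hour rule can be sketched as a small timing check. This is only an illustration: `within_rollback_window` is a hypothetical helper, not part of scap or the release tools, and it looks at timing only. Whether the error is actually unexplained remains a human judgment.

```shell
# Hypothetical helper (not part of scap or mediawiki/tools/release).
# Succeeds when an error was first seen within one hour after the deploy.
# Both arguments are Unix timestamps in seconds, e.g. from `date +%s`.
within_rollback_window() {
    deploy_ts=$1
    error_ts=$2
    elapsed=$((error_ts - deploy_ts))
    [ "$elapsed" -ge 0 ] && [ "$elapsed" -le 3600 ]
}

# Example: the error was first seen 25 minutes (1500 seconds) after the sync.
if within_rollback_window 1566309600 1566311100; then
    echo "roll back the train"
else
    echo "investigate other causes first"
fi
```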
Rollback
Rolling back a wikiversions change should be quick. Roll back production before you send patches up to Gerrit, since waiting on Jenkins may take a while:
USERNAME@deploy1001:/srv/mediawiki-staging$ git revert $(git log -1 --format=%H -- wikiversions.json)
USERNAME@deploy1001:/srv/mediawiki-staging$ scap sync-wikiversions 'Revert "group[0|1] wikis to [VERSION]"'
USERNAME@deploy1001:/srv/mediawiki-staging$ # The revert is now synced. Run git commit --amend to get the Change-Id, then push the patch up to Gerrit
USERNAME@deploy1001:/srv/mediawiki-staging$ git commit --amend
USERNAME@deploy1001:/srv/mediawiki-staging$ git push origin HEAD:refs/for/master/[VERSION]%l=Code-Review+2
Example:
USERNAME@deploy1001:/srv/mediawiki-staging$ git push origin HEAD:refs/for/master/1.34.0-wmf.0%l=Code-Review+2
- Wait for the patch to merge, then fetch it back down to the deployment server
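To rehearse the rollback before it is needed, the sequence above can be printed with the placeholders filled in. This is a minimal sketch: `rollback_plan` is a hypothetical helper that only prints the commands shown above and runs nothing.

```shell
# Hypothetical helper: prints the rollback commands from this section with
# the group and version filled in. It does not execute any of them.
rollback_plan() {
    group=$1
    version=$2
    printf '%s\n' 'git revert $(git log -1 --format=%H -- wikiversions.json)'
    printf "scap sync-wikiversions 'Revert \"%s wikis to %s\"'\n" "$group" "$version"
    printf '%s\n' 'git commit --amend'
    printf 'git push origin HEAD:refs/for/master/%s%%l=Code-Review+2\n' "$version"
}

# Print the plan for rolling group0 back from 1.34.0-wmf.0.
rollback_plan group0 1.34.0-wmf.0
```

Printing rather than executing keeps the operator in control of each step, which matters because the sync happens before the patch reaches Gerrit.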
Places to Watch for Breakage
Train deployers are effectively the first line of defense for train deploys, so they should check for breakage while rolling out the train. Some of the places to watch for breakage:
- IRC
- primary channel is #wikimedia-operations
- useful channels are #mediawiki-core and #wikimedia-dev
- for more channels see MediaWiki on IRC and IRC/Channels
- Logstash Fatal Monitor
- Logstash MediaWiki Errors
- Logstash "mediawiki-new-errors" dashboard (linked from logstash front page)
- Showing only timeout errors (see T204871)
- Group-specific Logstash Dashboards:
- group0
- group1
- Grafana Varnish error-rate dashboard (HTTP 5XX % should have 3+ 0s after the decimal point, e.g. 0.0001%)
- Grafana Frontend Responses NGINX vs Varnish
- Grafana Production Logging
- Minerva Client Errors - Browser JS errors count (only wikipedias on mobile)
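The rule of thumb for the Varnish dashboard ("3+ 0s after the decimal point") amounts to a 5XX percentage below 0.001. A minimal sketch of that comparison follows. `error_rate_ok` is a hypothetical helper, and the percentage is assumed to be read off the dashboard by hand.

```shell
# Hypothetical check: succeeds when the HTTP 5XX percentage is below 0.001,
# i.e. it has at least three zeros after the decimal point.
# awk does the floating-point comparison; $1 is the percentage without "%".
error_rate_ok() {
    awk -v rate="$1" 'BEGIN { exit !(rate + 0 < 0.001) }'
}

# Example with the value quoted above (0.0001%).
if error_rate_ok 0.0001; then
    echo "5XX rate looks normal"
else
    echo "5XX rate is elevated, check the other dashboards"
fi
```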
If the train is blocked
- A task will be assigned to you, for example T191059 (1.32.0-wmf.13 deployment blockers)
- Any open subtasks block the train from moving forward. This means no further deployments until the blockers are resolved.
Checklist
If there are blocking tasks, please do the following:
- Make sure all tasks blocking train are set to UBN! priority in phabricator
- Comment on the task asking for an ETA or if this can be solved by reverting a recent commit.
- Send e-mail to:
- ops@lists.wikimedia.org
- wikitech-ambassadors@lists.wikimedia.org
- wikitech-l@lists.wikimedia.org
- Subject: [Train] {version} status update
- Body:
The {version} version of MediaWiki is blocked[0].

The new version is deployed to {group(s){0,1,2}}[1], but can proceed no
further until these issues are resolved:

* {Phab task name} - {phab task link}

Once these issues are resolved train can resume. If these issues are
resolved on a Friday the train will resume Monday.

Thank you for your help resolving these issues!

-- Your humble train toiler

[0]. <{link to phab task for train}>
[1]. <https://tools.wmflabs.org/versions/>
- Add relevant people (see Developers/Maintainers) to the blocking task
- Ping relevant people in IRC
- Once train is unblocked be sure to thank the folks who helped unblock it
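Filling in the status e-mail by hand is error-prone under pressure, so the template from the checklist can be sketched as a shell function. `train_blocked_email` is a hypothetical helper and the blocker title and URL in the example are placeholders. It only prints the text, and sending it to the lists is still done manually.

```shell
# Hypothetical helper that fills in the status e-mail template.
# Arguments: version, deployed group(s), blocker title, blocker URL,
# and the URL of the train's own Phabricator task.
train_blocked_email() {
    version=$1
    deployed=$2
    blocker=$3
    blocker_url=$4
    train_url=$5
    cat <<EOF
Subject: [Train] $version status update

The $version version of MediaWiki is blocked[0].

The new version is deployed to ${deployed}[1], but can proceed no
further until these issues are resolved:

* $blocker - $blocker_url

Once these issues are resolved train can resume. If these issues are
resolved on a Friday the train will resume Monday.

Thank you for your help resolving these issues!

-- Your humble train toiler

[0]. <$train_url>
[1]. <https://tools.wmflabs.org/versions/>
EOF
}

# Example using the train task mentioned above; the blocker is a placeholder.
train_blocked_email 1.32.0-wmf.13 group0 \
    'Placeholder blocker title' 'https://phabricator.wikimedia.org/T000000' \
    'https://phabricator.wikimedia.org/T191059'
```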
Tuesday: New branch creation and deploy
The new branch can be created in Gerrit from anywhere. It is often faster to do this step on a host in the cluster to minimize the time needed to clone from Gerrit.
Before the deploy window
Depending on how practiced you are and where you choose to run commands (full clones of mediawiki-core from outside the cluster can take a while) the steps will typically take 45 to 90 minutes.
Setup
The script is run as your regular user, which must be a member of the wikidev group (as of Feb 16th 2016).
Configure Git. You will need to ensure your user.name and user.email are set correctly. You will also need to ensure that you push to Gerrit via ssh (as of Mar 18th, 2019).
USERNAME@MACHINE-NAME:~$ ssh deployment.eqiad.wmnet
USERNAME@deploy1001:~$ git config --global user.name "[FIRST-NAME] [LAST-NAME]"
USERNAME@deploy1001:~$ git config --global user.email "[USERNAME]@[DOMAIN]"
USERNAME@deploy1001:~$ git config --global url.ssh://[GERRIT-USERNAME]@gerrit.wikimedia.org:29418.pushInsteadOf https://gerrit.wikimedia.org/r
Create a new ssh key to use from the deployment server.
USERNAME@deploy1001:~$ ssh-keygen -t ed25519
Enter file in which to save the key (/home/USERNAME/.ssh/id_ed25519): <enter>
Enter passphrase (empty for no passphrase): <passphrase>
Enter same passphrase again: <passphrase>
Your identification has been saved in /home/USERNAME/.ssh/id_ed25519.
Your public key has been saved in /home/USERNAME/.ssh/id_ed25519.pub.
Add your new ~/.ssh/id_ed25519.pub key to Gerrit:
- In the new UI, go to SSH keys and copy the contents of your ~/.ssh/id_ed25519.pub file into the UI.
- In the old UI, go to SSH keys and copy the contents of your ~/.ssh/id_ed25519.pub file into the UI.
Create a new ssh-agent so that your key remains unlocked during branch cutting:
USERNAME@deploy1001:~$ eval $(ssh-agent)
USERNAME@deploy1001:~$ ssh-add ~/.ssh/id_ed25519
Enter passphrase for /home/USERNAME/.ssh/id_ed25519: <passphrase>
Identity added: /home/USERNAME/.ssh/id_ed25519 (USERNAME@deploy1001)
You should be able to list the keys in your agent and see your key there:
USERNAME@deploy1001:~$ ssh-add -l
256 SHA256:WSjdx+WFnlo9Dd+FN63c35+Q3pArOD/TQpFEjNh7ODc USERNAME@deploy1001 (ED25519)
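If the listing does not look like the above, the exit status of ssh-add -l tells you why. The sketch below maps that status to a short hint; the function name is illustrative, and the status meanings (0 for keys present, 1 for no identities, 2 for no reachable agent) are the ones documented for OpenSSH's ssh-add.

```shell
# Sketch: translate the exit status of `ssh-add -l` into a readable hint.
# agent_status_message is a hypothetical helper, not part of OpenSSH.
agent_status_message() (
  case "$1" in
    0) echo "agent is running and has at least one key loaded" ;;
    1) echo "agent is running but has no keys; run ssh-add" ;;
    2) echo "no agent reachable; run eval \$(ssh-agent) first" ;;
    *) echo "unexpected ssh-add exit status: $1" ;;
  esac
)

# Usage on the deployment server (not executed here):
#   ssh-add -l >/dev/null 2>&1; agent_status_message "$?"
```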
Add Gerrit to ~/.ssh/known_hosts. The fingerprint is available only in the old UI at settings/ssh-keys.
USERNAME@deploy1001:~$ ssh -oFingerPrintHash=md5 -p 29418 GERRIT-USERNAME@gerrit.wikimedia.org
The authenticity of host '[gerrit.wikimedia.org]:29418 ([2620:0:861:3:208:80:154:85]:29418)' can't be established.
RSA key fingerprint is MD5:dc:e9:68:7b:99:1b:27:d0:f9:fd:ce:6a:2e:bf:92:e1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[gerrit.wikimedia.org]:29418,[2620:0:861:3:208:80:154:85]:29418' (RSA) to the list of known hosts.
**** Welcome to Gerrit Code Review ****
Hi You, you have successfully connected over SSH.
Unfortunately, interactive shells are disabled.
To clone a hosted Git repository, use:
git clone ssh://GERRIT-USERNAME@gerrit.wikimedia.org:29418/REPOSITORY_NAME.git
Connection to gerrit.wikimedia.org closed.
Clone or update mediawiki/tools/release.
USERNAME@deploy1001:~$ git clone https://gerrit.wikimedia.org/r/mediawiki/tools/release
tmux or screen
Some scripts run for 10-60 minutes so consider using tmux or screen.
If you prefer tmux:
USERNAME@deploy1001:~$ tmux new -s train
...
USERNAME@deploy1001:~$ exit
If you need to leave in the middle, press ctrl-b d to detach and run tmux a -t train to reattach.
If you prefer screen:
USERNAME@deploy1001:~$ screen -D -RR train
...
USERNAME@deploy1001:~$ exit
If you need to leave in the middle, press ctrl-a d to detach and run screen -r train to reattach.
Create the new branch in Gerrit
USERNAME@deploy1001:~/release/make-wmf-branch$ ./make-wmf-branch -n [VERSION] -o master
Example:
USERNAME@deploy1001:~/release/make-wmf-branch$ ./make-wmf-branch -n 1.34.0-wmf.0 -o master
🐌 Note: the script will run for about 30 minutes.
If the process is interrupted (by rebooting Gerrit, for example), continue from where it stopped:
USERNAME@deploy1001:~/release/make-wmf-branch$ ./make-wmf-branch -n [VERSION] -o master -c extensions/[EXTENSION]
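Because the branch cut runs for a long time, it can be worth checking the version string before starting. The sketch below assumes that versions always look like the examples on this page (1.34.0-wmf.0); that pattern is inferred, not something make-wmf-branch is known to enforce. Both function names are hypothetical, and the second one only prints the command.

```shell
# Sketch: accept only strings shaped like 1.34.0-wmf.0 (assumed pattern).
is_wmf_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-wmf\.[0-9]+$'
}

# Print (do not run) the branch-cut command for a validated version.
branch_cut_command() (
  if is_wmf_version "$1"; then
    echo "./make-wmf-branch -n $1 -o master"
  else
    echo "refusing: '$1' does not look like a wmf version" >&2
    exit 1
  fi
)

# Usage:
#   branch_cut_command 1.34.0-wmf.0
```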
Clone new branch
This command will create a new /srv/mediawiki-staging/php-[VERSION] directory:
USERNAME@deploy1001:/srv/mediawiki-staging$ scap prep [VERSION]
Example:
USERNAME@deploy1001:/srv/mediawiki-staging$ scap prep 1.34.0-wmf.0
This should only take a couple of minutes.
Apply security patches
- Patches should be named sequentially in the order that they will cleanly apply (e.g. 01-T[NUMBER].patch, 02-T[NUMBER].patch)
- Check and apply each patch in both /srv/patches/[VERSION]/core and /srv/patches/[VERSION]/extensions/[NAME] to the new core checkout and extensions, respectively.
Check existing patches:
USERNAME@deploy1001:~$ tree /srv/patches/[VERSION]
/srv/patches/[VERSION]
├── core
│ ├── 01-T[NUMBER].patch
│ └── 02-T[NUMBER].patch
└── extensions
└── [EXTENSION]
├── 01-T[NUMBER].patch
└── 02-T[NUMBER].patch
- You can check a core patch to see if it will apply cleanly with
USERNAME@deploy1001:/srv/mediawiki-staging/php-[VERSION]$ git apply --check --3way /srv/patches/[VERSION]/core/[NUMBER]-T[NUMBER].patch
- If the patch checks out, apply and commit it with
USERNAME@deploy1001:/srv/mediawiki-staging/php-[VERSION]$ git am --3way /srv/patches/[VERSION]/core/[NUMBER]-T[NUMBER].patch
For an extension:
USERNAME@deploy1001:/srv/mediawiki-staging/php-[VERSION]/extensions/[EXTENSION]$ git apply --check --3way /srv/patches/[VERSION]/extensions/[EXTENSION]/[NUMBER]-T[NUMBER].patch
USERNAME@deploy1001:/srv/mediawiki-staging/php-[VERSION]/extensions/[EXTENSION]$ git am --3way /srv/patches/[VERSION]/extensions/[EXTENSION]/[NUMBER]-T[NUMBER].patch
- If the patch fails to apply, investigate whether it's due to a conflict (git status) or the patch having been merged since the new branch cut (search git log for the commit, etc.). If it turns out to be the latter, remove the patch file from the /srv/patches/[VERSION] directory.
- If you need extra help, contact Security Team (Wikimedia Foundation, MediaWiki, Office Wiki), currently Brian (bawolff) and Sam (Reedy) in IRC.
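When there are several patches, it helps to list the check and apply commands in order before running anything. The following sketch walks a patch directory laid out as in the tree above and prints one command line per patch, core first and then extensions, each sorted by file name. It is a dry run under that layout assumption; the function name and its two arguments are illustrative only.

```shell
# Sketch: print, in apply order, the check/apply commands for each patch.
# Nothing is executed; review the output and run the commands by hand.
print_patch_commands() (
  patches=$1    # e.g. /srv/patches/[VERSION]
  checkout=$2   # e.g. /srv/mediawiki-staging/php-[VERSION]
  find "$patches" -type f -name '*.patch' | LC_ALL=C sort |
  while IFS= read -r p; do
    rel=${p#"$patches"/}
    case "$rel" in
      core/*)
        dir=$checkout ;;
      extensions/*/*)
        ext=${rel#extensions/}
        dir=$checkout/extensions/${ext%%/*} ;;
      *)
        continue ;;   # ignore anything outside core/ and extensions/
    esac
    echo "(cd $dir && git apply --check --3way $p && git am --3way $p)"
  done
)

# Usage:
#   print_patch_commands /srv/patches/1.34.0-wmf.0 /srv/mediawiki-staging/php-1.34.0-wmf.0
```

Printing rather than executing keeps the decision with you: a patch that fails the check still needs the manual investigation described above.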
Create patches to update wikiversions.json
Create group0 to [VERSION] patch:
USERNAME@deploy1001:/srv/mediawiki-staging/$ scap update-wikiversions group0 [VERSION]
USERNAME@deploy1001:/srv/mediawiki-staging/$ git add wikiversions.json
USERNAME@deploy1001:/srv/mediawiki-staging/$ git commit -m "Group0 to [VERSION]"
Example:
USERNAME@deploy1001:/srv/mediawiki-staging/$ scap update-wikiversions group0 1.34.0-wmf.0
USERNAME@deploy1001:/srv/mediawiki-staging/$ git add wikiversions.json
USERNAME@deploy1001:/srv/mediawiki-staging/$ git commit -m "Group0 to 1.34.0-wmf.0"
Send staged patches to Gerrit for review
USERNAME@deploy1001:/srv/mediawiki-staging/$ git push origin HEAD:refs/for/master/[VERSION]
Example:
USERNAME@deploy1001:/srv/mediawiki-staging/$ git push origin HEAD:refs/for/master/1.34.0-wmf.0
Discard changes to working directory and index
USERNAME@deploy1001:/srv/mediawiki-staging/$ git reset --hard origin/master
Clean up old stuff
mw:MediaWiki 1.34/Roadmap is a good place to find when a branch was created.
List all branches:
USERNAME@deploy1001:/srv/mediawiki-staging/$ find . -maxdepth 1 -type d -name 'php-*' -print
Find old branches, more than 30 days old:
USERNAME@deploy1001:/srv/mediawiki-staging/$ find . -mindepth 2 -maxdepth 2 -type f -path './php-*/README' -ctime +30 -exec dirname {} \;
For all branches more than 30 days old, drop everything.
USERNAME@deploy1001:/srv/mediawiki-staging/$ scap clean --delete [some old version from find -ctime +30 output above]
Example:
USERNAME@deploy1001:/srv/mediawiki-staging/$ scap clean --delete 1.34.0-wmf.0
For all branches older than the currently active branch(es) and prior one, prune everything that's not a static asset (we need those for cached CSS/JS/etc). Active branches are visible at Wikimedia MediaWiki versions page.
USERNAME@deploy1001:/srv/mediawiki-staging/$ scap clean [some old version from find -ctime +30 output above]
Example:
USERNAME@deploy1001:/srv/mediawiki-staging/$ scap clean 1.34.0-wmf.0
Deleting a branch can take 10-15 minutes.
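The find output above lists directories such as ./php-1.34.0-wmf.0, while scap clean expects a bare version. A small sketch of that conversion follows; it reads directory names on stdin and only prints the scap clean --delete commands, so nothing is removed until you run them yourself. The function name is hypothetical. Check each version against the Wikimedia MediaWiki versions page first, since active branches must not be deleted.

```shell
# Sketch: turn "./php-[VERSION]" lines into dry-run cleanup commands.
old_branch_cleanup_commands() (
  while IFS= read -r dir; do
    name=${dir##*/}          # strip any leading path, e.g. "./"
    version=${name#php-}     # strip the "php-" prefix
    echo "scap clean --delete $version"
  done
)

# Usage, from /srv/mediawiki-staging:
#   find . -mindepth 2 -maxdepth 2 -type f -path './php-*/README' -ctime +30 \
#     -exec dirname {} \; | old_branch_cleanup_commands
```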
Sync to cluster and verify on testwiki
- Edit /srv/mediawiki-staging/wikiversions.json and set testwiki to php-[VERSION]
- Do not commit and push to Gerrit, only make this change locally on the deployment server
USERNAME@deploy1001:/srv/mediawiki-staging/$ vim wikiversions.json
- Run scap to (re)build localization caches and sync changes across the cluster.
- 🐌 Note: this step will run for about 20 minutes.
USERNAME@deploy1001:/srv/mediawiki-staging/$ scap sync "testwiki to php-[VERSION] and rebuild l10n cache"
Example:
USERNAME@deploy1001:/srv/mediawiki-staging/$ scap sync "testwiki to php-1.34.0-wmf.0 and rebuild l10n cache"
- Verify version change on testwiki (Installed software, Product: MediaWiki, Version: [VERSION]) and l10n cache (Special:Version should not look like Special:Version?uselang=qqx)
This can take half an hour. Opening or reloading the version page on testwiki after the scap sync command can take a minute or two.
- Revert local changes
USERNAME@deploy1001:/srv/mediawiki-staging/$ git checkout -- wikiversions.json
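The version check in the steps above can also be done from the command line instead of opening the version page in a browser. This is a sketch of one possible approach, not part of the documented procedure: it assumes the wiki's Action API siteinfo response contains a field like "generator":"MediaWiki 1.34.0-wmf.0" and that testwiki is served at test.wikipedia.org. The function name is illustrative. It does not check the l10n cache, so Special:Version still needs a look.

```shell
# Sketch: extract the MediaWiki version from a siteinfo JSON response on stdin.
# Assumes a field of the form "generator":"MediaWiki [VERSION]".
version_from_siteinfo() {
  sed -n 's/.*"generator"[[:space:]]*:[[:space:]]*"MediaWiki \([^"]*\)".*/\1/p'
}

# Usage (needs network access, not executed here):
#   curl -s 'https://test.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json' \
#     | version_from_siteinfo
```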
Update deploy notes
- Deploy notes are automatically generated by the Train Deploy Notes Jenkins job after you cut the branch
- Be sure to check that the appropriate Changelog was created at https://www.mediawiki.org/wiki/MediaWiki_[VERSION]/Changelog. Example: MediaWiki 1.34/wmf.4/Changelog
Wait for deploy window
All of the changes above can be done at any time prior to the actual deployment window.
During the deploy window
Switch group0 wikis to [VERSION]
- CR+2 the group0 to [VERSION] patch in Gerrit that you submitted earlier
- Wait for Gerrit/Zuul/Jenkins to merge the patch(es)
- Pull patch(es) to deployment server
USERNAME@deploy1001:/srv/mediawiki-staging$ git fetch
- Check diff to ensure it is what you expect (this should show a bunch of version changes in wikiversions.json for group0 wikis)
USERNAME@deploy1001:/srv/mediawiki-staging$ git diff HEAD..origin/master
- Apply changes
USERNAME@deploy1001:/srv/mediawiki-staging$ git rebase origin/master
- Sync the change across the cluster
USERNAME@deploy1001:/srv/mediawiki-staging$ scap sync-wikiversions "group0 to [VERSION]"
Example:
USERNAME@deploy1001:/srv/mediawiki-staging$ scap sync-wikiversions "group0 to 1.34.0-wmf.0"
- Verify that mediawikiwiki switched to the new version (Installed software, Product: MediaWiki, Version: VERSION)
- Monitor IRC and logstash and/or fatalmonitor for problems, see #Places to Watch for Breakage
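The diff check in the steps above can be backed by a quick test of which files changed. The sketch below succeeds only when wikiversions.json is the single changed file; the function name is illustrative. A failure is not necessarily an error, since other changes may have merged to master in the meantime, but it is a signal to read the full diff before rebasing.

```shell
# Sketch: succeed only if the list of changed files on stdin is exactly
# "wikiversions.json". only_wikiversions_changed is a hypothetical helper.
only_wikiversions_changed() (
  files=$(cat)
  [ "$files" = "wikiversions.json" ]
)

# Usage, from /srv/mediawiki-staging after git fetch:
#   git diff --name-only HEAD..origin/master | only_wikiversions_changed \
#     || echo "other files changed too; review the full diff"
```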
Update roadmap
- Change the Deployed to group (if you're using VisualEditor) or the 3rd parameter of the WMFReleaseTableRow template (if you're using the wikitext editor) to 0 (deployed to group0) at mw:MediaWiki 1.34/Roadmap.
For the wikitext editor, change
{{WMFReleaseTableHead}}
{{WMFReleaseTableRow|[VERSION]|[DATE]|}}
...
{{WMFReleaseTableFooter}}
to
{{WMFReleaseTableHead}}
{{WMFReleaseTableRow|[VERSION]|[DATE]|0}}
...
{{WMFReleaseTableFooter}}
Example:
{{WMFReleaseTableHead}}
{{WMFReleaseTableRow|12|2018-07-10|0}}
...
{{WMFReleaseTableFooter}}
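The wikitext change is the same on every deploy day; only the group number differs. As an illustration, the sketch below rewrites the third parameter of the matching WMFReleaseTableRow line in wikitext read from stdin. It assumes each row sits on its own line with exactly three parameters, as in the examples above. The function name is hypothetical, and the result still has to be pasted into the Roadmap page by hand.

```shell
# Sketch: set the 3rd parameter of the row for one version.
# set_deployed_group VERSION GROUP reads wikitext on stdin.
set_deployed_group() {
  awk -v v="$1" -v g="$2" '
    {
      prefix = "{{WMFReleaseTableRow|" v "|"
      if (index($0, prefix) == 1) {
        # parts: template name, version, date, old group plus closing braces
        split($0, parts, "|")
        print parts[1] "|" parts[2] "|" parts[3] "|" g "}}"
      } else {
        print
      }
    }'
}

# Usage:
#   printf '%s\n' '{{WMFReleaseTableRow|12|2018-07-10|}}' | set_deployed_group 12 0
```

Rows for other versions and all non-row lines pass through unchanged.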
Wednesday: group0 to group1 deploy
Switch group1 wikis to [VERSION]
Use the release/bin/deploy-promote script to update wikiversions.json
USERNAME@deploy1001:~$ ./release/bin/deploy-promote
Promote group1 from [PREVIOUS-VERSION] to [VERSION] [y/N]
The script automatically gives the patch Code-Review +2 in Gerrit. Once CI has merged the patch, hit enter at the 2nd prompt.
Now wait for jenkins to merge the patch, then press enter to continue with git pull && scap sync-wikiversions
After the script run is complete, group1 wikis should be running [VERSION].
The above should take about five minutes, including the waiting time for Gerrit/CI.
Update roadmap
- Change the Deployed to group (if you're using VisualEditor) or the 3rd parameter of the WMFReleaseTableRow template (if you're using the wikitext editor) to 1 (deployed to group1) at mw:MediaWiki 1.34/Roadmap.
For the wikitext editor, change
{{WMFReleaseTableRow|[VERSION]|[DATE]|0}}
to
{{WMFReleaseTableRow|[VERSION]|[DATE]|1}}
Example:
{{WMFReleaseTableRow|12|2018-07-10|1}}
Thursday: group{0,1} to all deploy
Switch all wikis to [VERSION]
The Thursday deploy is very similar to the Wednesday deploy; the only procedural difference is the target group.
Use release/bin/deploy-promote all to update wikiversions.json
USERNAME@deploy1001:~$ ./release/bin/deploy-promote all
Promote all from [PREVIOUS-VERSION] to [VERSION] [y/N]
The script automatically gives the patch Code-Review +2 in Gerrit. Once CI has merged the patch, hit enter at the 2nd prompt.
Now wait for jenkins to merge the patch, then press enter to continue with git pull && scap sync-wikiversions
After the script run is complete, all wikis should be running [VERSION].
Update roadmap
- Change the Deployed to group (if you're using VisualEditor) or the 3rd parameter of the WMFReleaseTableRow template (if you're using the wikitext editor) to 2 (deployed to all wikis) at mw:MediaWiki 1.34/Roadmap.
For the wikitext editor, change
{{WMFReleaseTableRow|[VERSION]|[DATE]|1}}
to
{{WMFReleaseTableRow|[VERSION]|[DATE]|2}}
Example:
{{WMFReleaseTableRow|12|2018-07-10|2}}
Incident documentation
- If there were problems during the train, follow instructions at Incident documentation on incident reports and post-mortem review.
- Use the Create report form to create a new page, train-[VERSION]. Example: Incident documentation/20181212-Train-1.33.0-wmf.8.
- For the Timeline section, events from SAL and the Phabricator task are a good start.