ORES/Deployment: Difference between revisions
imported>Halfak (→Beta (ores-beta.wmflabs.org): update) |
imported>Awight (Updates after learning how to deploy) |
Revision as of 23:38, 18 July 2017
This page is a guide to deploying a new version of ORES.
PyPI
So, your patches are merged into ores, revscoring, or other dependencies. Now you need to increment the version number. Try to follow SemVer, for example by bumping only the patch level (e.g. 0.5.8 -> 0.5.9). Update the version in setup.py and __init__.py, and probably in some other places too; use grep to find where the current version is used.
Then push the new version to PyPI using:
python setup.py sdist bdist_wheel upload
If you have GPG/PGP set up, you can try adding sign to the list above to also sign the wheel and the sdist.
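The version bump described above can be sketched in Python. This is only an illustration, not part of the ORES tooling: the helper names bump_patch and files_mentioning are made up here, and the search only looks at .py files, so it is a rough stand-in for grep rather than a replacement.

```python
import re
from pathlib import Path

# Only plain MAJOR.MINOR.PATCH versions are handled in this sketch.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump_patch(version: str) -> str:
    """Return the version with its patch level incremented (0.5.8 -> 0.5.9)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return f"{major}.{minor}.{patch + 1}"


def files_mentioning(version: str, root: str = ".") -> list[str]:
    """List .py files under root that contain the version string (hypothetical helper)."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable files rather than abort the search
        if version in text:
            hits.append(str(path))
    return hits
```

For example, bump_patch("0.5.8") gives the next patch version, and files_mentioning("0.5.8") lists candidate files to edit by hand.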
Update models
If you are making breaking changes to revscoring, old model files probably won't work, so you need to rebuild the models. Do this using the Makefile in the editquality and wikiclass repos. If a model changes substantially (new features, new algorithm, etc.), make sure to increment the model version in the Makefile too.
Update wheels
First, clone https://github.com/wiki-ai/ores-wmflabs-deploy:
git clone https://github.com/wiki-ai/ores-wmflabs-deploy
There is a file in ores-wmflabs-deploy called "requirements.txt". Update the version numbers in it, then build the wheels by creating a virtualenv and installing everything into it:
virtualenv -p python3 tmp
source tmp/bin/activate
pip install --upgrade pip
pip install wheel
pip wheel -w wheels/ -r requirements.txt
It's better to do this in an environment similar to the production cluster; ores-misc-01.ores.eqiad.wmflabs is designed for that. Don't forget to install the C dependencies beforehand, and check the output carefully for errors of any kind.
Once the wheels are ready, they go into the gerrit repo called wheels (research/ores/wheels), where we keep wheels and nltk data. You need to git clone it, update the wheels, and make a patch:
git clone ssh://YOURUSERNAME@gerrit.wikimedia.org:29418/research/ores/wheels
Then copy the new versions into the wheels folder, delete the old ones, and make a new patch:
cd wheels
git commit -m "New wheels for wiki-ai 1.2" -a
git review -R
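Before sending the patch, it can help to check that every pinned requirement actually has a wheel in the folder. The sketch below is one way to do that, and it makes several assumptions: requirements are pinned as name==version, wheel file names follow the standard name-version-python-abi-platform.whl layout, and the function names are invented for this example. Extras, URLs, and unpinned lines are simply ignored.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name for comparison: lowercase, runs of -_. become _."""
    return re.sub(r"[-_.]+", "_", name).lower()


def pinned_requirements(requirements_text: str) -> dict[str, str]:
    """Parse 'name==version' lines; comments, markers and unpinned lines are ignored."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers before parsing.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[normalize(name.strip())] = version.strip()
    return pins


def missing_wheels(requirements_text: str, wheel_filenames: list[str]) -> list[str]:
    """Return the pinned requirements that have no wheel with the same name and version."""
    available = set()
    for filename in wheel_filenames:
        if not filename.endswith(".whl"):
            continue
        parts = filename[: -len(".whl")].split("-")
        if len(parts) >= 5:  # name, version, python tag, abi tag, platform tag
            available.add((normalize(parts[0]), parts[1]))
    pins = pinned_requirements(requirements_text)
    return sorted(
        f"{name}=={version}"
        for name, version in pins.items()
        if (name, version) not in available
    )
```

A typical use would be to pass the contents of requirements.txt and the result of os.listdir("wheels") and confirm that the returned list is empty before running git review.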
Update ores-wmflabs-deploy
After the wheels patch has been +2'd and merged, update ores-wmflabs-deploy:
cd ores-wmflabs-deploy
git checkout -b wiki_ai_1.2
source tmp/bin/activate
pip freeze | grep -v setuptools > requirements-frozen.txt
cd submodules/wheels
git pull
cd ../..
git commit -m "Release wiki-ai 1.2"
git push -f origin wiki_ai_1.2
After that, make a PR on GitHub. Once it's merged, it's good to go!
If you want to deploy to prod as well (ores.wikimedia.org), you need to backport your commits in gerrit too (ewww). The gerrit repo for ores is:
git clone ssh://YOURUSERNAME@gerrit.wikimedia.org:29418/mediawiki/services/ores
The other gerrit repos are:
- "mediawiki/services/ores/deploy" for ores-wmflabs-deploy (note that these repos have diverged [FIXME: Mande?])
- "mediawiki/services/ores/editquality" for editquality
- "mediawiki/services/ores/wikiclass" for wikiclass
Deploy
You need to log into each deploy server to deploy a new version using fabric or scap3, so make sure you have the required permissions.
We have a series of increasingly production-like environments available for smoke testing each release. Please take the time to go through each step: labs staging -> beta -> production. There is also an automatic canary deployment during scap, which stops after pushing to scb1002 and gives you the opportunity to compare that server's health to its brethren's.
Read the logs
If something does go wrong, you'll want to read the diagnostic messages. See /srv/log/ores/main.log and app.log. Monitor the logs throughout each of these deployment stages by going to the target server and running:
sudo tail -f /srv/log/ores/*.log
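If you would rather filter a log than watch it scroll, something like the following sketch could pick out the lines worth a closer look. The marker strings are an assumption (the ORES log format is not described on this page), and suspicious_lines is a made-up helper name.

```python
# Assumed markers: these are common in Python logs, but are not taken from the ORES docs.
ERROR_MARKERS = ("ERROR", "CRITICAL", "Traceback")


def suspicious_lines(log_text: str, markers: tuple[str, ...] = ERROR_MARKERS) -> list[str]:
    """Return the log lines that contain any of the given marker strings."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in markers)
    ]
```

You would read a log file such as /srv/log/ores/main.log into a string (with sufficient permissions) and pass it in; adjust the markers to whatever the logs actually contain.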
Labs (ores.wmflabs.org)
First, go to staging. Simply make your changes in the ores-wmflabs-deploy repo and run fab stage. Don't forget to log it in #wikimedia-cloud by typing: "!log ores-staging deploying <HASH> into staging".
Then check ores-staging.wmflabs.org to see if everything is healthy. If so, you are good to go to the labs setup. Rebase the "deploy" branch onto master.
git checkout deploy
git rebase origin/master
git push -f origin deploy
If everything works as expected, deploy with "fab deploy_web" and then "fab deploy_celery". Once that's done, test ores.wmflabs.org to see if everything is working as expected.
Beta (ores-beta.wmflabs.org)
- ssh deployment-tin.eqiad.wmflabs and cd /srv/deployment/ores/deploy
- git pull && git submodule update --init
- Record the NEWHASH at the top of git log -1
- Deploy with scap deploy -v "<relevant task -- e.g. T1234>" and check that everything works as expected.
- This will automatically post a log line in #wikimedia-releng
Production cluster (ores.wikimedia.org)
You are doing a dangerous thing. Remember, breaking the site is extremely easy! Be careful at every step and try to have someone from the team and from ops supervising you. Also remember that ORES depends on a large number of puppet configurations: check whether your change is compatible with the puppet configs, and change them if necessary.
- Prep work
We'll double check the hash that is deployed in case we need to revert and then update the code to current master.
- ssh deployment.eqiad.wmnet. Then cd /srv/deployment/ores/deploy.
- Record the latest revision (OLDHASH) with git log -1 (in case you need to roll back)
- Update the deploy repository with git pull and git submodule update --init
- Record the new revision (NEWHASH) and prepare a message to send to #wikimedia-operations: "!log deploying ores <NEWHASH>"
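The bookkeeping in the prep steps above can be sketched as a few small helpers. This is an illustration under assumptions, not an existing script: the function names are invented, and current_revision uses git rev-parse HEAD to get the hash rather than reading it off git log -1 as the steps describe. The message and rollback formats are the ones quoted on this page.

```python
import subprocess


def current_revision(repo_dir: str) -> str:
    """Return the full hash of HEAD in repo_dir (uses git rev-parse, not git log -1)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def log_message(new_hash: str) -> str:
    """Build the line to send to #wikimedia-operations for NEWHASH."""
    return f"!log deploying ores {new_hash}"


def rollback_command(old_hash: str) -> str:
    """Build the scap rollback command for OLDHASH, to keep at hand during the deploy."""
    return f"scap deploy -v -r {old_hash}"
```

The idea is to call current_revision("/srv/deployment/ores/deploy") once before git pull (OLDHASH) and once after (NEWHASH), then keep both strings somewhere visible while you deploy.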
- Deploy to canary
Next, deploy to a single node to check that it works as expected. This is called the canary node; right now, it's scb1002.eqiad.wmnet.
- Run the deploy with scap deploy -v "<relevant task -- e.g. T1234>". Do not hit "y" yet! You have just deployed to the canary server; please smoke test it.
- ssh scb1002.eqiad.wmnet and check the service internally by running curl http://0.0.0.0:8081/v3/scores/testwiki/$(date +%s)
- It would be great to also test any other aspects you are changing (e.g. if you are adding a new model, test that it returns data).
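The curl check above could also be scripted, for instance as in this sketch, meant to be run on the canary host itself. It builds the same URL as the curl call, using the current Unix timestamp as the last path component. Treating HTTP 200 as "healthy" is an assumption made here, and the function names are hypothetical; inspect the response body as well if you are changing models.

```python
import time
import urllib.error
import urllib.request


def smoke_url(host="0.0.0.0", port=8081, wiki="testwiki", rev_id=None):
    """Build the scoring URL from the curl example; rev_id defaults to the current epoch seconds."""
    if rev_id is None:
        rev_id = int(time.time())
    return f"http://{host}:{port}/v3/scores/{wiki}/{rev_id}"


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url; HTTP error responses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def canary_ok(url, fetch=fetch_status):
    """True when the service answers with HTTP 200 (assumed definition of healthy)."""
    return fetch(url) == 200
```

Calling canary_ok(smoke_url()) performs one request; connection failures raise urllib.error.URLError, which is itself a sign that the canary is not serving. The fetch parameter exists so the check can be exercised without a network.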
- Continue deployment to prod
If everything works as expected, we're ready to continue.
- Deploy it fully by answering "y" to the scap prompt.
- ...
- If something went wrong, roll back with scap deploy -v -r <OLDHASH>
- If everything looks OK, say "Victory! ORES deploy looks good" (or something like that) in #wikimedia-operations