You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Parsoid: Difference between revisions

From Wikitech-static
imported>Arlolra
imported>C. Scott Ananian
(→‎Misc stuff: Add notes about enumerating hosts in beta/labs.)
Line 27: Line 27:


==== Verify deployment version on beta after the deploy patch is merged ====
==== Verify deployment version on beta after the deploy patch is merged ====
* Deploy code (if not already there) to the beta cluster.
* Deploy code (if not already there) to the beta cluster (same instructions as below but ssh to <code>deployment-deploy01.deployment-prep.eqiad.wmflabs</code>)
* If beta cluster is down or visual editor is down in beta cluster, do not continue with routine deployments.
* If beta cluster is down or visual editor is down in beta cluster, do not continue with routine deployments.
* On beta cluster, perform manual VisualEditor editing tests. This requires you to have an account on the beta cluster wiki. Test with non-ASCII content too to catch encoding issues. Check parsoid logs, if necessary.
* On beta cluster, perform manual VisualEditor editing tests. This requires you to have an account on the beta cluster wiki. Test with non-ASCII content too to catch encoding issues. Check parsoid logs, if necessary.
Line 37: Line 37:
Before you begin, note that Parsoid caches its git version string.  So you may wish to do:
Before you begin, note that Parsoid caches its git version string.  So you may wish to do:
<source lang="bash">
<source lang="bash">
ssh -A deployment.eqiad.wmnet
ssh deployment.eqiad.wmnet
tin$ for wtp in `grep wtp /etc/dsh/group/parsoid`; do echo -n "Querying $wtp: "; \
tin$ for wtp in `grep wtp /etc/dsh/group/parsoid`; do echo -n "Querying $wtp: "; \
   curl "http://$wtp:8000/_version"; echo; done;
   curl "http://$wtp:8000/_version"; echo; done;
Line 44: Line 44:


Now to do the deploy:<source lang="bash">
Now to do the deploy:<source lang="bash">
ssh -A deployment.eqiad.wmnet
ssh deployment.eqiad.wmnet
tin$ cd /srv/deployment/parsoid/deploy
tin$ cd /srv/deployment/parsoid/deploy
tin$ git pull
tin$ git pull
tin$ git submodule update --init
tin$ git submodule update --init
tin$ scap deploy 'Updating Parsoid to <new hash>'
tin$ scap deploy 'Updating Parsoid to <new hash> (T<bug number>, T<bug number>)'
</source>
</source>


The argument in the <code>scap</code> command will appear in the [[Server Admin Log]] as scap runs.
The argument in the <code>scap</code> command will appear in the [[Server Admin Log]] as scap runs.  List the hash of the deployed Parsoid version as well as any bug numbers referenced in the [[mw:Parsoid/Deployments|deploy log]]. This will create cross-references in the listed bugs to the SAL.


First, the canary group is deployed to (wtp[12]00[12]). Watch the canary machines on ganglia for [https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Parsoid%2520eqiad&tab=m&vn= eqiad] and [https://ganglia.wikimedia.org/latest/?c=Parsoid%20codfw&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 codfw] for a while to ensure there aren't any sudden load changes, and check the logs in [https://logstash.wikimedia.org/app/kibana#/dashboard/parsoid logstash] to ensure they are not being spammed with some new error.
First, the canary group is deployed to (wtp{{1025,1026,2001,2002}). Watch the canary machines on grafana for [https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?from=now-3h&to=now&cluster=parsoid&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=parsoid&var-instance=wtp1025&var-instance=wtp1026 eqiad] and [https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?from=now-3h&to=now&cluster=parsoid&orgId=1&var-datasource=codfw%20prometheus%2Fops&var-cluster=parsoid&var-instance=wtp2001&var-instance=wtp2002 codfw] for a while to ensure there aren't any sudden load changes, and check the logs in [https://logstash.wikimedia.org/app/kibana#/dashboard/parsoid logstash] to ensure they are not being spammed with some new error.
If everything is OK, Scap will ask you whether you want to continue to the next group or do all groups. It is safe to use <kbd>c</kbd> to tell it to do all groups without asking.
If everything is OK, Scap will ask you whether you want to continue to the next group or do all groups. It is safe to use <kbd>c</kbd> to tell it to do all groups without asking.


Once everything is done, log the deploy in #wikimedia-operations with something like
Scap will log the completion of the deploy, but if you want to add additional information to the SAL, make a comment in #wikimedia-operations with something like


  !log Updated Parsoid to version <new hash> (T<bug number>, T<bug number>)
  !log Attempted Parsoid deploy, but rolled back to version <new hash> (T<bug number>)


listing the hash of the deployed Parsoid version as well as any bug numbers referenced in the [[mw:Parsoid/Deployments|deploy log]].  This creates a timestamped entry in the [[Server Admin Log]] and creates cross-references in the listed bugs to the SAL.
As with the argument to <code>scap</code>, mentioning bug numbers will create cross-references to the SAL.


=== Post-deploy checks ===
=== Post-deploy checks ===
Line 67: Line 67:
** Reading through the recent edits ([https://fr.wikipedia.org/w/index.php?namespace=&tagfilter=visualeditor&title=Special%3ARecentChanges frwiki], [https://en.wikipedia.org/w/index.php?namespace=&tagfilter=visualeditor&title=Special%3ARecentChanges enwiki]) can also be a good check.
** Reading through the recent edits ([https://fr.wikipedia.org/w/index.php?namespace=&tagfilter=visualeditor&title=Special%3ARecentChanges frwiki], [https://en.wikipedia.org/w/index.php?namespace=&tagfilter=visualeditor&title=Special%3ARecentChanges enwiki]) can also be a good check.
* Verify all Parsoid servers are running the same version with:<syntaxhighlight lang="bash">
* Verify all Parsoid servers are running the same version with:<syntaxhighlight lang="bash">
ssh -A deployment.eqiad.wmnet
ssh deployment.eqiad.wmnet
tin$ for wtp in `grep wtp /etc/dsh/group/parsoid`; do echo -n "Querying $wtp: "; \
tin$ for wtp in `grep wtp /etc/dsh/group/parsoid`; do echo -n "Querying $wtp: "; \
   curl "http://$wtp:8000/_version"; echo; done;
   curl "http://$wtp:8000/_version"; echo; done;
Line 115: Line 115:
... verify deployment ...
... verify deployment ...
</pre>
</pre>
=== Restarting ===
In case you need to restart Parsoid without any deployments (for example, to reload mediawiki configs from config or other deployments),
* Restart parsoid hosts, from deployment.eqiad.wmnet (production) or deployment-deploy01.deployment-prep.eqiad.wmflabs (beta)
*:<code>cd /srv/deployment/parsoid/deploy && scap deploy --service-restart</code>


== When something goes wrong ==
== When something goes wrong ==
Line 123: Line 129:
cd /srv/deployment/parsoid/deploy
cd /srv/deployment/parsoid/deploy
scap deploy --rev <sha></pre>
scap deploy --rev <sha></pre>
=== Restarting ===
* Restart parsoid hosts, from deployment.eqiad.wmnet or deployment-tin
*:<code>cd /srv/deployment/parsoid/deploy && scap deploy --service-restart</code>


=== Misc stuff ===
=== Misc stuff ===
Line 133: Line 135:
* To see which hosts are pooled, from another host
* To see which hosts are pooled, from another host
*:<code>confctl select dc=.*,cluster=parsoid,service=parsoid get</code>
*:<code>confctl select dc=.*,cluster=parsoid,service=parsoid get</code>
* To see the list of parsoid hosts in beta:
*:<code>cat /srv/deployment/parsoid/deploy/scap/betacluster</code>
** See also <code>/srv/deployment/parsoid/deploy/scap/scap.cfg</code> in general
* To change which hosts are pooled, from deployment.eqiad.wmnet
* To change which hosts are pooled, from deployment.eqiad.wmnet
*:<code>SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service <node> 'pool service=parsoid'</code>
*:<code>SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service <node> 'pool service=parsoid'</code>

Revision as of 22:10, 3 August 2018

Parsoid is a service that converts between wikitext and HTML. The HTML contains additional metadata that allows it to be converted back ("round-tripped") to wikitext.

  • VisualEditor fetches the HTML for a given page from Parsoid, edits it, then delivers the modified HTML to Parsoid, which converts it back to wikitext. Parsoid is a stateless HTTP server running on port 8000.
  • Flow (as configured on WMF wikis with $wgFlowContentFormat = 'html') works the other way around. When a user creates a post Flow uses Parsoid to convert the wikitext to HTML and Flow stores the HTML in ExternalStore. If someone later edits a post Flow uses Parsoid to convert the HTML back to wikitext for editing.

Monitoring

Deploying changes

Parsoid is deployed using scap3: you run scap deploy, which pushes the new state to all backends and restarts them. You should have deploy access and be a member of the deploy-service puppet group.

Pre-deploy checks

Prepare the deploy patch

  • Check http://parsoid-rt-tests.wikimedia.org/regressions/between/{from}/{to} where {from} is the last deployed hash from mw:Parsoid/Deployments and {to} is the latest tested commit (which we're about to deploy)
    • http://parsoid-rt-tests.wikimedia.org/commits gives you a nice radio-button interface to create this URL
    • BEWARE: if you get the output total regressions between selected revisions: 0, it is extremely likely that you mistyped the hash or that we didn't actually run round-trip tests for that particular hash. (This is a bug; we should probably give a better message in this case.)
    • Since we are using the current revision of titles in round-trip testing, edits to pages can show up as false regressions. tools/regression-testing.js is useful for filtering those out. Grab the wiki:title pairs that show up as regressions at the regressions URL, save them to a file, and feed the file to the regressions tool; you will get a list of pages to look at more closely, if necessary.
  • Create a short deployment summary on mw:Parsoid/Deployments from git log --cherry-pick {from}...{to}. Don't include all commits, but only notable fixes and changes (ignore rt-test fixes, code cleanup updates, parser test updates, etc). (The above command will do the right thing if {from} was on a branch and had patches cherry-picked from {to}, although if there were conflicts during the cherry-pick to {from} the patch will still appear in the log for {to}.)
  • Prepare a deploy repo commit and push for +2
    • Roughly: cd deploy ; git checkout master ; git pull origin master ; git submodule update ; cd src ; git checkout {to} ; cd .. ; git add -u ; git commit -m "Bump src to {to} for deploy" ; git review
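
Because a mistyped hash can silently report zero regressions, it may help to build the regressions URL from hashes that have at least been checked for shape. The following is a minimal sketch, not part of Parsoid's tooling: the function name rt_regressions_url is made up for illustration, and it assumes {from} and {to} are lowercase hexadecimal git hashes of 7 to 40 characters. It only catches wrong lengths and stray characters, not a hash that is well-formed but wrong.

```shell
# Hypothetical helper: print the round-trip regressions URL for two commits.
# Assumption: {from} and {to} are lowercase hex git hashes, 7 to 40 chars long.
rt_regressions_url() {
    from=$1
    to=$2
    for h in "$from" "$to"; do
        # Reject anything that is not plausibly a git hash.
        if ! printf '%s\n' "$h" | grep -Eq '^[0-9a-f]{7,40}$'; then
            echo "not a git hash: $h" >&2
            return 1
        fi
    done
    printf 'http://parsoid-rt-tests.wikimedia.org/regressions/between/%s/%s\n' "$from" "$to"
}
```

For example, rt_regressions_url 497da30e 73445bfd prints the URL for those two (example) hashes. Whether the site accepts abbreviated hashes is not stated here, so use full hashes if in doubt.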

Verify deployment version on beta after the deploy patch is merged

  • Deploy code (if not already there) to the beta cluster (same instructions as below but ssh to deployment-deploy01.deployment-prep.eqiad.wmflabs)
  • If beta cluster is down or visual editor is down in beta cluster, do not continue with routine deployments.
  • On beta cluster, perform manual VisualEditor editing tests. This requires you to have an account on the beta cluster wiki. Test with non-ASCII content too to catch encoding issues. Check parsoid logs, if necessary.

Be around on IRC

  • Add yourself to the "deployer" field of Deployments if you're not already there
  • Be online in freenode #wikimedia-operations (and stay online through the deployment window)

Deploying the latest version of Parsoid

Before you begin, note that Parsoid caches its git version string. So you may wish to do:

ssh deployment.eqiad.wmnet
tin$ for wtp in `grep wtp /etc/dsh/group/parsoid`; do echo -n "Querying $wtp: "; \
   curl "http://$wtp:8000/_version"; echo; done;

to ensure that the "old" version string is cached, so that you will be able to tell when parsoid restarts with its "new" version below.
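
One way to make the before/after comparison mechanical is to save the loop's output before the deploy and compare it with the output afterward. The following is a minimal sketch under assumptions: the function name unchanged_hosts and the snapshot files are hypothetical, each line is assumed to have the "Querying <host>: <version>" shape produced by the loop above, and the /_version response is treated as an opaque single-line string.

```shell
# Hypothetical helper: list hosts whose line is identical in both snapshots,
# i.e. hosts that still report the old version string.
# Each snapshot file holds lines of the form "Querying <host>: <version>".
unchanged_hosts() {
    before=$1
    after=$2
    # grep -Fxf prints the lines of $after that also occur verbatim in $before.
    grep -Fxf "$before" "$after" | sed -e 's/^Querying //' -e 's/:.*$//'
}
```

An empty result means every host reported a different version string after the deploy than before it.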

Now to do the deploy:

ssh deployment.eqiad.wmnet
tin$ cd /srv/deployment/parsoid/deploy
tin$ git pull
tin$ git submodule update --init
tin$ scap deploy 'Updating Parsoid to <new hash> (T<bug number>, T<bug number>)'

The argument in the scap command will appear in the Server Admin Log as scap runs. List the hash of the deployed Parsoid version as well as any bug numbers referenced in the deploy log. This will create cross-references in the listed bugs to the SAL.
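
The message format above can be assembled from the hash and bug numbers so that the separators stay consistent. This is a minimal sketch only: the function name scap_message is made up for illustration, and the bug numbers in the usage line are placeholders.

```shell
# Hypothetical helper: build the message argument for `scap deploy`.
# Usage: scap_message <new hash> [T<bug number> ...]
scap_message() {
    rev=$1
    shift
    msg="Updating Parsoid to $rev"
    if [ "$#" -gt 0 ]; then
        # Join the bug numbers with ", " and wrap them in parentheses.
        bugs=$1
        shift
        for b in "$@"; do
            bugs="$bugs, $b"
        done
        msg="$msg ($bugs)"
    fi
    printf '%s\n' "$msg"
}
```

It would then be used as scap deploy "$(scap_message <new hash> T<bug number> T<bug number>)".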

First, scap deploys to the canary group (wtp{1025,1026,2001,2002}). Watch the canary machines on grafana for eqiad and codfw for a while to ensure there aren't any sudden load changes, and check the logs in logstash to ensure they are not being spammed with some new error. If everything is OK, Scap will ask you whether you want to continue to the next group or do all groups. It is safe to use c to tell it to do all groups without asking.

Scap will log the completion of the deploy, but if you want to add additional information to the SAL, make a comment in #wikimedia-operations with something like

!log Attempted Parsoid deploy, but rolled back to version <new hash> (T<bug number>)

As with the argument to scap, mentioning bug numbers will create cross-references to the SAL.

Post-deploy checks

  • Test VE editing on enwiki and non-Latin wikis
    • For example, open it:Luna (or other complex page), start the visual editor, make some random vandalism, click save -> review changes, then verify that the wikitext reflects your changes and was not corrupted. Hit cancel to abort the edit.
    • Reading through the recent edits (frwiki, enwiki) can also be a good check.
  • Verify all Parsoid servers are running the same version with:
    ssh deployment.eqiad.wmnet
    tin$ for wtp in `grep wtp /etc/dsh/group/parsoid`; do echo -n "Querying $wtp: "; \
       curl "http://$wtp:8000/_version"; echo; done;
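
With many hosts it is easy to miss a single stale version in the loop output. As a rough sketch, the output can be piped through a small filter that counts distinct version strings. The function name distinct_versions is hypothetical, and the filter assumes the "Querying <host>: <version>" line shape printed by the loop, treating the /_version response as an opaque single-line string.

```shell
# Hypothetical helper: read "Querying <host>: <version>" lines on stdin and
# print how many distinct version strings were seen.
distinct_versions() {
    # Drop the "Querying <host>: " prefix, then count the unique remainders.
    sed -e 's/^Querying [^:]*: //' | sort -u | wc -l | tr -d ' '
}
```

A result of 1 means all hosts agree. A host that returns nothing counts as its own (empty) version, so a result above 1 can also indicate an unreachable host.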
    

Deploying a cherry-picked patch

One way to do this is to create a new branch in the Parsoid repo and cherry-pick your patches to that. For example:

git checkout 497da30e # this is the commit on the master branch that you want to cherry pick on top of
git checkout -b deploy-20150528 # give it a name (go ahead and use the date of your deploy)
git cherry-pick f274c3f54f385a6ac159a47209d279b9040a161c # patch number 1
git cherry-pick de087b106be48fc6e97f2ebc4644f9d297ecdfed # patch number 2
git push gerrit deploy-20150528:deploy-20150528 # create the branch in gerrit (DON'T USE SLASHES HERE)

Now do the usual steps to prepare a deploy repo (see "Prepare the deploy patch" above) using the hash of your branch commit (73445bfd in the example below):

cd deploy
git checkout master ; git pull origin master ; git submodule update ; cd src ; git checkout 73445bfddded9f0baa6afe548c98880f4401fb7b # your branch commit
cd .. ; git add -u ; git commit -m "Bump src to 73445bfd (deploy-20150528 branch) for deploy"
git review -u

Note that the automated push to beta will fail if your gerrit branch name contains a slash. This is probably just because some ancient version of git is being used, and will eventually be fixed. But in the meantime, use dashes instead of slashes.
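
To avoid an accidental slash, the branch name can be generated rather than typed. This is a minimal sketch: the function name deploy_branch_name is made up for illustration, and it assumes the deploy-YYYYMMDD naming used in the example above.

```shell
# Hypothetical helper: print a deploy branch name for a date given as YYYYMMDD
# (default: today), refusing any input that would put a slash in the name.
deploy_branch_name() {
    day=${1:-$(date +%Y%m%d)}
    case $day in
        */*)
            echo "branch names must not contain slashes: $day" >&2
            return 1
            ;;
    esac
    printf 'deploy-%s\n' "$day"
}
```

For example, git checkout -b "$(deploy_branch_name 20150528)" creates the branch used in the walkthrough above.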

Cherry-picking directly from tin and deploying it

In many situations, a hotfix might need to be pushed quickly. One way to do that is to cherry-pick the patch on tin (aka deployment.eqiad.wmnet) and sync it.

### Verify that you have the most recently deployed code that you want to cherry-pick on top of
tin$ cd /srv/deployment/parsoid/deploy   # verify via git log
tin$ cd src                              # verify via git log

### Create a hotfix branch
tin$ git checkout -b hotfix_<some_unique_tag>

### Get latest code from master you want to cherry-pick from
tin$ git checkout master; git pull

### Check out the hotfix branch and cherry-pick
tin$ git checkout hotfix_<some_unique_tag>
tin$ git cherry-pick <commit-from-master>

### Create a deploy-repo patch
tin$ cd ..; git commit -a -m "Bump src to whatever-git-sha-it-is for hotfix"

### The usual deployment steps
tin$ scap deploy
... verify deployment ...

Restarting

If you need to restart Parsoid without deploying anything (for example, to reload MediaWiki configs after config or other deployments):

  • Restart parsoid hosts, from deployment.eqiad.wmnet (production) or deployment-deploy01.deployment-prep.eqiad.wmflabs (beta)
    cd /srv/deployment/parsoid/deploy && scap deploy --service-restart

When something goes wrong

Reverting a Parsoid deployment

Code

ssh deployment.eqiad.wmnet
cd /srv/deployment/parsoid/deploy
scap deploy --rev <sha>

Misc stuff

  • To deploy to a single host
    scap deploy --force -l <node>
  • To see which hosts are pooled, from another host
    confctl select dc=.*,cluster=parsoid,service=parsoid get
  • To see the list of parsoid hosts in beta:
    cat /srv/deployment/parsoid/deploy/scap/betacluster
    • See also /srv/deployment/parsoid/deploy/scap/scap.cfg in general
  • To change which hosts are pooled, from deployment.eqiad.wmnet
    SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service <node> 'pool service=parsoid'

Data flow

Parsoid runs entirely on an internal subnet, so requests to it are proxied through the ve-parsoid API module. This module is implemented in extensions/VisualEditor/ApiVisualEditor.php and is invoked with a POST request to /w/api.php?action=ve-parsoid. The API module then sends a request to Parsoid, either GET /$prefix/$pagename to get the HTML for a page, or POST /$prefix/$pagename to submit HTML and get wikitext back. Parsoid itself also issues requests to /w/api.php to get the wikitext of the requested page and to do template expansion.

Once the ve-parsoid API module receives a response from Parsoid, it either relays it back to the client (when requesting HTML), or saves the returned wikitext to the page (when submitting HTML).

                (POST /w/api.php?action=ve-parsoid)          (GET /en/Barack_Obama?oldid=1234)           (requests for page content and template expansions)
Client browser ------------------------------------------> API ---------------------------->  Parsoid -----------------------------------------------------> API
    ^                                                      | ^                                 |   ^                                                          |
    |                  (response)                          | |      (HTML)                     |   |                   (responses)                            |
    +------------------------------------------------------+ +---------------------------------+   +----------------------------------------------------------+


                (POST /w/api.php?action=ve-parsoid)          (POST /en/Barack_Obama; oldid=1234)
Client browser ------------------------------------------> API ---------------------------->  Parsoid
                                                           | ^                                 |
                                               (save page) | |      (wikitext)                 |
                                                           | +---------------------------------+
                                                           |
                                                        Database
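
The GET request that the API module sends to Parsoid in the first diagram can be sketched as a URL builder. This is an illustration only: the function name parsoid_url is hypothetical, the base URL in the usage note is an assumption (Parsoid is described above as listening on port 8000), and the page name is assumed to be URL-safe already, since no encoding is done.

```shell
# Hypothetical helper: print the Parsoid URL for a page.
# Usage: parsoid_url <base url> <prefix> <pagename> [oldid]
parsoid_url() {
    base=$1
    prefix=$2
    page=$3
    oldid=${4:-}
    url="$base/$prefix/$page"
    # Append the revision id only when one was given.
    if [ -n "$oldid" ]; then
        url="$url?oldid=$oldid"
    fi
    printf '%s\n' "$url"
}
```

From a host that can reach Parsoid, curl "$(parsoid_url http://localhost:8000 en Barack_Obama 1234)" would issue the request shown in the diagram. Substitute the real Parsoid host for localhost.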