You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Parsoid: Difference between revisions

From Wikitech-static
imported>C. Scott Ananian
imported>Arlolra
(38 intermediate revisions by 11 users not shown)
[[mw:Parsoid|Parsoid]] is a service that converts between wikitext and HTML. The HTML contains additional metadata that allows it to be converted back ("round-tripped") to wikitext. Parsoid operates as a stateless HTTP server running on port 8000.
 
== Uses ==
 
* [[mw:Extension:VisualEditor|VisualEditor]] fetches the HTML for a given page from Parsoid, edits it, then delivers the modified HTML to Parsoid, which converts it back to wikitext.  
* Flow (as configured on WMF wikis with <code>$wgFlowContentFormat = 'html'</code>) works the other way around. When a user creates a post, Flow uses Parsoid to convert the wikitext to HTML and stores the HTML in [[ExternalStore]]. If someone later edits the post, Flow uses Parsoid to convert the HTML back to wikitext for editing.


* pybal does health checks on all backends every second, and depools boxes that are down as long as the percentage of depooled boxes does not exceed 50%. To see these health checks and depools/repools happen in real time, run <code>ssh parsoid.svc.eqiad.wmnet</code> (this will drop you into either lvs1003 or lvs1006, depending on which is active), then <code>tail -f /var/log/pybal.log | grep parsoid</code>
* Logging happens in <code>/var/log/parsoid/parsoid.log</code>. Log rotation is configured in <code>/etc/logrotate.d/parsoid</code>.
* Logs in logstash: https://logstash.wikimedia.org/app/kibana#/dashboard/parsoid
* Logs in logstash (Parsoid/PHP): https://logstash.wikimedia.org/app/dashboards#/view/AW4Y6bumP44edBvO7lRc
* Logging in Parsoid starts at "warn" level (see https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L5459)
*[[mw:Parsoid#Links_for_Parsoid_deployers_.28to_the_Wikimedia_cluster.29|Useful links for Parsoid deployers]]
* Currently running Parsoid version:
** In beta: https://en.wikipedia.beta.wmflabs.org/wiki/Special:Version#mw-version-library-wikimedia/parsoid
** In production: https://en.wikipedia.org/wiki/Special:Version#mw-version-library-wikimedia/parsoid
** On scandium: See https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing
** (Be aware that the Parsoid cluster sits behind RESTBase. Although the cluster ''should'' be running the same version of Parsoid as the MediaWiki front ends, the two can diverge if puppet or scap are broken, especially in beta.)
== Machine overview ==
These are the machines involved in a Parsoid deploy:
* In the beta/wmflabs cluster:
** <code>deployment-deploy01.deployment-prep.eqiad.wmflabs</code>: staging host in beta; no longer used.
** <code>deployment-parsoid11.deployment-prep.eqiad.wmflabs</code>: parsoid server in beta
** <code>deployment-restbase02.deployment-prep.eqiad.wmflabs</code>: restbase server in beta
* In the production cluster:
** <code>deployment.eqiad.wmnet</code>: staging host in production; no longer used
** <code>wtp1xxx</code>: parsoid servers in eqiad cluster
** <code>restbase1xxx</code>: restbase servers in eqiad cluster
** <code>parse2xxx</code>: parsoid servers in codfw cluster
** <code>restbase2xxx</code>: restbase servers in codfw cluster
** <code>scandium.eqiad.wmnet</code>: Parsoid testing host, has read-only access to the production database.


== Deploying changes ==
Parsoid is deployed as part of the MediaWiki train. See [[How to deploy code]] for an overview, [[Heterogeneous deployment]] for a more technical description of the directory structures involved, and [[Heterogeneous deployment/Train deploys]] for the steps to do a train deploy. Code changes that must go out outside the [[Deployments/Train|train schedule]] have to be deployed during one of the [[Backport windows]]. Parsing team members generally won't do train or backport deploys directly; instead, we tag a Parsoid version (which releases it to [https://packagist.org/packages/wikimedia/parsoid packagist] and makes it available via composer) and merge a version bump into the [https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/vendor mediawiki/vendor] repository. Once the patch is merged into <code>vendor</code>, the new version of Parsoid goes live in beta (almost) immediately; it is then rolled out to production with the next train.


=== Deploying Parsoid ===


==== Test the version you hope to deploy ====
* See [[mw:Parsoid/Round-trip testing]] for details.
* Check <nowiki>http://parsoid-rt-tests.wikimedia.org/regressions/between/{from}/{to}</nowiki> where {from} is the last deployed hash from [[mw:Parsoid/Deployments]] and {to} is the latest tested commit (which we're about to deploy)
** <nowiki>http://parsoid-rt-tests.wikimedia.org/commits</nowiki> gives you a nice radio-button interface to create this URL
** '''BEWARE''': if you get the output <code>total regressions between selected revisions: 0</code>, it is extremely likely that you '''mistyped the hash''' or that we didn't actually run round-trip tests for that particular hash.  (This is a bug, we should probably give a better message in this case.)
** Since round-trip testing uses the current revision of each title, edits to pages can show up as false regressions. <code>tools/regression-testing.php</code> in the Parsoid repo is useful for filtering those out: running it with the right parameters (use <code>--help</code> for usage) will give you a list of pages to look at more closely, if necessary.
* Check that there are no concerning notices or errors in logstash from the rt run
**[https://logstash.wikimedia.org/app/dashboards#/view/parsoid-tests?_g=h@865c245&_a=h@bf5a0ea https://logstash.wikimedia.org/app/dashboards#/view/parsoid-tests]
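The regressions URL in the first bullet, and the "0 regressions" trap, can both be handled by a small shell helper. This is a minimal sketch, assuming the regressions page contains the literal text <code>total regressions between selected revisions: N</code> (the HTML around it may differ); <code>rt_regressions_url</code> and <code>check_rt_output</code> are hypothetical names, not part of any existing tool.
<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helpers (not part of Parsoid or the rt-testing tools).

# Print the regressions URL for the commit range FROM..TO.
rt_regressions_url() {
  local from="$1" to="$2"
  printf 'http://parsoid-rt-tests.wikimedia.org/regressions/between/%s/%s\n' "$from" "$to"
}

# Read the regressions page on stdin. Return 1 (with a warning) if it reports
# exactly 0 regressions, which usually means a mistyped or untested hash.
# Assumes the page contains this text literally.
check_rt_output() {
  if grep -Eq 'total regressions between selected revisions: 0([^0-9]|$)'; then
    echo "WARNING: 0 regressions reported; check the hashes and that rt-tests ran for them" >&2
    return 1
  fi
  return 0
}

# Example (placeholder hashes):
#   curl -s "$(rt_regressions_url FROM_HASH TO_HASH)" | check_rt_output
</syntaxhighlight>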


==== Prepare the vendor patch ====
Here is a concise summary of steps in the common case. Detailed explanation follows.<syntaxhighlight lang="bash">
cd PARSOID_REPO
git checkout <git-sha-of-patch-to-tag>
git tag v0.{version}.0-a{N}
git push origin v0.{version}.0-a{N}
cd VENDOR_REPO
# Edit composer.json: bump wikimedia/parsoid to the new version (no leading "v"; see details below)
composer update --no-dev
# Stage all changed files and commit (see below for what to include in the commit message)
git add wikimedia/parsoid composer.json composer.lock composer
git commit
git review -u
# Add reviewers and get the patch reviewed.
# After it merges, verify that it landed on the beta cluster and works fine.
</syntaxhighlight>
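To choose ''{N}'' for the tag in the summary above, the highest existing number in the series can be read from <code>git tag -l</code>. This is a minimal sketch; <code>next_parsoid_tag</code> is a hypothetical helper, and it assumes tags follow the <code>v0.{version}.0-a{N}</code> pattern used on this page.
<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: read tag names (one per line) on stdin and print the
# next tag in the series PREFIX<N>, e.g. v0.16.0-a7 -> v0.16.0-a8.
# Prints PREFIX1 if no tag in the series exists yet.
next_parsoid_tag() {
  local prefix="$1" max=0 tag n
  while IFS= read -r tag; do
    case "$tag" in
      "$prefix"*) n="${tag#"$prefix"}" ;;
      *) continue ;;                       # different series, e.g. v0.15.0-a*
    esac
    [[ "$n" =~ ^[0-9]+$ ]] || continue     # ignore non-numeric suffixes
    if (( 10#$n > max )); then
      max=$(( 10#$n ))
    fi
  done
  printf '%s%d\n' "$prefix" $(( max + 1 ))
}

# Example, run in the Parsoid checkout:
#   git tag -l 'v0.16.0-a*' | next_parsoid_tag v0.16.0-a
</syntaxhighlight>
The comparison is numeric, so <code>a10</code> correctly sorts after <code>a9</code>, which a plain lexical sort of the tag list would get wrong.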


===== Details =====
''(This process was hashed out in [[phab:T240055]])''
* Pull the latest <code>master</code> into your local master branch of Parsoid, then run <code>git remote update</code>.
* Tag a new version of Parsoid and push the tag (hint: use <code>git tag -l</code> to list existing tags):
**<code>git tag v0.16.0-a{N}</code>
**<code>git push origin v0.16.0-a{N}</code> (Include the leading 'v', and substitute the next version number for ''{N}''.)
*** The "origin" remote here is <code><nowiki>ssh://USER@gerrit.wikimedia.org:29418/mediawiki/services/parsoid</nowiki></code>
*** Nothing more than the usual <code>push</code> permissions on the parsoid repo should be needed. If you need to tweak permissions (for example, to temporarily add <code>force push</code> permissions to fix a mistake), you can do this using the "edit" button at https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/services/parsoid,access
** Check that this version has been picked up at https://packagist.org/packages/wikimedia/parsoid (might take a minute, you can work on the deployment summary while you wait)
* Create a short deployment summary on [[mw:Parsoid/Deployments]].
**In Parsoid repository, <code>tools/gen_deploy_log.sh v0.16.0-a{from} v0.16.0-a{to}</code> (for appropriate values of ''{from}'' and ''{to}'') will generate wikitext you can cut-and-paste into [[mw:Parsoid/Deployments]] (improvements to this script are welcome!)
**In [[mw:Parsoid/Deployments]], copy previous release header line, edit the dates and version info and delete "done" template and insert "In progress" template
**The manual way is/was to start from <code>git log --cherry-pick {from}...{to}</code>. Don't include all commits, but only notable fixes and changes (ignore rt-test fixes, code cleanup updates, parser test updates, etc). (The above command will do the right thing if ''{from}'' was on a branch and had patches cherry-picked from ''{to}'', although if there were conflicts during the cherry-pick to ''{from}'' the patch will still appear in the log for ''{to}''.)
* Checkout [[gerrit:admin/repos/mediawiki/vendor|<code>mediawiki/vendor.git</code>]] '''master''' branch into its own working directory. (hint: <code><nowiki>$ git clone "https://gerrit.wikimedia.org/r/mediawiki/vendor"</nowiki></code>)
**Make a new branch in that repo (hint: <code>git branch deploy; git checkout deploy</code>)
**In that repo: Update <code>composer.json</code> to include <code>"wikimedia/parsoid": "0.16.0-aN",</code> (for your version ''{N}''; note no leading "v")
**Ensure you're running the version of composer listed [https://github.com/wikimedia/mediawiki-vendor#adding-or-updating-libraries in the README for the vendor repo].  At time of writing this is <code>2.2.4</code>. <code>composer --version</code> will tell you what version you're running and (usually) <code>composer self-update</code> will bring you up-to-date.
**'''Ensure that you are using the latest version of composer''' (using <code>composer self-update</code>).  Informally, you need to be using "the same version JamesF is using."  If you use an old composer, you will create unrelated diffs to non-parsoid code when you do the next step.
**Do <code>composer update --no-dev</code> (which should only update parsoid)
***If composer complains "<code>The requested package wikimedia/parsoid 0.16.0-aN exists as [...long list not including 0.16.0-aN...]</code>" then composer's local cache hasn't been updated to include the new version available from [https://packagist.org/packages/wikimedia/parsoid packagist.org] yet.  Wait [https://blog.packagist.com/deprecating-composer-1-support/ 15 minutes] and try again.  The <code>--no-cache</code> option to composer *might* help... but it might not (it probably won't).  Apparently composer 2.x sped this up? :)
**Add the changed files to git, commit and provide a detailed commit message as described below, and then upload to gerrit:
<syntaxhighlight lang="bash">
git add wikimedia/parsoid composer.lock composer.json composer # & etc, if needed
git commit
git review
</syntaxhighlight>
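Two small helpers for the composer steps above, offered as a hedged sketch: <code>composer_version_for_tag</code> strips the leading "v" that <code>composer.json</code> must not contain, and <code>retry</code> re-runs a command with a pause between attempts while packagist catches up. Both names are hypothetical. Note that <code>retry</code> cannot tell the "requested package ... exists as" error apart from any other failure, so read the output before letting it loop.
<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helpers for the vendor bump.

# v0.16.0-a8 -> 0.16.0-a8 (composer.json takes the version without the "v")
composer_version_for_tag() {
  printf '%s\n' "${1#v}"
}

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between failures.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    if (( i < attempts )); then
      echo "Attempt $i/$attempts failed; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
  done
  return 1
}

# Example: up to four attempts, 15 minutes apart.
#   retry 4 900 composer update --no-dev
</syntaxhighlight>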
*Use a commit message that (1) names the new parsoid tag, (2) includes the git hash of the new parsoid version ''(we've stopped including this for the most part because the hash is given by the parsoid tag in part 1)'', and (3) references key bug #s from the deployment summary so the deploy gets linked to phab (Tip: <code>git log v0.16.0-a$PREV..v0.16.0-a$NEW | grep Bug: | sort -u</code>). For example:
<pre>
Bump parsoid to 0.16.0-aN

This corresponds to Parsoid commit cafecafecafecafe.

Bug: T111111
Bug: T222222
</pre>
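The commit message above can be assembled from the git log, following the <code>grep Bug: | sort -u</code> tip. This is a minimal sketch; <code>vendor_commit_message</code> is a hypothetical helper that reads <code>git log</code> output on stdin, and passing an empty hash omits the "This corresponds to..." line.
<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: build the vendor commit message.
#   $1 = composer version (no leading "v"), e.g. 0.16.0-a8
#   $2 = Parsoid commit hash, or "" to omit that line
# Reads `git log` output on stdin and keeps the unique "Bug: T..." footers.
vendor_commit_message() {
  local version="$1" hash="$2" bugs
  # `|| true` keeps a log with no Bug: lines from counting as a failure.
  bugs=$( { grep -E '^[[:space:]]*Bug: T[0-9]+' || true; } |
          sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' | sort -u )
  printf 'Bump parsoid to %s\n' "$version"
  if [ -n "$hash" ]; then
    printf '\nThis corresponds to Parsoid commit %s.\n' "$hash"
  fi
  if [ -n "$bugs" ]; then
    printf '\n%s\n' "$bugs"
  fi
}
</syntaxhighlight>
For example, <code>git log v0.16.0-a$PREV..v0.16.0-a$NEW | vendor_commit_message 0.16.0-a$NEW "" | git commit -F -</code> would commit with the generated message (<code>-F -</code> makes git read the message from stdin).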
* Review the generated patch (either via <code>git show</code> or on gerrit), looking specifically for unexpected changes.  The code in <code>wikimedia/parsoid</code> should change in roughly the ways you expect from the deploy summary, there should be a change to the version number in <code>composer.json</code> and changes to some hashes, timestamps, and versions in <code>composer.lock</code> and <code>composer/installed.json</code>, but '''there should be no other changes'''. See [[gerrit:c/mediawiki/vendor/+/628944/2/composer/ClassLoader.php|this patch set]] for an example where an old version of composer was used, resulting in spurious changes to other files in <code>composer/</code>.
*If jenkins fails on gerrit with the same "<code>The requested package wikimedia/parsoid 0.16.0-aN exists as...</code>" message described above, the reason is the same as for the <code>composer update --no-dev</code> step above: composer's cache on jenkins doesn't have your new version yet. Wait a minute and comment "recheck" to re-run the jenkins tests.
*Review and C+2 on gerrit. This will go live on the beta cluster fairly quickly (within 30 minutes).
*If you were late and ''just missed'' the train branch, be sure to check the [[Parsoid#If the train branch has already been cut|"If the train branch has already been cut"]] section below.
<!-- This shouldn't be necessary any more...
* '''TEMPORARILY''' you will also need to ssh to <code>deployment-parsoid11.deployment-prep.eqiad.wmflabs</code> and do a scap pull due to [[phab:T247545]] -->
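When reviewing the generated patch, the list of changed paths can be checked mechanically before reading the diff itself. This is a sketch, assuming the paths come from something like <code>git diff --name-only HEAD~1</code> run in the vendor checkout; <code>unexpected_vendor_changes</code> is hypothetical, and its allowlist mirrors the files named above plus <code>composer/installed.php</code>, which is an assumption about what composer 2.x also rewrites.
<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: read changed paths (one per line) on stdin and print
# every path a Parsoid version bump should not touch. Returns 1 if any exist.
unexpected_vendor_changes() {
  local path status=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    case "$path" in
      wikimedia/parsoid/*|composer.json|composer.lock|composer/installed.json) ;;
      # Assumption: composer 2.x also regenerates this file on update.
      composer/installed.php) ;;
      *) echo "UNEXPECTED: $path"; status=1 ;;
    esac
  done
  return "$status"
}

# Example, in the vendor checkout after committing:
#   git diff --name-only HEAD~1 | unexpected_vendor_changes
</syntaxhighlight>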


====Verify deployment version on beta after the vendor patch is merged====
*Check that this is live on the MediaWiki front ends in beta by watching the version number listed on https://en.wikipedia.beta.wmflabs.org/wiki/Special:Version#mw-version-library-wikimedia/parsoid
*If you ever need to, you can check the version on <code>parsoid11</code> as well:
<syntaxhighlight lang="bash">
ssh deployment-parsoid11.deployment-prep.eqiad.wmflabs
# Then, on deployment-parsoid11:
curl -s -x deployment-parsoid11:80 'http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version' | grep -F wikimedia/parsoid
</syntaxhighlight>
*If the beta cluster is down, or VisualEditor is down on the beta cluster, do not continue with routine deployments.
* On the beta cluster (e.g. <code>[https://en.wikipedia.beta.wmflabs.org en.wikipedia.beta.wmflabs.org]</code>), perform [[#Post-deploy_checks|manual VisualEditor editing tests]]. This requires an account on the beta cluster wiki. Test with non-ASCII content too, to catch encoding issues. Check the [https://logstash.wikimedia.org/app/kibana#/dashboard/parsoid parsoid logs]. Be particularly alert to integration issues such as library conflicts.
*Watch the logs on beta: https://wikitech.wikimedia.org/wiki/Logstash#Beta_Cluster_Logstash


====Be around on IRC====
*Add yourself to the "deployer" field of [[Deployments]] if you're not already there
*Be online in the libera.chat IRC channel {{irc|wikimedia-operations}} (and stay online through the deployment window)


====Logs to monitor====
*[https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=parsoid Parsoid Cluster in eqiad]
*[https://logstash.wikimedia.org/app/kibana#/dashboard/AW4Y6bumP44edBvO7lRc Logstash]
*[https://grafana.wikimedia.org/d/000000048/parsoid-timing-wt2html wt2html perf]
*[https://grafana.wikimedia.org/dashboard/db/parsoid-timing-html2wt html2wt perf]


=== Post-deploy checks ===
* Test VE editing on enwiki and non-latin wikis
** For example, open [[:it:Luna]] (or other complex page), start the visual editor, make some random vandalism, click save -> review changes, then verify that the wikitext reflects your changes and was not corrupted. Hit cancel to abort the edit.
** Reading through the recent edits ([https://fr.wikipedia.org/w/index.php?namespace=&tagfilter=visualeditor&title=Special%3ARecentChanges frwiki], [https://en.wikipedia.org/w/index.php?namespace=&tagfilter=visualeditor&title=Special%3ARecentChanges enwiki]) can also be a good check.
* Verify all Parsoid servers are running the same version with:<syntaxhighlight lang="bash">
ssh deployment.eqiad.wmnet
# Then, on the deployment host:
for wtp in $(grep wtp /etc/dsh/group/parsoid); do
  echo -n "Querying $wtp: "
  curl "http://$wtp:8000/_version"
  echo
done
</syntaxhighlight>
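The output of the loop above can be checked for consistency rather than compared by eye. This is a minimal sketch; <code>all_same_version</code> is a hypothetical helper that reads the <code>Querying HOST: VERSION</code> lines and treats everything after the first <code>": "</code> on each line as the version string.
<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: read "Querying HOST: VERSION" lines on stdin and report
# any host whose VERSION differs from the first host's. Returns 1 on mismatch.
all_same_version() {
  local line version first="" seen=0 status=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    version="${line#*: }"        # everything after the first ": "
    if (( seen == 0 )); then
      first="$version"; seen=1
    elif [ "$version" != "$first" ]; then
      echo "MISMATCH: $line (expected $first)"
      status=1
    fi
  done
  return "$status"
}
</syntaxhighlight>
To use it, append <code>| all_same_version</code> after the loop's closing <code>done</code>.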


=== Testing a version bump ===
If the deployed version of Parsoid updates the Parsoid DOM version and/or exercises the html2html "down convert" endpoint, the following test procedure will ensure that clients are getting the appropriate DOM version:
* First and foremost, mocha tests should already be present that cover both downgrading the HTML and serializing it with and without selser.
*Create a test page on the beta cluster containing the features that merited the major version bump.
*Deploy the desired commit to the beta cluster and, as a sanity check, make requests for the above test page from Parsoid directly (via <code>deployment-parsoid11.deployment-prep.eqiad.wmflabs</code>) accepting the various specs that are available.  The inline meta tag and aforementioned features should indicate that it worked.  Example requests might be,
**For the old version, <code>curl -x deployment-parsoid11:80 'http://en.wikipedia.beta.wmflabs.org/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/html/Test_Page' -H'Accept: text/html; charset=utf-8; profile="<nowiki>https://www.mediawiki.org/wiki/Specs/HTML/1.7.0</nowiki>"'</code>
**For the new version, <code>curl -x deployment-parsoid11:80 'http://en.wikipedia.beta.wmflabs.org/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/html/Test_Page' -H'Accept: text/html; charset=utf-8; profile="<nowiki>https://www.mediawiki.org/wiki/Specs/HTML/2.0.0</nowiki>"'</code>
*Confirm that VE on the beta cluster is still tied to the older content version and will need a downgrade (check the VisualEditor commit listed on [[Special:Version]] and compare it with the header defined in <code>includes/ApiVisualEditor.php</code>)
*At this point, two scenarios need to be tested: an edit starting from the older content version stored in RESTBase (which won't require a downgrade) and one starting from the new content version, which will.
*In each case, try to confirm that the features can be edited directly as well as being ignored by selser (usually because no normalizations occur).  Unfortunately, testing here is a bit more art than science.
*Finally, open up the various testing dashboards for logging and metrics to verify that no unexpected errors are present and that the downgrades are accounted for.
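The requests above differ only in the spec version inside the <code>Accept</code> header, so one helper can build that header and another can read the version back out of the response. This is a sketch; <code>accept_header</code> and <code>html_version</code> are hypothetical, and <code>html_version</code> assumes the returned HTML carries <code><nowiki><meta property="mw:htmlVersion" content="..."/></nowiki></code> with its attributes in that order.
<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helpers for checking which HTML spec version Parsoid served.

# accept_header 2.0.0 -> Accept value requesting that spec version
accept_header() {
  printf 'text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/%s"\n' "$1"
}

# Read HTML on stdin; print the content of the first mw:htmlVersion meta tag.
# Assumes the attributes appear as property="..." then content="...".
html_version() {
  grep -oE '<meta property="mw:htmlVersion" content="[^"]*"' | head -n 1 |
    sed -E 's/.*content="([^"]*)"/\1/'
}

# Example on deployment-parsoid11 (Test_Page is the test page created above):
#   curl -s -x deployment-parsoid11:80 \
#     'http://en.wikipedia.beta.wmflabs.org/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/html/Test_Page' \
#     -H "Accept: $(accept_header 2.0.0)" | html_version
</syntaxhighlight>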
=== Testing on scandium ===
When on scandium, use this command to test Parsoid directly:
<syntaxhighlight lang="bash">
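# Replace <domain>, <title> and <revid>; the quotes stop the shell from reading "<...>" as redirections
curl -x scandium.eqiad.wmnet:80 'http://<domain>/w/rest.php/<domain>/v3/page/html/<title>/<revid>'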
</syntaxhighlight>
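The URL in the command above follows a fixed pattern, so it can be built from its parts. This is a sketch; <code>parsoid_page_url</code> is a hypothetical helper that only converts spaces to underscores and encodes <code>/</code> as <code>%2F</code> (as in the <code>User:Cscott%2FSrTest</code> example on this page); other characters are passed through unchanged.
<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: parsoid_page_url DOMAIN TITLE [REVID]
# Prints http://DOMAIN/w/rest.php/DOMAIN/v3/page/html/TITLE[/REVID]
parsoid_page_url() {
  local domain="$1" title="$2" revid="${3:-}"
  title="${title// /_}"        # MediaWiki URLs use underscores for spaces
  title="${title//\//%2F}"     # a raw "/" would be read as a path separator
  local url="http://${domain}/w/rest.php/${domain}/v3/page/html/${title}"
  if [ -n "$revid" ]; then
    url+="/${revid}"
  fi
  printf '%s\n' "$url"
}

# Example (placeholder title and revision):
#   curl -x scandium.eqiad.wmnet:80 "$(parsoid_page_url en.wikipedia.org 'Some title' REVID)"
</syntaxhighlight>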
 
=== Testing LanguageConverter ===
LanguageConverter can be tested on beta in a manner similar to testing a version bump.
*Create a test page on the beta cluster containing the LanguageConverter features you wish to exercise. Either the page language of the article must be set to a language with variants, or the article must be on a wiki whose main language has variants. We'll use the [https://sr.wikipedia.beta.wmflabs.org/wiki/%D0%9A%D0%BE%D1%80%D0%B8%D1%81%D0%BD%D0%B8%D0%BA:Cscott/SrTest SrTest] page on beta srwiki in the examples below.
*Deploy the desired commit to the beta cluster and, as a sanity check, make requests for the above test page from Parsoid directly (via ssh to <code>deployment-parsoid11.deployment-prep.eqiad.wmflabs</code>) specifying the desired variant language.  Verify that the result has been converted appropriately.  Example requests might be,  
**<code>curl -H'Accept-Language: sr-ec' -x deployment-parsoid11:80 http://sr.wikipedia.beta.wmflabs.org/w/rest.php/sr.wikipedia.beta.wmflabs.org/v3/page/html/User:Cscott%2FSrTest/23</code>
**<code>curl -H'Accept-Language: sr-el' -x deployment-parsoid11:80 http://sr.wikipedia.beta.wmflabs.org/w/rest.php/sr.wikipedia.beta.wmflabs.org/v3/page/html/User:Cscott%2FSrTest/23</code>
*To test in production, try something like:
**<code>curl -X GET --header 'Accept-Language: sr-el' 'https://sr.wikipedia.org/api/rest_v1/page/html/%D0%93%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B0/21280369'</code>  
 
See https://phabricator.wikimedia.org/T241146#5810424 for some more examples.
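A quick sanity check for the variant requests above is that two variants of the same page should not come back identical. This is a hedged sketch; <code>variants_differ</code> is hypothetical, takes the proxy (e.g. <code>deployment-parsoid11:80</code>), the URL and two variant codes, and only detects that ''something'' changed between variants, not that the conversion is correct.
<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: variants_differ PROXY URL VARIANT1 VARIANT2
# Returns 0 if the two responses differ, 1 if identical, 2 if a fetch failed.
variants_differ() {
  local proxy="$1" url="$2" v1="$3" v2="$4"
  local out1 out2
  out1=$(curl -s -x "$proxy" -H "Accept-Language: $v1" "$url") || return 2
  out2=$(curl -s -x "$proxy" -H "Accept-Language: $v2" "$url") || return 2
  if [ "$out1" = "$out2" ]; then
    echo "Output for $v1 and $v2 is identical; conversion may not have run" >&2
    return 1
  fi
  return 0
}

# Example, on deployment-parsoid11:
#   variants_differ deployment-parsoid11:80 \
#     'http://sr.wikipedia.beta.wmflabs.org/w/rest.php/sr.wikipedia.beta.wmflabs.org/v3/page/html/User:Cscott%2FSrTest/23' \
#     sr-ec sr-el
</syntaxhighlight>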


=== Deploying a cherry-picked patch ===
One way to do this is to create a new branch in the Parsoid repo and cherry-pick your patches to that.  For example:
<pre>
git checkout v0.13.0-a3 # the release tag (or master-branch commit) you want to cherry-pick on top of
git checkout -b deploy-20150528 # give it a name (go ahead and use the date of your deploy)
git cherry-pick f274c3f54f385a6ac159a47209d279b9040a161c # patch number 1
git push gerrit deploy-20150528:deploy-20150528 # create the branch in gerrit (DON'T USE SLASHES HERE)
</pre>
Now do the usual steps to tag a release and prepare a vendor branch patch (see above) using the next available release version number (v0.13.0-a4 in the example below):
<pre>
git tag v0.13.0-a4 # this is the next available release number
git push origin v0.13.0-a4
</pre>
Switch to the <code>mediawiki/vendor</code> repository:
<pre>
git checkout master ; git pull origin master
edit composer.json # set wikimedia/parsoid to 0.13.0-a4 (no leading "v")
composer update --no-dev
git add wikimedia/parsoid composer.json composer.lock composer
git commit -m "Bump wikimedia/parsoid to v0.13.0-a4"
git review -u
</pre>
Line 113: Line 190:
Note that the automated push to beta will fail if your gerrit branch name contains a slash.  This is probably just because some ancient version of git is being used, and will eventually be fixed.  But in the meantime, use dashes instead of slashes.


=== Cherry-picking directly from tin and deploying it ===
In many situations, a hotfix might need to be pushed quickly. One way to do that is to cherry-pick the patch on tin (aka deployment.eqiad.wmnet) and sync it.
<pre>
### Verify that you have the most recently deployed code that you want to cherry-pick on top of
tin$ cd /srv/deployment/parsoid/deploy (verify via git log)
tin$ cd src (verify via git log)

### Create a hotfix branch
tin$ git checkout -b hotfix_<some_unique_tag>

### Get latest code from master you want to cherry-pick from
tin$ git checkout master; git pull

### Check out the hotfix branch and cherry-pick
tin$ git checkout hotfix_<some_unique_tag>
tin$ git cherry-pick <commit-from-master>

### Create a deploy-repo patch
tin$ cd ..; git commit -a -m "Bump src to whatever-git-sha-it-is for hotfix"

### The usual deployment steps
tin$ scap deploy
... verify deployment ...
</pre>
When this is merged into mediawiki-vendor it will (shortly) go live on beta; you should verify that everything looks good there. See [[#Verify deployment version on beta after the vendor patch is merged]].  If you want this cherry-pick to shortcut the train (instead of waiting to ride the next one) keep going into the next section, [[Parsoid#If the train branch has already been cut|"If the train branch has already been cut"]].

== Edge case deployment scenarios ==

====If the train branch has already been cut====
'''IF THE TRAIN BRANCH HAS ALREADY BEEN CUT''' (aka the <code>wmf/1.XX.0-wmf.YY</code> branch exists) then <u>after you merge to master</u> of <code>mediawiki-vendor</code> you will also need to cherry-pick a patch to the appropriate <u>branch</u> of <code>mediawiki-vendor</code>, for example <code>wmf/1.36.0-wmf.21</code>.  In some cases you can use gerrit to cherry-pick the vendor branch to the branch, but in practice most updates to vendor conflict with each other due to the presence of content hashes, so you'll most likely need to repeat the steps above:<pre>
# from mediawiki/vendor
git remote update # if needed
git checkout wmf/1.36.0-wmf.3
edit composer.json # set wikimedia/parsoid to v0.13.0-a21
composer update --no-dev
git add -u
git commit -m "Bump wikimedia/parsoid to v0.13.0-a21"
git review -u
</pre>


=== Restarting ===
In case you need to restart Parsoid without any deployments (for example, to reload mediawiki configs from config or other deployments),

* Restart parsoid hosts, from deployment.eqiad.wmnet (production) or deployment-deploy01.deployment-prep.eqiad.wmflabs (beta)
*:<code>cd /srv/deployment/parsoid/deploy && scap deploy --service-restart</code>

Now, before you merge this cherry-pick onto the branch, you need to check one of three possible cases:

#If the train branch is new and the "branch commit" has not yet been merged ([[gerrit:c/mediawiki/core/+/646887|it looks like this]]; here is a [https://gerrit.wikimedia.org/r/q/project:mediawiki/core+owner:mmodell%252Btrainbranchbot%2540wikimedia.org gerrit search]) -- '''wait! Do not merge the cherry-pick''' into mediawiki-vendor '''until the branch commit has landed''', or the git submodules in mediawiki-core will be left out of sync ([[phab:T259832|T259832]]). You might want to add a <code>Depends-On</code> clause to the cherry-pick patch to enforce this. If you accidentally merged this, see below for how to fix it.
#If the [https://gerrit.wikimedia.org/r/q/project:mediawiki/core+owner:mmodell%252Btrainbranchbot%2540wikimedia.org branch commit] has been merged, but the train has not been deployed anywhere (check [[Deployments#!/deploycal/current|Deployments]] and the status page on [https://versions.toolforge.org/ versions.toolforge.org]), then it's safe to just C+2 the cherry-pick. '''But be sure to ping {{irc|wikimedia-operations}} and get clearance before C+2 and merge,''' since (a) the deployer may have already checked out the branch in preparation for the train, and (b) since jenkins can take a while to complete the merge and they need to know to wait for it.  Probably worth leaving a comment on the phab task for the blocker bug for the train release as well.
#If the train has already been deployed, then you will need to [[Backport_windows|backport]] this cherry-pick; it is considered bad form to leave code committed on the branch which isn't deployed.  Don't merge the cherry-pick until the backport window.


== When something goes wrong ==
=== Reverting a Parsoid deployment ===
Code
<pre>
ssh deployment.eqiad.wmnet
cd /srv/deployment/parsoid/deploy
scap deploy --rev <sha></pre>
====If you accidentally merged into vendor before the branch commit has been merged====
<span id="T259832">Merging a patch onto a branch in the <code>mediawiki-vendor</code> repository will automatically update the git submodules in core, but only after the branch commit is in place.  See [[phab:T259832]] for details.</span>  If you think you might have merged onto vendor before the branch commit was merged, check the appropriate vendor branch history for core, aka https://gerrit.wikimedia.org/g/mediawiki/core/+/refs/heads/wmf/1.36.0-wmf.3.  Verify that the submodule hash for vendor corresponds to the tip of the branch of mediawiki-vendor.  If it's not correct, <u>after the branch commit has been merged</u> into mediawiki-core you need to manually bump the submodules:<syntaxhighlight lang="bash">
cd .../mediawiki-core
# note that the below will clobber your vendor, extensions, and skins directories
# you might want to use a new clean checkout of core
git checkout wmf/1.36.0-wmf.3
git submodule update --init
git submodule update --remote vendor
git add vendor
git commit -m "Update git submodules"
git review -u
</syntaxhighlight>Review and merge that.


== Misc stuff ==
*To deploy to a single host
*:<code>scap deploy --force -l <node></code>
* To see which hosts are pooled, from another host
*:<code>confctl select dc=.*,cluster=parsoid,service=parsoid get</code>
*To see the list of parsoid hosts in beta:
*:<code>cat /srv/deployment/parsoid/deploy/scap/betacluster</code>
**See also <code>/srv/deployment/parsoid/deploy/scap/scap.cfg</code> in general
* To change which hosts are pooled, from deployment.eqiad.wmnet
*:<code>SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service <node> 'pool service=parsoid'</code>
*To pool/depool a node, from deployment.eqiad.wmnet, run:
**'''To depool''': <code>SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service <node> 'depool service=parsoid'</code>
** '''To pool'''    : <code>SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service <node> 'pool service=parsoid'</code>


== Data flow ==
Parsoid runs entirely on an internal subnet, so requests to it are proxied through the ve-parsoid API module. This module is implemented in <code>extensions/VisualEditor/ApiVisualEditor.php</code> and is invoked with a POST request to <code>/w/api.php?action=ve-parsoid</code>. The API module then sends a request to Parsoid, either <code>GET /$prefix/$pagename</code> to get the HTML for a page, or <code>POST /$prefix/$pagename</code> to submit HTML and get wikitext back. Parsoid itself also issues requests to <code>/w/api.php</code> to get the wikitext of the requested page and to do template expansion.



Revision as of 18:38, 4 April 2022

Parsoid is a service that converts between wikitext and HTML. The HTML contains additional metadata that allows it to be converted back ("round-tripped") to wikitext. Parsoid operates as a stateless HTTP server running on port 8000.

Uses

  • VisualEditor fetches the HTML for a given page from Parsoid, edits it, then delivers the modified HTML to Parsoid, which converts it back to wikitext.
  • Flow (as configured on WMF wikis with $wgFlowContentFormat = 'html') works the other way around. When a user creates a post, Flow uses Parsoid to convert the wikitext to HTML and stores the HTML in ExternalStore. If someone later edits a post, Flow uses Parsoid to convert the HTML back to wikitext for editing.

Monitoring

Machine overview

These are the machines involved in a Parsoid deploy:

  • In the beta/wmflabs cluster:
    • deployment-deploy01.deployment-prep.eqiad.wmflabs: staging host in beta; no longer used.
    • deployment-parsoid11.deployment-prep.eqiad.wmflabs: parsoid server in beta
    • deployment-restbase02.deployment-prep.eqiad.wmflabs: restbase server in beta
  • In the production cluster:
    • deployment.eqiad.wmnet: staging host in production; no longer used
    • wtp1xxx: parsoid servers in eqiad cluster
    • restbase1xxx: restbase servers in eqiad cluster
    • parse2xxx: parsoid servers in codfw cluster
    • restbase2xxx: restbase servers in codfw cluster
    • scandium.eqiad.wmnet: Parsoid testing host, has read-only access to the production database.

Deploying changes

Parsoid is deployed as part of the MediaWiki train. See How to deploy code for an overview, Heterogeneous deployment for a more technical description of the directory structures involved, and Heterogeneous deployment/Train deploys for the steps to do a train deploy. When code changes are needed outside the train schedule, a Backport window is required. Generally, Parsing team members won't be doing train deploys or Backport deploys directly; instead, we tag a Parsoid version (which releases it to packagist to make it available via composer) and merge a version bump into the mediawiki/vendor repository. Once the patch is merged into vendor, the new version of Parsoid goes live in beta (almost) immediately; it will then be rolled out to production on the next train.

Deploying Parsoid

Test the version you hope to deploy

  • See mw:Parsoid/Round-trip testing for details.
  • Check http://parsoid-rt-tests.wikimedia.org/regressions/between/{from}/{to} where {from} is the last deployed hash from mw:Parsoid/Deployments and {to} is the latest tested commit (which we're about to deploy)
    • http://parsoid-rt-tests.wikimedia.org/commits gives you a nice radio-button interface to create this URL
    • BEWARE: if you get the output total regressions between selected revisions: 0, it is extremely likely that you mistyped the hash or that we didn't actually run round-trip tests for that particular hash. (This is a bug; we should probably give a better message in this case.)
    • Because round-trip testing uses the current revision of each title, edits to pages can show up as false regressions. tools/regression-testing.php in the Parsoid repo is useful for filtering those out: running it with the right parameters (use --help for usage) produces a list of pages to examine more closely, if necessary.
  • Check that there are no concerning notices or errors in logstash from the rt run
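The regressions URL above can be assembled with a small shell function. This is a minimal sketch: the function name rt_regressions_url is hypothetical, and it assumes the rt server accepts both abbreviated and full hashes. It only rejects strings that cannot be git hashes, so a well-formed hash that was never round-trip tested will still produce the misleading "0 regressions" page described above.

```bash
# Hypothetical helper: build the round-trip regressions URL for two commits.
# It rejects strings that cannot be git hashes; a valid-looking hash that was
# never round-trip tested will still yield a page reporting 0 regressions.
rt_regressions_url() {
  local from=$1 to=$2 h
  for h in "$from" "$to"; do
    # Accept 7 to 40 lowercase hex digits (abbreviated or full SHA-1).
    if [[ ! $h =~ ^[0-9a-f]{7,40}$ ]]; then
      printf 'not a plausible git hash: %s\n' "$h" >&2
      return 1
    fi
  done
  printf 'http://parsoid-rt-tests.wikimedia.org/regressions/between/%s/%s\n' \
    "$from" "$to"
}
```

For example, rt_regressions_url <from-hash> <to-hash> prints the URL to open in a browser.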

Prepare the vendor patch

Here is a concise summary of steps in the common case. Detailed explanation follows.

cd PARSOID_REPO
git checkout <git-sha-of-patch-to-tag>
git tag v0.{version}.0-a{N}
git push origin v0.{version}.0-a{N}
cd VENDOR_REPO
.. edit composer.json and set wikimedia/parsoid to the version tagged above (without the leading 'v') ..
composer update --no-dev
.. ensure all files are added and git commit (see below for what to include in commit message) ..
git review -u
.. add reviewers and get it reviewed ..
.. post-merge, verify it landed on the beta cluster and works fine ..
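To choose {N}, the sketch below reads existing tags on stdin and prints the next -a{N} tag for a given base version. The function name next_parsoid_tag is hypothetical. It compares the numeric suffix as a number (so a10 counts as newer than a9), skips tags with non-numeric suffixes, and assumes the first tag of a new base version is -a1.

```bash
# Hypothetical helper: print the next v<base>-a<N> tag, given existing tags on stdin.
next_parsoid_tag() {
  local base=$1 max=0 tag n
  while IFS= read -r tag; do
    case $tag in
      "v$base-a"*) n=${tag#"v$base-a"} ;;
      *) continue ;;
    esac
    # Ignore tags whose suffix is not purely numeric (e.g. v0.16.0-a10-hotfix).
    [[ $n =~ ^[0-9]+$ ]] || continue
    # Force base 10 so suffixes like 08 are not read as octal.
    if (( 10#$n > max )); then
      max=$(( 10#$n ))
    fi
  done
  printf 'v%s-a%d\n' "$base" "$(( max + 1 ))"
}
```

Usage: git tag -l 'v0.16.0-a*' | next_parsoid_tag 0.16.0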
Details

(This process was hashed out in phab:T240055)

  • Pull the latest version of master into your local master branch of Parsoid, then run git remote update
  • Tag a new version of Parsoid and push the tag (hint: use git tag -l to show existing tags):
    • git tag v0.16.0-a{N}
    • git push origin v0.16.0-a{N} (Include the leading 'v', and substitute the next version number for {N}.)
    • Check that this version has been picked up at https://packagist.org/packages/wikimedia/parsoid (might take a minute, you can work on the deployment summary while you wait)
  • Create a short deployment summary on mw:Parsoid/Deployments.
    • In Parsoid repository, tools/gen_deploy_log.sh v0.16.0-a{from} v0.16.0-a{to} (for appropriate values of {from} and {to}) will generate wikitext you can cut-and-paste into mw:Parsoid/Deployments (improvements to this script are welcome!)
    • In mw:Parsoid/Deployments, copy the previous release's header line, edit the dates and version info, and replace the "done" template with the "In progress" template
    • The manual way is/was to start from git log --cherry-pick {from}...{to}. Don't include all commits, but only notable fixes and changes (ignore rt-test fixes, code cleanup updates, parser test updates, etc). (The above command will do the right thing if {from} was on a branch and had patches cherry-picked from {to}, although if there were conflicts during the cherry-pick to {from} the patch will still appear in the log for {to}.)
  • Check out the master branch of mediawiki/vendor.git into its own working directory. (hint: $ git clone "https://gerrit.wikimedia.org/r/mediawiki/vendor")
    • Make a new branch in that repo: (hint: git branch deploy; git checkout deploy)
    • In that repo: Update composer.json to include "wikimedia/parsoid": "0.16.0-aN", (for your version {N}; note no leading "v")
    • Ensure you're running the version of composer listed in the README for the vendor repo. At the time of writing this is 2.2.4. composer --version will tell you what version you're running, and (usually) composer self-update will bring you up to date.
    • Ensure that you are using the latest version of composer (using composer self-update). Informally, you need to be using "the same version JamesF is using." If you use an old composer, you will create unrelated diffs to non-parsoid code when you do the next step.
    • Do composer update --no-dev (which should only update parsoid)
      • If composer complains "The requested package wikimedia/parsoid 0.16.0-aN exists as [...long list not including 0.16.0-aN...]" then composer's local cache hasn't been updated to include the new version available from packagist.org yet. Wait 15 minutes and try again. The --no-cache option to composer *might* help... but it might not (it probably won't). Apparently composer 2.x sped this up? :)
    • Add the changed files to git, commit and provide a detailed commit message as described below, and then upload to gerrit:
git add wikimedia/parsoid composer.lock composer.json composer # & etc, if needed
git commit
git review
  • Use a commit message that (1) names the new parsoid tag, (2) includes the git hash of the new parsoid version (we've mostly stopped including this, because the tag from part 1 already identifies the hash), and (3) references key bug numbers from the deployment summary so the deploy gets linked to phab (tip: git log v0.16.0-a$PREV..v0.16.0-a$NEW | grep Bug: | sort -u). For example:
Bump parsoid to 0.16.0-aN

This corresponds to Parsoid commit cafecafecafecafe.

Bug: T111111
Bug: T222222
  • Review the generated patch (either via git show or on gerrit), looking specifically for unexpected changes. The code in wikimedia/parsoid should change in roughly the ways you expect from the deploy summary, there should be a change to the version number in composer.json and changes to some hashes, timestamps, and versions in composer.lock and composer/installed.json, but there should be no other changes. See this patch set for an example where an old version of composer was used, resulting in spurious changes to other files in composer/.
  • If jenkins fails on gerrit with the same "The requested package wikimedia/parsoid 0.16.0-aN exists as..." message described above, the reason is the same as for the composer update --no-dev step above: composer's cache on jenkins doesn't have your new version yet. Wait a minute and comment "recheck" to re-run the jenkins tests.
  • Review and C+2 on gerrit. This will go live on the beta cluster pretty quickly (within 30 minutes).
  • If you were late and just missed the train branch, be sure to check the "If the train branch has already been cut" section below.
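The Bug: trailers for the commit message can be collected with a small shell function built on the grep/sort tip above. This is a hedged sketch: vendor_commit_message is a hypothetical name, and it drafts only parts (1) and (3) of the message, so add the Parsoid commit line by hand if you want it. It assumes git log's default format, which indents message bodies by four spaces.

```bash
# Hypothetical helper: draft the vendor commit message from `git log` text on stdin.
vendor_commit_message() {
  local version=$1 bugs
  printf 'Bump parsoid to %s\n' "$version"
  # git log indents message bodies, so strip leading whitespace, then de-duplicate.
  bugs=$(grep -E '^[[:space:]]*Bug: T[0-9]+' | sed -E 's/^[[:space:]]+//' | sort -u)
  if [[ -n $bugs ]]; then
    printf '\n%s\n' "$bugs"
  fi
}
```

Usage: git log v0.16.0-a$PREV..v0.16.0-a$NEW | vendor_commit_message 0.16.0-a$NEW > msg.txt, review msg.txt, then git commit -F msg.txt.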

Verify deployment version on beta after the vendor patch is merged

$ ssh deployment-parsoid11.deployment-prep.eqiad.wmflabs
user@deployment-parsoid11$ curl -x deployment-parsoid11:80 'http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version' | fgrep wikimedia/parsoid -C0

Be around on IRC

  • Add yourself to the "deployer" field of Deployments if you're not already there
  • Be online in the libera.chat IRC channel #wikimedia-operations (and stay online through the deployment window)

Logs to monitor

Post-deploy checks

  • Test VE editing on enwiki and non-latin wikis
    • For example, open it:Luna (or another complex page), start the visual editor, make some random vandalism, click save -> review changes, then verify that the wikitext reflects your changes and was not corrupted. Hit cancel to abort the edit.
    • Reading through the recent edits (frwiki, enwiki) can also be a good check.

Testing a version bump

If the deployed version of Parsoid updates the Parsoid DOM version and/or will exercise the html2html "down convert" endpoint, the following test procedure will ensure that clients are getting the appropriate DOM version:

  • First and foremost, mocha tests should already be present that cover both downgrading the HTML and serializing it with and without selser.
  • Create a test page on the beta cluster containing the features that merited the major version bump.
  • Deploy the desired commit to the beta cluster and, as a sanity check, make requests for the above test page from Parsoid directly (via deployment-parsoid11.deployment-prep.eqiad.wmflabs) accepting the various specs that are available. The inline meta tag and aforementioned features should indicate that it worked. Example requests might be,
  • Confirm that VE on the beta cluster is still tied to the older content version and will need a downgrade (see the commit in Special:Version for the extension and compare with the header defined in includes/ApiVisualEditor.php)
  • At this point, two scenarios need to be tested: an edit starting from the older content version stored in RESTBase (which won't require a downgrade) and one starting from the new content version, which will.
    • Note that, for extra points, there are potentially several version numbers stored in RESTBase that satisfy the VE request based on caret semantics, and it might be worthwhile to confirm that edits starting from those versions work as well.
    • Once you've found stored content in RESTBase with an appropriate version for your test, it's prudent to confirm that VE is actually editing what you expect. This can be achieved by dumping the various DOMs: the original copy (ve.init.target.doc.body.outerHTML) and the edited copy (ve.init.target.docToSave.body.outerHTML)
  • In each case, try to confirm that the features can be edited directly as well as being ignored by selser (usually because no normalizations occur). Unfortunately, testing here is a bit more art than science.
  • Finally, open up the various testing dashboards for logging and metrics to verify that no unexpected errors are present and that the downgrades are accounted for.
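The caret semantics mentioned above can be made concrete with a small check. This sketch assumes the usual semver caret rule for major versions of 1 or higher (same major, and not older than the requested minor.patch); whether RESTBase matches stored content exactly this way is an assumption, and both function names are hypothetical. html_accept_header prints an Accept header using the mediawiki.org Specs/HTML profile URL convention, which you could pass to curl -H when requesting a particular spec version.

```bash
# Hypothetical helper: succeed if version $2 satisfies the caret range ^$1.
# Usual semver caret rule, valid for major versions >= 1 only.
caret_satisfies() {
  local -a want have
  IFS=. read -r -a want <<< "$1"
  IFS=. read -r -a have <<< "$2"
  # The major version must match exactly.
  (( 10#${have[0]} == 10#${want[0]} )) || return 1
  # A newer minor version is fine; an older one is not.
  if (( 10#${have[1]} != 10#${want[1]} )); then
    (( 10#${have[1]} > 10#${want[1]} ))
    return
  fi
  # Same minor: the patch level must not be older.
  (( 10#${have[2]} >= 10#${want[2]} ))
}

# Hypothetical helper: Accept header asking for a given HTML spec version.
html_accept_header() {
  printf 'Accept: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/%s"\n' "$1"
}
```

For example, caret_satisfies 2.1.0 2.4.0 succeeds while caret_satisfies 2.1.0 3.0.0 fails.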

Testing on scandium

When on scandium, use this command to test Parsoid directly:

curl -x scandium.eqiad.wmnet:80 http://<domain>/w/rest.php/<domain>/v3/page/html/<title>/<revid>
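To fill in <title> on the curl command above, non-ASCII titles must be percent-encoded, as in the Serbian example URL earlier on this page. The sketch below (hypothetical function names urlencode_title and scandium_url) encodes a title byte by byte using od, keeps only RFC 3986 unreserved characters as-is, and assembles the URL in the format shown above. It does not turn spaces into underscores, so pass the title in its underscored form.

```bash
# Hypothetical helper: percent-encode a title byte by byte (UTF-8 in, %XX out),
# leaving only RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) as-is.
urlencode_title() {
  local out='' h ch
  # od -An -v -tx1 prints every byte as two lowercase hex digits, no addresses.
  for h in $(printf '%s' "$1" | od -An -v -tx1); do
    case $h in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        printf -v ch "\\x$h" ;;                  # unreserved: keep the character
      *)
        printf -v ch '%%%02X' "$(( 16#$h ))" ;;  # anything else: %XX
    esac
    out+=$ch
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: Parsoid REST URL for <domain>, <title>, <revid>.
scandium_url() {
  local domain=$1 title=$2 revid=$3
  printf 'http://%s/w/rest.php/%s/v3/page/html/%s/%s\n' \
    "$domain" "$domain" "$(urlencode_title "$title")" "$revid"
}
```

Usage: curl -x scandium.eqiad.wmnet:80 "$(scandium_url sr.wikipedia.org 'Главна_страна' 21280369)"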

Testing LanguageConverter

LanguageConverter can be tested on beta in a manner similar to testing a version bump.

  • Create a test page on the beta cluster containing the language converter features you wish to touch. Either the page language for the article must be set to a language with variants, or the article must be on a wiki whose main language has variants. We'll use the SrTest page on beta srwiki in our examples below.
  • Deploy the desired commit to the beta cluster and, as a sanity check, make requests for the above test page from Parsoid directly (via ssh to deployment-parsoid11.deployment-prep.eqiad.wmflabs) specifying the desired variant language. Verify that the result has been converted appropriately. Example requests might be,

See https://phabricator.wikimedia.org/T241146#5810424 for some more examples.

Deploying a cherry-picked patch

One way to do this is to create a new branch in the Parsoid repo and cherry-pick your patches to that. For example:

git checkout v0.13.0-a3 # this is the commit on the master branch that you want to cherry pick on top of
git checkout -b deploy-20150528 # give it a name (go ahead and use the date of your deploy)
git cherry-pick f274c3f54f385a6ac159a47209d279b9040a161c # patch number 1
git cherry-pick de087b106be48fc6e97f2ebc4644f9d297ecdfed # patch number 2
git push gerrit deploy-20150528:deploy-20150528 # create the branch in gerrit (DON'T USE SLASHES HERE)

Now do the usual steps to tag a release and prepare a vendor branch patch (see above) using the next available release version number (v0.13.0-a4 in the example below):

git tag v0.13.0-a4 # this is the next available release number
git push origin v0.13.0-a4

Switch to the mediawiki/vendor repository:

git checkout master ; git pull origin master
edit composer.json # set wikimedia/parsoid to 0.13.0-a4 (no leading 'v' in composer.json)
composer update --no-dev
git add -u
git commit -m "Bump wikimedia/parsoid to v0.13.0-a4"
git review -u

Note that the automated push to beta will fail if your gerrit branch name contains a slash. This is probably just because some ancient version of git is being used, and will eventually be fixed. But in the meantime, use dashes instead of slashes.
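Since a slash in the branch name breaks the push to beta, a small guard can help. This is a minimal sketch with a hypothetical function name, deploy_branch_name; it follows the deploy-YYYYMMDD convention from the example above and defaults to today's date.

```bash
# Hypothetical helper: name a deploy branch after a date (today by default)
# and refuse names containing a slash, which break the automated push to beta.
deploy_branch_name() {
  local name="deploy-${1:-$(date +%Y%m%d)}"
  if [[ $name == */* ]]; then
    printf 'branch name must not contain a slash: %s\n' "$name" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

Usage: branch=$(deploy_branch_name) && git checkout -b "$branch", and later git push gerrit "$branch:$branch".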

When this is merged into mediawiki-vendor it will (shortly) go live on beta; you should verify that everything looks good there. See #Verify deployment version on beta after the vendor patch is merged. If you want this cherry-pick to shortcut the train (instead of waiting to ride the next one) keep going into the next section, "If the train branch has already been cut".

Edge case deployment scenarios

If the train branch has already been cut

IF THE TRAIN BRANCH HAS ALREADY BEEN CUT (aka the wmf/1.XX.0-wmf.YY branch exists) then after you merge to master of mediawiki-vendor you will also need to cherry-pick a patch to the appropriate branch of mediawiki-vendor, for example wmf/1.36.0-wmf.3. In some cases you can use gerrit to cherry-pick the vendor patch to the branch, but in practice most updates to vendor conflict with each other due to the presence of content hashes, so you'll most likely need to repeat the steps above:

# from mediawiki/vendor
git remote update # if needed
git checkout wmf/1.36.0-wmf.3
edit composer.json # set wikimedia/parsoid to 0.13.0-a21 (no leading 'v' in composer.json)
composer update --no-dev
git add -u
git commit -m "Bump wikimedia/parsoid to v0.13.0-a21"
git review -u

Now, before you merge this cherry-pick onto the branch, you need to check one of three possible cases:

  1. If the train branch is new and the "branch commit" has not yet been merged (it looks like this; here is a gerrit search) -- wait! Do not merge the cherry-pick into mediawiki-vendor until the branch commit has landed, or the git submodules in mediawiki-core will be left out of sync (T259832). You might want to add a Depends-On clause to the cherry-pick patch to enforce this. If you accidentally merged this, see below for how to fix it.
  2. If the branch commit has been merged, but the train has not been deployed anywhere (check Deployments and the status page on versions.toolforge.org), then it's safe to just C+2 the cherry-pick. But be sure to ping #wikimedia-operations and get clearance before C+2 and merge, since (a) the deployer may have already checked out the branch in preparation for the train, and (b) jenkins can take a while to complete the merge and they need to know to wait for it. It's probably worth leaving a comment on the phab task for the train's blocker bug as well.
  3. If the train has already been deployed, then you will need to backport this cherry-pick; it is considered bad form to leave code committed on the branch which isn't deployed. Don't merge the cherry-pick until the backport window.
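The three cases above reduce to two yes/no questions. The sketch below (hypothetical function vendor_cherry_pick_action) only encodes that decision as a reminder; it does not query gerrit or the deployment calendar, so you still have to answer both questions yourself.

```bash
# Hypothetical helper encoding the three cases above.
# $1: has the train's branch commit been merged into mediawiki/core? (yes/no)
# $2: has the train already been deployed anywhere? (yes/no)
vendor_cherry_pick_action() {
  local branch_commit_merged=$1 train_deployed=$2
  if [[ $branch_commit_merged != yes ]]; then
    echo "wait: do not merge until the branch commit lands (consider Depends-On)"
  elif [[ $train_deployed != yes ]]; then
    echo "merge: get clearance in #wikimedia-operations, then C+2"
  else
    echo "backport: merge only during a backport window"
  fi
}
```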

If you accidentally merged into vendor before the branch commit has been merged

Merging a patch onto a branch in the mediawiki-vendor repository will automatically update the git submodules in core, but only after the branch commit is in place. See phab:T259832 for details. If you think you might have merged onto vendor before the branch commit was merged, check the appropriate vendor branch history for core, aka https://gerrit.wikimedia.org/g/mediawiki/core/+/refs/heads/wmf/1.36.0-wmf.3. Verify that the submodule hash for vendor corresponds to the tip of the branch of mediawiki-vendor. If it's not correct, after the branch commit has been merged into mediawiki-core you need to manually bump the submodules:

cd .../mediawiki-core
# note that the below will clobber your vendor, extensions, and skins directories
# you might want to use a new clean checkout of core
git checkout wmf/1.36.0-wmf.3
git submodule update --init
git submodule update --remote vendor
git add vendor
git commit -m "Update git submodules"
git review -u

Review and merge that.
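The submodule check described above can be scripted. This sketch relies on the output formats of git ls-tree (<mode> commit <sha><TAB><path> for a submodule) and git ls-remote (<sha><TAB><ref>). The function names are hypothetical, and check_vendor_submodule assumes core's remote is named origin and that the vendor submodule lives at the path vendor, as in the commands above.

```bash
# Hypothetical helpers for checking the vendor submodule recorded in core.

# Print the commit recorded for a submodule, from `git ls-tree` output:
#   <mode> commit <sha><TAB><path>
ls_tree_sha() { awk '$2 == "commit" { print $3; exit }'; }

# Print the commit from `git ls-remote` output: <sha><TAB><ref>
ls_remote_sha() { awk 'NR == 1 { print $1 }'; }

# Succeed only if both hashes are non-empty and identical.
vendor_submodule_in_sync() {
  local recorded=$1 tip=$2
  [[ -n $recorded && $recorded == "$tip" ]]
}

# Compare core's recorded vendor submodule on BRANCH with vendor's branch tip.
# Assumes core's remote is called "origin" (adjust if yours differs).
check_vendor_submodule() {
  local branch=$1 core_dir=${2:-.} recorded tip
  git -C "$core_dir" fetch origin "$branch" || return
  recorded=$(git -C "$core_dir" ls-tree FETCH_HEAD vendor | ls_tree_sha)
  tip=$(git ls-remote https://gerrit.wikimedia.org/r/mediawiki/vendor \
    "refs/heads/$branch" | ls_remote_sha)
  if vendor_submodule_in_sync "$recorded" "$tip"; then
    echo "vendor submodule matches $tip"
  else
    echo "vendor submodule is ${recorded:-missing}, vendor $branch is at ${tip:-missing}" >&2
    return 1
  fi
}
```

Usage: check_vendor_submodule wmf/1.36.0-wmf.3 path/to/mediawiki-core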

Misc stuff

  • To deploy to a single host
    scap deploy --force -l <node>
  • To see which hosts are pooled, from another host
    confctl select dc=.*,cluster=parsoid,service=parsoid get
  • To see the list of parsoid hosts in beta:
    cat /srv/deployment/parsoid/deploy/scap/betacluster
    • See also /srv/deployment/parsoid/deploy/scap/scap.cfg in general
  • To pool/depool a node, from deployment.eqiad.wmnet, run:
    • To depool: SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service <node> 'depool service=parsoid'
    • To pool  : SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service <node> 'pool service=parsoid'
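To pool or depool several nodes in one go, the commands above can be wrapped in a loop. This is a hedged sketch: parsoid_pool is a hypothetical name, and the node names in the usage line are placeholders. With DRY_RUN=1 it only prints the commands it would run.

```bash
# Hypothetical wrapper around the pool/depool commands above, for several nodes.
# Set DRY_RUN=1 to print the ssh commands instead of running them.
parsoid_pool() {
  local action=${1:-} node
  case $action in
    pool|depool) shift ;;
    *) echo "usage: parsoid_pool pool|depool NODE..." >&2; return 2 ;;
  esac
  for node in "$@"; do
    if [[ -n ${DRY_RUN:-} ]]; then
      printf "ssh -l deploy-service %s '%s service=parsoid'\n" "$node" "$action"
    else
      SSH_AUTH_SOCK=/run/keyholder/proxy.sock \
        ssh -l deploy-service "$node" "$action service=parsoid" || return
    fi
  done
}
```

Usage: DRY_RUN=1 parsoid_pool depool <node1> <node2>, then rerun without DRY_RUN once the printed commands look right.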

Data flow

Parsoid runs entirely on an internal subnet, so requests to it are proxied through the ve-parsoid API module. This module is implemented in extensions/VisualEditor/ApiVisualEditor.php and is invoked with a POST request to /w/api.php?action=ve-parsoid. The API module then sends a request to Parsoid, either GET /$prefix/$pagename to get the HTML for a page, or POST /$prefix/$pagename to submit HTML and get wikitext back. Parsoid itself also issues requests to /w/api.php to get the wikitext of the requested page and to do template expansion.

Once the ve-parsoid API module receives a response from Parsoid, it either relays it back to the client (when requesting HTML), or saves the returned wikitext to the page (when submitting HTML).

                (POST /w/api.php?action=ve-parsoid)          (GET /en/Barack_Obama?oldid=1234)           (requests for page content and template expansions)
Client browser ------------------------------------------> API ---------------------------->  Parsoid -----------------------------------------------------> API
    ^                                                      | ^                                 |   ^                                                          |
    |                  (response)                          | |      (HTML)                     |   |                   (responses)                            |
    +------------------------------------------------------+ +---------------------------------+   +----------------------------------------------------------+


                (POST /w/api.php?action=ve-parsoid)          (POST /en/Barack_Obama; oldid=1234)
Client browser ------------------------------------------> API ---------------------------->  Parsoid
                                                           | ^                                 |
                                               (save page) | |      (wikitext)                 |
                                                           | +---------------------------------+
                                                           |
                                                        Database