Parsoid
Parsoid is a service that converts between wikitext and HTML. The HTML contains additional metadata that allows it to be converted back ("round-tripped") to wikitext.
- VisualEditor fetches the HTML for a given page from Parsoid, edits it, then delivers the modified HTML to Parsoid, which converts it back to wikitext. Parsoid is a stateless HTTP server running on port 8000.
- Flow (as configured on WMF wikis with $wgFlowContentFormat = 'html') works the other way around. When a user creates a post, Flow uses Parsoid to convert the wikitext to HTML and stores the HTML in ExternalStore. If someone later edits a post, Flow uses Parsoid to convert the HTML back to wikitext for editing.
Monitoring
- Parsoid eqiad cluster in Grafana
- Parsoid codfw cluster in Grafana
- Icinga has service checks for HTTP on port 8000 on both the individual backends and on the LVS service IP.
- pybal does health checks on all backends every second, and depools boxes that are down as long as the percentage of depooled boxes does not exceed 50%. To see these health checks and depools/repools happen in real time, run
ssh parsoid.svc.eqiad.wmnet
(this will drop you into either lvs1003 or lvs1006, depending on which is active), then
tail -f /var/log/pybal.log | grep parsoid
- Logging happens in /var/log/parsoid/parsoid.log. There is a log rotation setup in /etc/logrotate.d/parsoid.
- Logs in logstash (Parsoid/JS): https://logstash.wikimedia.org/app/kibana#/dashboard/parsoid
- Logs in logstash (Parsoid/PHP): https://logstash.wikimedia.org/app/kibana#/dashboard/AW4Y6bumP44edBvO7lRc
- Useful links for Parsoid deployers
- Currently running Parsoid version:
- In beta: https://en.wikipedia.beta.wmflabs.org/wiki/Special:Version#mw-version-library-wikimedia.2Fparsoid
- In production: https://en.wikipedia.org/wiki/Special:Version#mw-version-library-wikimedia.2Fparsoid
- On scandium: See https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing
- (Be aware that the parsoid cluster is behind restbase and although the cluster *should* be running the same version of Parsoid as the mediawiki frontends, if puppet or scap are broken (esp in beta) things could diverge.)
Deploying changes
Parsoid is deployed as part of the MediaWiki train. See How to deploy code for an overview, Heterogeneous deployment for a more technical description of the directory structures involved, and Heterogeneous deployment/Train deploys for the steps to do a train deploy. Code changes outside the train schedule require a SWAT deploy. Generally, Parsing team members won't be doing train deploys or SWAT deploys directly; we will tag a Parsoid version (which releases it to packagist to make it available via composer) and merge a version bump into the mediawiki/vendor repository. Once the patch is merged into vendor, the new version of Parsoid goes live in beta (almost) immediately; it will then be rolled out to production on the next train.
Machine overview
These are the machines involved in a Parsoid deploy:
- In the beta/wmflabs cluster:
- deployment-deploy01.deployment-prep.eqiad.wmflabs: staging host in beta; no longer used
- deployment-parsoid11.deployment-prep.eqiad.wmflabs: parsoid server in beta
- deployment-restbase02.deployment-prep.eqiad.wmflabs: restbase server in beta
- In the production cluster:
- deployment.eqiad.wmnet: staging host in production; no longer used
- wtp1xxx: parsoid servers in eqiad cluster
- restbase1xxx: restbase servers in eqiad cluster
- wtp2xxx: parsoid servers in codfw cluster
- restbase2xxx: restbase servers in codfw cluster
- scandium.eqiad.wmnet: Parsoid testing host, has read-only access to the production database
Deploying Parsoid
Test the version you hope to deploy
- See mw:Parsoid/Round-trip testing for details.
- RT testing results are no longer publicly accessible on the web. You need to establish an SSH tunnel to the web service on scandium.
- Check http://localhost:<tunnel-port>/regressions/between/{from}/{to} where {from} is the last deployed hash from mw:Parsoid/Deployments and {to} is the latest tested commit (which we're about to deploy)
- http://localhost:<tunnel-port>/commits gives you a nice radio-button interface to create this URL
- BEWARE: if you get the output
total regressions between selected revisions: 0
it is extremely likely that you mistyped the hash or that we didn't actually run round-trip tests for that particular hash. (This is a bug; we should probably give a better message in this case.)
- Since we are using the current revision of titles in round-trip testing, edits to pages can show up as false regressions. tools/regression-testing.js is useful for filtering those out. Grab the wiki:title pairs that show up as regressions in the regressions URL, save them to a file, and feed it to the regressions tool; you will get a list of pages to look at more closely, if necessary.
- Check that there are no concerning notices or errors in logstash from the rt run
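The regressions URL described above can be assembled with a small helper. This is a minimal sketch, assuming an SSH tunnel to the round-trip web service on scandium is already open on a local port; the function name regressions_url is illustrative and not part of the Parsoid tooling. It rejects hashes that contain non-hex characters, which catches some (not all) typos before the server reports a misleading zero.

```shell
#!/bin/sh
# Hypothetical helper (not part of Parsoid): print the regressions URL
# for a local tunnel port and a {from}/{to} pair of commit hashes.
regressions_url() {
    if [ "$#" -ne 3 ] || [ -z "$2" ] || [ -z "$3" ]; then
        echo "usage: regressions_url <tunnel-port> <from-hash> <to-hash>" >&2
        return 1
    fi
    # Git hashes are lowercase hex; anything else is certainly a typo.
    case "$2$3" in
        *[!0-9a-f]*)
            echo "hashes must be lowercase hex: '$2' '$3'" >&2
            return 1
            ;;
    esac
    printf 'http://localhost:%s/regressions/between/%s/%s\n' "$1" "$2" "$3"
}
```

With the tunnel up, something like curl -s "$(regressions_url <tunnel-port> {from} {to})" fetches the page; the port is whatever you chose for the tunnel.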
Prepare the vendor patch
(This process was hashed out in phab:T240055)
- Create a short deployment summary on mw:Parsoid/Deployments from the output of
git log --cherry-pick {from}...{to}
Don't include all commits, but only notable fixes and changes (ignore rt-test fixes, code cleanup updates, parser test updates, etc). (The above command will do the right thing if {from} was on a branch and had patches cherry-picked from {to}, although if there were conflicts during the cherry-pick to {from} the patch will still appear in the log for {to}.)
- Tag a new version of parsoid and push the tag:
git tag v0.12.0-a{N}
git push origin v0.12.0-a{N}
(Include the leading 'v', and substitute the next version number for {N}.)
- Check that this version has been picked up at https://packagist.org/packages/wikimedia/parsoid (might take a minute)
- Check out the master branch of mediawiki/vendor.git. In that repo:
- Update composer.json to include
"wikimedia/parsoid": "0.12.0-aN",
(for your version {N}; note no leading "v")
- Run
composer update --no-dev
(which should only update parsoid)
- Add the changed files to git, commit, and upload to gerrit:
git add wikimedia/parsoid composer.lock composer.json # & etc
git commit
git review
- Use a commit message that (1) names the new parsoid tag, (2) includes the git hash of the new parsoid version, and (3) references key bug #s from the deployment summary so the deploy gets linked to phab. For example:
Bump parsoid to 0.12.0-aN

This corresponds to Parsoid commit cafecafecafecafe.

Bug: T111111
Bug: T222222
- Review and C+2 on gerrit. This will go live on beta cluster pretty quickly (within 30 minutes).
- TEMPORARILY you will also need to ssh to deployment-parsoid11.deployment-prep.eqiad.wmflabs and do a scap pull due to phab:T247545
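The commit message layout described in the steps above (tag name, Parsoid hash, Bug: footers) can be generated rather than typed by hand. This is a minimal sketch; the function name vendor_commit_message is hypothetical and the layout simply follows the example message in this section.

```shell
#!/bin/sh
# Hypothetical helper: print a mediawiki/vendor commit message.
# Usage: vendor_commit_message <version-without-v> <parsoid-commit-hash> [bug ...]
vendor_commit_message() {
    if [ "$#" -lt 2 ]; then
        echo "usage: vendor_commit_message <version> <commit> [bug ...]" >&2
        return 1
    fi
    version=$1
    commit=$2
    shift 2
    # Subject line, blank line, then the body naming the Parsoid commit.
    printf 'Bump parsoid to %s\n\n' "$version"
    printf 'This corresponds to Parsoid commit %s.\n' "$commit"
    # One "Bug:" footer per remaining argument, separated from the body.
    if [ "$#" -gt 0 ]; then
        printf '\n'
        for bug in "$@"; do
            printf 'Bug: %s\n' "$bug"
        done
    fi
}
```

The output can be piped straight into the commit, for example vendor_commit_message 0.12.0-aN cafecafecafecafe T111111 T222222 | git commit -F -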
Verify deployment version on beta after the vendor patch is merged
- Check that this is live on the mediawiki front ends in beta by watching the version number listed on https://en.wikipedia.beta.wmflabs.org/wiki/Special:Version#mw-version-library-wikimedia.2Fparsoid
- TEMPORARILY, while we're having scap problems, you'll want to check the version on parsoid11 as well:
$ ssh deployment-parsoid11.deployment-prep.eqiad.wmflabs
user@deployment-parsoid11$ curl -x deployment-parsoid11:80 'http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version' | fgrep wikimedia/parsoid
- If beta cluster is down or visual editor is down in beta cluster, do not continue with routine deployments.
- On beta cluster (e.g. en.wikipedia.beta.wmflabs.org), perform manual VisualEditor editing tests. This requires you to have an account on the beta cluster wiki. Test with non-ASCII content too to catch encoding issues. Check parsoid logs. Be particularly alert to integration issues: library conflicts, etc.
Be around on IRC
This hasn't been updated for the train yet. But in general we need parsing team coverage during the period when the train is rolling forward, in order to catch any Parsoid-related issues.
- Add yourself to the "deployer" field of Deployments if you're not already there
- Be online in freenode #wikimedia-operations (and stay online through the deployment window)
During the Parsoid deploy
This hasn't been updated for train process yet. In general we're not doing the scap and we no longer have canary nodes to watch. But we can/should be monitoring logs as the ops team rolls the train forward.
Post-deploy checks
- Test VE editing on enwiki and non-latin wikis
- For example, open it:Luna (or other complex page), start the visual editor, make some random vandalism, click save -> review changes, then verify that the wikitext reflects your changes and was not corrupted. Hit cancel to abort the edit.
- Reading through the recent edits (frwiki, enwiki) can also be a good check.
Testing a version bump
If the deployed version of Parsoid updates the Parsoid DOM version and/or will exercise the html2html "down convert" endpoint, the following test procedure will ensure that clients are getting the appropriate DOM version:
- First and foremost, mocha tests should already be present that cover both downgrading the HTML and serializing it with and without selser.
- Create a test page on the beta cluster containing the features that merited the major version bump.
- Deploy the desired commit to the beta cluster and, as a sanity check, make requests for the above test page from Parsoid directly (via deployment-parsoid11.deployment-prep.eqiad.wmflabs), accepting the various specs that are available. The inline meta tag and aforementioned features should indicate that it worked. Example requests might be:
- For the old version,
curl -x deployment-parsoid11:80 'http://en.wikipedia.beta.wmflabs.org/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/html/Test_Page' -H'Accept: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/1.7.0"'
- For the new version,
curl -x deployment-parsoid11:80 'http://en.wikipedia.beta.wmflabs.org/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/html/Test_Page' -H'Accept: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.0.0"'
- Confirm that VE on the beta cluster is still tied to the older content version and will need a downgrade (see the commit in Special:Version for the extension and compare with the header defined in includes/ApiVisualEditor.php).
- At this point, two scenarios need to be tested: an edit starting from the older content version stored in RESTBase (which won't require a downgrade) and one starting from the new content version, which will.
- Note that, for extra points, there are potentially several version numbers stored in RESTBase that satisfy the VE request based on caret semantics, and it might be worthwhile to confirm that edits starting from those versions work as well.
- Once you've found stored content in RESTBase with an appropriate version for your test, it's prudent to confirm that VE is actually editing what you expect. This can be achieved by dumping the various DOMs: the original
copy(ve.init.target.doc.body.outerHTML)
and the edited
copy(ve.init.target.docToSave.body.outerHTML)
- In each case, try to confirm that the features can be edited directly as well as being ignored by selser (usually because no normalizations occur). Unfortunately, testing here is a bit more art than science.
- Finally, open up the various testing dashboards for logging and metrics to verify that no unexpected errors are present and that the downgrades are accounted for.
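The per-version requests in the procedure above differ only in the profile URL of the Accept header, so they can be driven from one loop. This is a minimal sketch, assuming it is run on deployment-parsoid11 as in the examples; the function names and the output file naming are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the Accept header for one HTML spec version.
parsoid_accept_header() {
    printf 'Accept: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/%s"\n' "$1"
}

# Hypothetical helper: fetch Test_Page once per spec version given as an
# argument and save each response to Test_Page.<version>.html so the
# inline meta tag and the affected features can be compared side by side.
fetch_test_page_versions() {
    url='http://en.wikipedia.beta.wmflabs.org/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/html/Test_Page'
    for version in "$@"; do
        curl -s -x deployment-parsoid11:80 \
            -H "$(parsoid_accept_header "$version")" \
            -o "Test_Page.$version.html" "$url"
    done
}
```

For the two versions used in the examples, fetch_test_page_versions 1.7.0 2.0.0 leaves two files that can be compared with diff.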
Testing on scandium
When on scandium, use this command to test Parsoid directly:
curl -x scandium.eqiad.wmnet:80 http://<domain>/w/rest.php/<domain>/v3/page/html/<title>/<revid>
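The URL pattern in the command above repeats the domain and needs "/" in the title encoded as %2F, as the User:Cscott%2FSrTest example elsewhere on this page shows. This is a minimal sketch of a URL builder; the function name parsoid_rest_url is hypothetical, and it only encodes slashes, so titles with other special characters still need encoding by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the Parsoid REST URL for a page revision.
# Usage: parsoid_rest_url <domain> <title> <revid>
parsoid_rest_url() {
    if [ "$#" -ne 3 ]; then
        echo "usage: parsoid_rest_url <domain> <title> <revid>" >&2
        return 1
    fi
    domain=$1
    revid=$3
    # Encode "/" as %2F so a subpage title stays a single path segment.
    # Other characters are passed through unchanged.
    title=$(printf '%s' "$2" | sed 's|/|%2F|g')
    printf 'http://%s/w/rest.php/%s/v3/page/html/%s/%s\n' \
        "$domain" "$domain" "$title" "$revid"
}
```

On scandium this combines with the proxy option as curl -x scandium.eqiad.wmnet:80 "$(parsoid_rest_url <domain> <title> <revid>)".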
Testing LanguageConverter
LanguageConverter can be tested on beta in a manner similar to testing a version bump.
- Create a test page on the beta cluster containing the language converter features you wish to touch. Either the page language for the article must be set to a language w/ variants, or else the article must take place on a wiki where the main language has variants. We'll use the SrTest page on beta srwiki in our examples below.
- Deploy the desired commit to the beta cluster and, as a sanity check, make requests for the above test page from Parsoid directly (via ssh to deployment-parsoid11.deployment-prep.eqiad.wmflabs), specifying the desired variant language. Verify that the result has been converted appropriately. Example requests might be:
curl -H'Accept-Language: sr-ec' -x deployment-parsoid11:80 http://sr.wikipedia.beta.wmflabs.org/w/rest.php/sr.wikipedia.beta.wmflabs.org/v3/page/html/User:Cscott%2FSrTest/23
curl -H'Accept-Language: sr-el' -x deployment-parsoid11:80 http://sr.wikipedia.beta.wmflabs.org/w/rest.php/sr.wikipedia.beta.wmflabs.org/v3/page/html/User:Cscott%2FSrTest/23
- To test in production, try something like:
curl -X GET --header 'Accept-Language: sr-el' 'https://sr.wikipedia.org/api/rest_v1/page/html/%D0%93%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B0/21280369'
See https://phabricator.wikimedia.org/T241146#5810424 for some more examples.
Deploying a cherry-picked patch
One way to do this is to create a new branch in the Parsoid repo and cherry-pick your patches to that. For example:
git checkout v0.12.0-a3 # this is the commit on the master branch that you want to cherry pick on top of
git checkout -b deploy-20150528 # give it a name (go ahead and use the date of your deploy)
git cherry-pick f274c3f54f385a6ac159a47209d279b9040a161c # patch number 1
git cherry-pick de087b106be48fc6e97f2ebc4644f9d297ecdfed # patch number 2
git push gerrit deploy-20150528:deploy-20150528 # create the branch in gerrit (DON'T USE SLASHES HERE)
Now do the usual steps to tag a release and prepare a vendor branch patch (see above) using the next available release version number (v0.12.0-a4 in the example below):
git tag v0.12.0-a4 # this is the next available release number
git push origin v0.12.0-a4
Switch to the mediawiki/vendor repository:
git checkout master ; git pull origin master
edit composer.json # set wikimedia/parsoid to 0.12.0-a4 (no leading "v" in composer.json)
composer update --no-dev
git add -u
git commit -m "Bump wikimedia/parsoid to v0.12.0-a4"
git review -u
Note that the automated push to beta will fail if your gerrit branch name contains a slash. This is probably just because some ancient version of git is being used, and will eventually be fixed. But in the meantime, use dashes instead of slashes.
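Since a slash in the branch name only shows up as a failure later, at the automated push to beta, it can be worth checking the name before pushing. This is a minimal sketch; the function name check_deploy_branch_name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the branch name is non-empty
# and contains no slash.
check_deploy_branch_name() {
    case "$1" in
        '')
            echo "branch name is empty" >&2
            return 1
            ;;
        */*)
            echo "branch name '$1' contains a slash; use dashes instead" >&2
            return 1
            ;;
    esac
    return 0
}
```

For example, check_deploy_branch_name deploy-20150528 && git push gerrit deploy-20150528:deploy-20150528 pushes only when the name passes.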
Cherry-picking directly from the deployment server and deploying it
THIS HAS NOT BEEN UPDATED FOR TRAIN DEPLOYS. Hotfixes need to be deployed via SWAT of a new mediawiki-vendor repo, and SWAT is generally non-atomic so there are Issues to be aware of.
OLD TEXT FOLLOWS: In many situations, a hotfix might need to be pushed quickly. One way to do that is to cherry-pick the patch on the deployment server (aka deployment.eqiad.wmnet) and sync it.
### Verify that you have the most recently deployed code that you want to cherry-pick on top of
tin$ cd /srv/deployment/parsoid/deploy
(verify via git log)
tin$ cd src
(verify via git log)
### Create a hotfix branch
tin$ git checkout -b hotfix_<some_unique_tag>
### Get latest code from master you want to cherry-pick from
tin$ git checkout master; git pull
### Check out the hotfix branch and cherry-pick
tin$ git checkout hotfix_<some_unique_tag>
tin$ git cherry-pick <commit-from-master>
### Create a deploy-repo patch
tin$ cd ..; git commit -a -m "Bump src to whatever-git-sha-it-is for hotfix"
### The usual deployment steps
tin$ scap deploy
... verify deployment ...
Restarting
THIS HAS NOT BEEN UPDATED FOR TRAIN DEPLOYS.
In case you need to restart Parsoid without any deployments (for example, to reload mediawiki configs from config or other deployments),
- Restart parsoid hosts, from deployment.eqiad.wmnet (production) or deployment-deploy01.deployment-prep.eqiad.wmflabs (beta)
cd /srv/deployment/parsoid/deploy && scap deploy --service-restart
Converting a Parsoid/JS to a Parsoid/PHP server
THIS HAS NOT BEEN UPDATED. We're in the process of purging the old parsoid/JS configuration from puppet.
A conventional parsoid/JS (wtp*) server can be switched to a parsoid/PHP server by setting the parameter "$use_php" to "true" in Hiera. It is a parameter of the class profile::parsoid and defaults to false. Setting it to true includes a number of profile::mediawiki:: classes, effectively turning the host into a MediaWiki appserver-like server.
After doing so, a couple of Hiera keys are needed to configure PHP and related things. See below and compare to the values in hieradata/role/common/parsoid.yaml.
# switch Parsoid/JS-server to Parsoid/PHP-MW-appserver
profile::parsoid::use_php: true
has_lvs: true
profile::mediawiki::php::php_version: "7.2"
profile::mediawiki::webserver::has_tls: true
nutcracker::verbosity: "4"
# Bump the connections per backend to 5 in mcrouter, see T203786
profile::mediawiki::mcrouter_wancache::num_proxies: 5
profile::mediawiki::httpd::logrotate_retention: 12
profile::mediawiki::vhost_feature_flags: {}
# bail out in case a long-lasting C function is called and
# excimer can't throw its exception
profile::mediawiki::php::request_timeout: 201
profile::mediawiki::apc_shm_size: 4096M
profile::mediawiki::php::enable_fpm: true
profile::mediawiki::php::fpm_config:
  opcache.interned_strings_buffer: 96
  opcache.memory_consumption: 1024
  apc.ttl: 10
# Configure php-fpm restarts
profile::mediawiki::php::restarts::ensure: present
# We set the restart watermark at 200 MB, which is approximately how much
# opcache one full day of deployments consume.
profile::mediawiki::php::restarts::opcache_limit: 200
Old content: update deploy repo
We also have a small deploy repo which just contains the node.js dependencies required to run our round-trip testing client/server. To update these (FIXME, this hasn't been updated):
- Prepare a deploy repo commit and push for +2
- Roughly:
cd deploy ; git checkout master ; git pull origin master ; git submodule update ; cd src ; git checkout {to} ; cd .. ; git add -u ; git commit -m "Bump src to {to} for deploy" ; git review
When something goes wrong
Reverting a Parsoid deployment
Code
ssh deployment.eqiad.wmnet
cd /srv/deployment/parsoid/deploy
scap deploy --rev <sha>
Misc stuff
- To deploy to a single host
scap deploy --force -l <node>
- To see which hosts are pooled, from another host
confctl select dc=.*,cluster=parsoid,service=parsoid get
- To see the list of parsoid hosts in beta:
cat /srv/deployment/parsoid/deploy/scap/betacluster
- See also /srv/deployment/parsoid/deploy/scap/scap.cfg in general
- To pool/depool a node, from deployment.eqiad.wmnet, run:
- To depool:
SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service <node> 'depool service=parsoid'
- To pool:
SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service <node> 'pool service=parsoid'
Data flow
Parsoid runs entirely on an internal subnet, so requests to it are proxied through the ve-parsoid API module. This module is implemented in extensions/VisualEditor/ApiVisualEditor.php and is invoked with a POST request to /w/api.php?action=ve-parsoid. The API module then sends a request to Parsoid, either GET /$prefix/$pagename to get the HTML for a page, or POST /$prefix/$pagename to submit HTML and get wikitext back. Parsoid itself also issues requests to /w/api.php to get the wikitext of the requested page and to do template expansion.
Once the ve-parsoid API module receives a response from Parsoid, it either relays it back to the client (when requesting HTML), or saves the returned wikitext to the page (when submitting HTML).
                 (POST /w/api.php?action=ve-parsoid)        (GET /en/Barack_Obama?oldid=1234)          (requests for page content and template expansions)
Client browser --------------------------------------> API ----------------------------------> Parsoid --------------------------------------------------> API
      ^                                                | ^                                      |   ^                                                       |
      |                   (response)                   | |                (HTML)                |   |                      (responses)                      |
      +------------------------------------------------+ +--------------------------------------+   +-------------------------------------------------------+

                 (POST /w/api.php?action=ve-parsoid)       (POST /en/Barack_Obama; oldid=1234)
Client browser --------------------------------------> API ----------------------------------> Parsoid
                                                       | ^                                      |
                                           (save page) | |              (wikitext)              |
                                                       | +--------------------------------------+
                                                       |
                                                   Database