
Proton

Editors: Gergő Tisza, Cwhite

Latest revision as of 22:40, 12 September 2019

Proton is a service that converts Wikipedia articles into PDF. It uses Puppeteer to fetch the Wikipedia page and render it in headless Chromium. It then calls Puppeteer's page.pdf() to return a PDF version of the article.
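The fetch, render and page.pdf() flow can be sketched as follows. The sketch is written against hypothetical BrowserLike/PageLike interfaces that mirror the small subset of the Puppeteer API involved (newPage, goto, pdf, close), so it can be read and tested without Chromium. With the real library, launch would be something like () => puppeteer.launch(options). The waitUntil value, the default paper format and the one-browser-per-call design are assumptions for illustration. Proton itself may manage browser instances differently.

```typescript
// Hypothetical minimal interfaces mirroring the subset of Puppeteer's
// Browser/Page API used here; real objects come from puppeteer.launch().
interface PageLike {
  goto(url: string, options?: { waitUntil?: string; timeout?: number }): Promise<unknown>;
  pdf(options?: { format?: string; printBackground?: boolean }): Promise<Uint8Array>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Render one article URL to PDF bytes: open a page in headless Chromium,
// load the article, then call page.pdf(). The browser is always closed,
// even if navigation or rendering fails.
async function renderArticleToPdf(
  launch: () => Promise<BrowserLike>,
  articleUrl: string,
  format: string = "A4", // assumption: default paper size for this sketch
): Promise<Uint8Array> {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // Wait for network idle so images and stylesheets are loaded.
    await page.goto(articleUrl, { waitUntil: "networkidle0" });
    return await page.pdf({ format, printBackground: true });
  } finally {
    await browser.close();
  }
}
```

Injecting launch keeps the rendering steps independent of how the browser is started. That makes it straightforward to swap in a fake browser for tests.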

Source Code

  • https://gerrit.wikimedia.org/r/admin/projects/mediawiki/services/chromium-render

Monitoring

Deploying changes

Proton is deployed using scap3, which makes deployments straightforward. You run scap deploy, which pushes the new state to all backends and restarts them. You need deploy access and membership in the proton-admins Puppet group.

Pre-deploy checks

Prepare the deploy patch

Proton is based on the Services template. For more detailed information, refer to Services/Deployment.

  • Create a short deployment summary on mw:Proton/Deployments from git log --cherry-pick {from}...{to}. Don't include every commit; include only notable fixes and changes (skip rt-test fixes, code cleanup, test updates, etc.). The command does the right thing if {from} was on a branch that had patches cherry-picked from {to}. However, if a cherry-pick into {from} had conflicts, that patch will still appear in the log for {to}.
  • Prepare a commit for the chromium-render/deploy repository and push it to Gerrit for +2
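Drafting the summary can be partly mechanized by filtering commit subjects. The following is a minimal sketch, assuming one subject per line (for example from git log --cherry-pick --pretty=format:%s {from}...{to}). The ROUTINE_PATTERNS list is a hypothetical heuristic, not a project convention, and the result still needs a human pass.

```typescript
// Hypothetical heuristic: patterns for routine commits that the
// mw:Proton/Deployments summary should skip. Adjust to real commit styles.
const ROUTINE_PATTERNS: RegExp[] = [
  /\brt-?tests?\b/i, // rt-test fixes
  /\b(code )?clean-?up\b/i, // code cleanup
  /\btests?\b/i, // test updates
];

// Keep non-empty subjects that do not match any routine pattern.
function notableCommits(subjects: string[]): string[] {
  return subjects
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .filter((s) => !ROUTINE_PATTERNS.some((re) => re.test(s)));
}
```

The patterns are deliberately broad, so a notable fix whose subject happens to mention "test" would also be dropped. Review the full log before writing the summary.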

Verify the deployed version on beta after the deploy patch is merged

  • Deploy the code to the beta cluster if it is not already there (same instructions as below, but ssh to deployment-deploy01.deployment-prep.eqiad.wmflabs)
  • You can use the new_pdf=1 query parameter with the RESTBase PDF URL to make it route the request to Proton.
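The following is a small sketch of building such a URL. It assumes the RESTBase PDF endpoint has the form https://{wiki host}/api/rest_v1/page/pdf/{title}; the host and path layout are assumptions for illustration. The title is encoded so that characters such as "/" survive as a single path segment.

```typescript
// Build a RESTBase PDF URL that asks RESTBase to route the request to
// Proton via new_pdf=1. The /api/rest_v1/page/pdf/{title} layout is an
// assumption for this sketch.
function protonPdfUrl(wikiHost: string, title: string): string {
  // MediaWiki titles use underscores for spaces; encode everything else.
  const encoded = encodeURIComponent(title.replace(/ /g, "_"));
  const url = new URL(`https://${wikiHost}/api/rest_v1/page/pdf/${encoded}`);
  url.searchParams.set("new_pdf", "1");
  return url.toString();
}
```

For example, protonPdfUrl("en.wikipedia.beta.wmflabs.org", "Main Page") yields a beta URL that can be opened in a browser or fetched with curl to compare against the old renderer.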

Be around on IRC

  • Add yourself to the "deployer" field of Deployments if you're not already there
  • Be online in freenode #wikimedia-operations (and stay online through the deployment window)

Deploying the latest version of Proton

Now to do the deploy:

ssh deployment.eqiad.wmnet
cd /srv/deployment/proton/deploy
git pull
git submodule update --init
scap deploy "$(git log --pretty=format:'%s' -n 1) (T<bug number>, T<bug number>)"

Scap will log the completion of the deploy, but if you want to add additional information to the SAL, make a comment in #wikimedia-operations with something like

Post-deploy checks

Restarting

If you need to restart Proton without a deployment (for example, to reload MediaWiki configs from config or other deployments):

  • Restart the Proton hosts from deployment.eqiad.wmnet (production) or deployment-deploy01.deployment-prep.eqiad.wmflabs (beta):
    cd /srv/deployment/proton/deploy && scap deploy --service-restart

When something goes wrong

Reverting a Proton deployment

Code

ssh deployment.eqiad.wmnet
cd /srv/deployment/proton/deploy
scap deploy --rev <sha>

Target machines

If you need to check something on the target machines:

  • Beta Cluster:
    • deployment-chromium01.deployment-prep.eqiad.wmflabs
  • Production:
    • proton1001.eqiad.wmnet
    • proton1002.eqiad.wmnet
    • proton2001.codfw.wmnet
    • proton2002.codfw.wmnet

Proton uses port 24766 in Beta and production.
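When spot-checking the targets, it can help to generate one URL per host on port 24766. This is a hypothetical helper. The /_info path is an assumption based on the Services template, which exposes service metadata there, and plain HTTP is also an assumption; verify both before relying on them.

```typescript
// Proton's port in beta and production (from this page).
const PROTON_PORT = 24766;

// Target hosts listed above, grouped by environment.
const TARGETS: Record<string, string[]> = {
  beta: ["deployment-chromium01.deployment-prep.eqiad.wmflabs"],
  production: [
    "proton1001.eqiad.wmnet",
    "proton1002.eqiad.wmnet",
    "proton2001.codfw.wmnet",
    "proton2002.codfw.wmnet",
  ],
};

// Hypothetical: build per-host check URLs. "/_info" and http:// are assumptions.
function infoUrls(env: "beta" | "production", path: string = "/_info"): string[] {
  return TARGETS[env].map((host) => `http://${host}:${PROTON_PORT}${path}`);
}
```

Each URL can then be fetched (for example with curl) from a machine that can reach the target network.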

Updating Puppeteer & Chromium

We use a very small set of Puppeteer features, so it is usually fairly safe to update both the Puppeteer library and the Chromium browser. Before you update Puppeteer or Chromium, keep in mind that their versions are tightly coupled:

Puppeteer acts as an indivisible entity with Chromium. Each version of Puppeteer bundles a specific version of Chromium – the only version it is guaranteed to work with. This is not an artificial constraint: A lot of work on Puppeteer is actually taking place in the Chromium repository.

For more information, see the Puppeteer FAQ entry "Why doesn’t Puppeteer v.XXX work with Chromium v.YYY?"

Puppeteer updates

We pin Puppeteer to a specific version in the package.json file. The latest Puppeteer version can be found on the Puppeteer releases page. Updating is simple: bump the puppeteer version in package.json, run npm install to fetch the new version, and test that the service renders PDFs correctly. Puppeteer usually ships with a not-yet-stable Chromium version, so there is no need to update Puppeteer with every release. Because Puppeteer is coupled to a specific Chromium version, update Puppeteer only when a new release provides useful features or fixes issues related to HTML/PDF rendering.
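One way to keep the pin honest is a check that package.json names an exact version rather than a range. A range such as ^1.11.0 could let npm install pick a different Puppeteer (and so a different expected Chromium). This is a minimal sketch, assuming puppeteer is listed under dependencies; the function name is hypothetical.

```typescript
// Shape of the part of package.json this check reads.
interface PackageJson {
  dependencies?: Record<string, string>;
}

// Exact semver only: MAJOR.MINOR.PATCH with an optional pre-release tag.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

// Hypothetical helper: return the pinned puppeteer version or throw if it
// is missing or specified as a range ("^1.11.0", "~1.11.0", "1.x", ...).
function pinnedPuppeteerVersion(pkg: PackageJson): string {
  const spec = pkg.dependencies ? pkg.dependencies["puppeteer"] : undefined;
  if (spec === undefined) {
    throw new Error("puppeteer is not listed in dependencies");
  }
  if (!EXACT_VERSION.test(spec)) {
    throw new Error(`puppeteer is not pinned to an exact version: ${spec}`);
  }
  return spec;
}
```

In Node, the input could come from JSON.parse(fs.readFileSync("package.json", "utf8")).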

Chromium updates

We decided to use the Chromium package shipped with the operating system, as this seemed the most reasonable solution. The Chromium packages in Debian (the OS we use) are vetted by Debian maintainers, so we can expect them to work and not to behave destructively.

We analyzed other ways to ship Chromium, but they were rejected:

  • Using the Chromium version bundled with Puppeteer - rejected because Puppeteer downloads Chromium from servers we do not control, and there was no safe way to verify that the downloaded build is safe to use in the WMF environment.
  • Storing the Chromium executable in the Proton repository - rejected because of the executable's size (over 100MB). The chromium-render repository would grow too fast and soon become difficult to maintain.
  • Installing Chromium manually (or via a script) - rejected because of the higher maintenance cost. The Chromium version shipped with Debian is proven to work properly with the Puppeteer version we currently use.
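Using the OS package means Puppeteer must be pointed at the system binary through its executablePath launch option. Puppeteer v1.x can also be told not to download its own browser at install time via the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable. The following is a minimal sketch of resolving the binary path. It assumes Debian's chromium package installs /usr/bin/chromium; the CHROMIUM_PATH override is a hypothetical setting, and the existence check is injected so the sketch has no Node-specific imports.

```typescript
// Candidate locations for the system Chromium. /usr/bin/chromium is the
// Debian package's binary; chromium-browser is the Ubuntu-style name.
const DEFAULT_CHROMIUM_PATHS: string[] = ["/usr/bin/chromium", "/usr/bin/chromium-browser"];

// Pick the Chromium binary to pass as executablePath to puppeteer.launch().
// CHROMIUM_PATH is a hypothetical override, e.g. for development machines.
function resolveChromiumPath(
  env: Record<string, string | undefined>,
  exists: (path: string) => boolean, // e.g. fs.existsSync in Node
): string {
  const override = env["CHROMIUM_PATH"];
  const candidates: string[] = override ? [override] : DEFAULT_CHROMIUM_PATHS;
  for (const candidate of candidates) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  throw new Error(`No Chromium binary found in: ${candidates.join(", ")}`);
}
```

Failing loudly when no binary is found is preferable to silently falling back to a downloaded browser, which is exactly the option rejected above.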

When you update either Puppeteer or Chromium, pick a Puppeteer version whose bundled Chromium is close to the Chromium version installed from Debian (or vice versa). We cannot update Chromium very often, because the Debian release cycle is slow and the packaged Chromium is often not the latest stable version.
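"Close enough" can be made concrete by comparing Chromium major versions. This is a hypothetical check. It parses the version from chromium --version output (the sample format in the usage note is an assumption) and compares it with the Chromium major version bundled with the chosen Puppeteer release, taken from that release's notes. The allowed drift of one major version is an assumption, not a tested limit.

```typescript
// Extract the Chromium major version from `chromium --version`-style text,
// using the first four-part version number (e.g. 73.0.3683.75 -> 73).
function chromiumMajor(versionOutput: string): number {
  const match = /(\d+)\.\d+\.\d+\.\d+/.exec(versionOutput);
  if (match === null) {
    throw new Error(`Cannot find a Chromium version in: ${versionOutput}`);
  }
  return parseInt(match[1], 10);
}

// Hypothetical policy: installed Chromium may differ from the Puppeteer-
// bundled major version by at most maxDrift (assumed default: 1).
function isCloseEnough(installed: string, bundledMajor: number, maxDrift: number = 1): boolean {
  return Math.abs(chromiumMajor(installed) - bundledMajor) <= maxDrift;
}
```

For example, with installed output "Chromium 73.0.3683.75 built on Debian 9.8", a Puppeteer release bundling Chromium 74 passes and one bundling Chromium 75 does not. Either way, render test articles before deploying.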

Puppeteer configuration

The Wikimedia environment is very specific and requires special Puppeteer configuration. We need to pass additional configuration options that can look like security loopholes, so the reason for each one is explained below:

  • The ignoreHTTPSErrors option is needed because our internal wiki domains use a self-signed certificate (the CA is our Puppet CA), and internal domains are the standard way for REST services to reach MediaWiki appservers. Proton cannot communicate with the outside world and should handle even malicious HTML safely, so it is safe to set `ignoreHTTPSErrors`. This option is set only in the production environment (in the deploy repository); the chromium-render repository does not set it.
  • The --no-sandbox and --disable-setuid-sandbox flags are required to run Chromium properly inside a Docker environment, because Chromium's sandboxing requires kernel user namespaces to be set up properly. For more information, see the "Chrome won't work without --no-sandbox option" issue. The Chromium process is firejailed, which means we already sandbox it, so Chromium's built-in sandboxing is not needed.
  • The --font-rendering-hinting=medium, --enable-font-antialiasing and --disable-gpu flags tune font rendering. We want consistent font rendering across the production, staging, beta and development platforms.
  • The --hide-scrollbars and --no-first-run flags improve PDF page rendering. They are probably not required, but it is safer to keep them on.
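These settings map onto Puppeteer's launch options in two different ways. ignoreHTTPSErrors and executablePath are Puppeteer launch options, while the remaining items are Chromium command-line switches passed through args. The following is a minimal sketch, assuming a boolean production switch; the function name and its parameters are hypothetical, and in Proton these values come from the service configuration. The switch names are copied from the list above.

```typescript
// Subset of Puppeteer launch options used in this sketch.
interface LaunchOptions {
  executablePath?: string;
  ignoreHTTPSErrors: boolean;
  args: string[];
}

// Hypothetical builder: `production` stands in for "configured in the
// deploy repo"; chromium-render itself leaves ignoreHTTPSErrors off.
function buildLaunchOptions(production: boolean, executablePath?: string): LaunchOptions {
  return {
    executablePath,
    // Internal wiki domains use certificates from the Puppet CA.
    ignoreHTTPSErrors: production,
    args: [
      // Chromium's own sandbox needs user namespaces; firejail sandboxes it instead.
      "--no-sandbox",
      "--disable-setuid-sandbox",
      // Font tuning. Spelled as on this page; Chromium's headless switch is
      // --font-render-hinting, so verify against the deploy config.
      "--font-rendering-hinting=medium",
      "--enable-font-antialiasing",
      "--disable-gpu",
      // Probably not required, kept on as a precaution.
      "--hide-scrollbars",
      "--no-first-run",
    ],
  };
}
```

Keeping ignoreHTTPSErrors tied to an explicit production flag makes it harder to enable by accident in development setups, matching the split between the deploy repository and chromium-render.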

Data flow

[Diagram: Proton data flow (Proton flow.png)]