You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Proton: Difference between revisions

From Wikitech-static
imported>Pmiazga
(Created page with "Proton is a service that converts the Wikipedia articles into PDF. It uses [https://github.com/GoogleChrome/puppeteer Pupeeteer] to fetch the Wikipedia page, ren...")
 
imported>Quiddity
m (fixes)
 
(4 intermediate revisions by 4 users not shown)
Line 1: Line 1:
[[mw:Proton|Proton]] is a service that converts the Wikipedia articles into PDF. It uses [https://github.com/GoogleChrome/puppeteer Pupeeteer] to fetch the Wikipedia page, render it in headless chromium, and then calls the puppeteer [https://github.com/GoogleChrome/puppeteer/blob/v1.11.0/docs/api.md#pagepdfoptions page.pdf()] call to return PDF version of the article.
[[mw:Proton|Proton]] is a service that converts the Wikipedia articles into PDF. It uses [https://github.com/GoogleChrome/puppeteer Pupeeteer] to fetch the Wikipedia page, render it in headless chromium, and then calls the puppeteer [https://github.com/GoogleChrome/puppeteer/blob/v1.11.0/docs/api.md#pagepdfoptions page.pdf()] call to return PDF version of the article.
== Source Code ==
* https://gerrit.wikimedia.org/r/admin/projects/mediawiki/services/chromium-render


== Monitoring ==
== Monitoring ==
Line 6: Line 9:
* [https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?from=now-3h&to=now&cluster=proton&orgId=1&var-datasource=codfw%20prometheus%2Fops&var-cluster=proton&var-instance=All Prometheus breakdown for the Proton cluster on codfw]
* [https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?from=now-3h&to=now&cluster=proton&orgId=1&var-datasource=codfw%20prometheus%2Fops&var-cluster=proton&var-instance=All Prometheus breakdown for the Proton cluster on codfw]
* [https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?from=now-3h&to=now&cluster=proton&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=proton&var-instance=All Prometheus breakdown for the Proton cluster on eqiad]
* [https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?from=now-3h&to=now&cluster=proton&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=proton&var-instance=All Prometheus breakdown for the Proton cluster on eqiad]
* [[CI/JJB|Jenkins Job Builder docs]] for updating Jenkins jobs
* [[mw:CI/JJB|Jenkins Job Builder docs]] for updating Jenkins jobs
* [https://wikitech.wikimedia.org/wiki/Icinga Icinga] uses the swagger file and performs HTTP checks for Desktop/Mobile prints plus it verifies the requesting non-existing page.
* [https://wikitech.wikimedia.org/wiki/Icinga Icinga] uses the swagger file and performs HTTP checks for Desktop/Mobile prints plus it verifies the requesting non-existing page.


== Deploying changes ==
== Deploying changes ==
{{Outdated-inline|note=Proton is running on kubernetes now}}
Proton is deployed using scap3. Doing deployments with scap3 is very easy. You just run <code>scap deploy</code>, which pushes the new state to all backends and restarts them. You should have [[How_to_deploy_code#Deployment_requirements|deploy access]] and be a member of the [https://gerrit.wikimedia.org/r/#/c/478776/ proton-admins group] puppet group.
Proton is deployed using scap3. Doing deployments with scap3 is very easy. You just run <code>scap deploy</code>, which pushes the new state to all backends and restarts them. You should have [[How_to_deploy_code#Deployment_requirements|deploy access]] and be a member of the [https://gerrit.wikimedia.org/r/#/c/478776/ proton-admins group] puppet group.


=== Pre-deploy checks ===
=== Pre-deploy checks ===
None


==== Prepare the deploy patch ====
==== Prepare the deploy patch ====
Proton is based on Services template. For more detailed information please refer to [Services/Deployment]
Proton is based on Services template. For more detailed information please refer to [[Services/Deployment]].
* Create a short deployment summary on [[mw:Proton/Deployments]] from <code>git log --cherry-pick {from}...{to}</code>. Don't include all commits, but only notable fixes and changes (ignore rt-test fixes, code cleanup updates, test updates, etc). (The above command will do the right thing if {from} was on a branch and had patches cherry-picked from {to}, although if there were conflicts during the cherry-pick to {from} the patch will still appear in the log for {to}.)
* Create a short deployment summary on [[mw:Proton/Deployments]] from <code>git log --cherry-pick {from}...{to}</code>. Don't include all commits, but only notable fixes and changes (ignore rt-test fixes, code cleanup updates, test updates, etc). (The above command will do the right thing if {from} was on a branch and had patches cherry-picked from {to}, although if there were conflicts during the cherry-pick to {from} the patch will still appear in the log for {to}.)
* Prepare a [https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/services/chromium-render/deploy chromium-render/deploy deploy repo] commit and push for +2
* Prepare a [https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/services/chromium-render/deploy chromium-render/deploy deploy repo] commit and push for +2


==== Verify deployment version on beta after the deploy patch is merged ====
==== Verify deployment version on beta after the deploy patch is merged ====
* Deploy code (if not already there) to the beta cluster (same instructions as below but ssh to <code>deployment-chromium01.deployment-prep.eqiad.wmflabs</code>)
* Deploy code (if not already there) to the beta cluster (same instructions as below but ssh to <code>deployment-deploy01.deployment-prep.eqiad.wmflabs</code>)
* You can use the <code>new_pdf=1</code> query parameter with the [https://en.wikipedia.beta.wmflabs.org/api/rest_v1/#/Page%20content/generatePDF RESTBase PDF URL] to make it route the request to Proton.


==== Be around on IRC ====
==== Be around on IRC ====
* Add yourself to the "deployer" field of [[Deployments]] if you're not already there
* Add yourself to the "deployer" field of [[Deployments]] if you're not already there
* Be online in freenode #wikimedia-operations (and stay online through the deployment window)
* Be online in the libera.chat IRC channel {{irc|wikimedia-operations}} (and stay online through the deployment window)


=== Deploying the latest version of Proton ===
=== Deploying the latest version of Proton ===


Now to do the deploy:<source lang="bash">
Now to do the deploy:<syntaxhighlight lang="bash">
ssh deployment.eqiad.wmnet
ssh deployment.eqiad.wmnet
tin$ cd /srv/deployment/proton/deploy
cd /srv/deployment/proton/deploy
tin$ git pull
git pull
tin$ git submodule update --init
git submodule update --init
tin$ scap deploy 'Updating Proton to <new hash> (T<bug number>, T<bug number>)'
scap deploy "`git log --pretty=format:'%s' -n 1` (T<bug number>, T<bug number>)"
</source>
</syntaxhighlight>


Scap will log the completion of the deploy, but if you want to add additional information to the SAL, make a comment in #wikimedia-operations with something like
Scap will log the completion of the deploy, but if you want to add additional information to the SAL, make a comment in #wikimedia-operations with something like


=== Post-deploy checks ===
=== Post-deploy checks ===
* Verify the [https://grafana.wikimedia.org/dashboard/db/proton Grafana dashboard for PDF metrics] that service handles similar number of requests
* Verify the [https://grafana.wikimedia.org/dashboard/db/proton Grafana dashboard for PDF metrics] that service handles similar number of requests. Some of the machine stats such as memory consumption might also be worth looking at ([https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&from=now-6h&to=now&var-server=proton1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=proton proton1001], [https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&from=now-6h&to=now&var-server=proton1002&var-datasource=eqiad%20prometheus%2Fops&var-cluster=proton proton 1002], [https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&from=now-6h&to=now&var-server=proton2001&var-datasource=codfw%20prometheus%2Fops&var-cluster=proton proton2001], [https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&from=now-6h&to=now&var-server=proton2002&var-datasource=codfw%20prometheus%2Fops&var-cluster=proton proton2002]).
* Verify that logstash has no new errors
* Verify that logstash has no new errors ([https://logstash.wikimedia.org/app/kibana#/dashboard/e1bb3340-f997-11e8-b3c1-4ff0065d7257 proton logs], [https://logstash.wikimedia.org/goto/e80c4c05d351bbcbff679fd936fc8e9b RESTBase /page/pdf endpoint logs])
* Use the Proton testing tool and verify that generated PDF is still correct
* Use the Proton testing tool and verify that generated PDF is still correct


Line 47: Line 51:
In case you need to restart Proton without any deployments (for example, to reload mediawiki configs from config or other deployments),
In case you need to restart Proton without any deployments (for example, to reload mediawiki configs from config or other deployments),


* Restart proton hosts, from deployment.eqiad.wmnet (production) or deployment-chromium01.deployment-prep.eqiad.wmflabs (beta)
* Restart proton hosts, from <code>deployment.eqiad.wmnet</code> (production) or <code>deployment-deploy01.deployment-prep.eqiad.wmflabs</code> (beta)
*:<code>cd /srv/deployment/proton/deploy && scap deploy --service-restart</code>
*:<code>cd /srv/deployment/proton/deploy && scap deploy --service-restart</code>


Line 58: Line 62:
scap deploy --rev <sha></pre>
scap deploy --rev <sha></pre>


== Target machines ==
If you need to check something on the target machines:
* Beta Cluster:
** <code>deployment-chromium01.deployment-prep.eqiad.wmflabs</code>
* Production:
** <code>proton1001.eqiad.wmnet</code>
** <code>proton1002.eqiad.wmnet</code>
** <code>proton2001.codfw.wmnet</code>
** <code>proton2002.codfw.wmnet</code>
Proton uses port 24766 in Beta and production.
== Updating Puppeteer & Chromium ==
We use a very small set of Puppeteer features and usually, it is pretty safe to update both the Puppeteer library and the Chromium browser.
Before you start updating Pupeteer and Chromium, please keep in mind that versions of Puppeteer and Chromium are tighly coupled:
<blockquote>Puppeteer acts as an indivisible entity with Chromium. Each version of Puppeteer bundles a specific version of Chromium – '''the only version''' it is guaranteed to work with. This is not an artificial constraint: A lot of work on Puppeteer is actually taking place in the Chromium repository.
</blockquote>
For more information please refer to [https://github.com/GoogleChrome/puppeteer#q-why-doesnt-puppeteer-vxxx-work-with-chromium-vyyy Why doesn’t Puppeteer v.XXX work with Chromium v.YYY?]
=== Puppeteer updates ===
We pin puppeteer to a specific version in the [https://github.com/wikimedia/mediawiki-services-chromium-render/blob/master/package.json#L42 package.json] file. The latest Puppeteer version can be found on [https://github.com/GoogleChrome/puppeteer/releases Puppeteer releases] page. The update process is very simple and it narrows down to bumping the puppeteer version in the package.json file, running <code>npm install</code> to fetch new version and testing that service renders PDF correctly.
Puppeteer is usually shipped with not-yet-stable version for Chrome, there is no need to update the Puppeteer with every release. Because Puppeteer is coupled with specific Chromium version - the Puppeteer updates should be performed only when the new version provides useful features/fixed issues related to HTML/PDF rendering.
=== Chromium updates ===
We decided to use Chromium bundled with the operating system as this approach sounded like a most reasonable solution. The Chromium packages in Debian (OS we're using) are verified by Debian maintainers and are guaranteed to work and not have any destructive behavior.
We analyzed other ways to ship Chromium, but they were rejected:
* using Chromium version bundled with Puppeteer - This was rejected due to fact that Puppeteer downloads the chromium from some servers and we do not have control over it. There was no safe way to verify that downloaded version is safe to use in WMF environment.
* store Chromium executable in the Proton repository - This was rejected due to the size of chromium executable. It's over 100MB. The <code>chromium-render</code> repository would grow too fast and it would become pretty difficult to maintain in near future.
* installing chromium manually (or via some script) - This was rejected due to higher maintenance cost. The Chromium version shipped with Debian is proven to work properly with the Puppeteer version we're currently using.
When you decide to update or Puppeteer, or Chromium browser you should pick the version of Puppeteer that uses the Chromium version (or vice versa) close enough to the one bundled with given Puppeteer version. We cannot update Chromium that often as Debian release cycle is bit slow and the bundled Chromium version is not the latest stable.
== Puppeteer configuration ==
Wikimedia environment is very specific and it requires special puppet configuration. We need to pass additional config options that is very difficult explain why, as those can look like security loopholes:
* <code>--ignoreHTTPSErrors</code> flag was [https://github.com/wikimedia/mediawiki-services-chromium-render-deploy/commit/17fc7bb4cda2e1b408d6289c27685624d35e8714 introduced] because we use a self-signed certificate for our internal wiki domains (since the CA is our Puppet), and using internal domains is the standard way of accessing MediaWiki appservers from REST services. Given that Proton cannot communicate with the outside world, and even if it receives malicious HTML, it should be able to handle it safely, it is safe to use the `ignoreHTTPSErrors` configuration flag. This config is set only on production environment (in deploy repo). The chromium-render repository doesn't have that option set.
* <code>--no-sandbox</code> and <code>--disable-setuid-sandbox</code> flags are required to properly execute Chromium inside docker environment. Chromium sandboxing requires kernel user namespaces set up properly. You can find more information about the issue on [https://github.com/jessfraz/dockerfiles/issues/149 Chrome won't work without --no-sandbox option issue]. Chrome process is firejailed which means is already sandboxed by us and there is no need to use built-in chrome sandboxing.
* <code>--font-rendering-hinting=medium</code>, <code>--enable-font-antialiasing</code>, <code>--disable-gpu</code> flags are used to tune up the fonts rendering. We want consistent fonts rendering across all production/staging/beta and development platforms.
* <code>--hide-scrollbars</code> and <code>--no-first-run</code> flags are used to improve rendering PDF page. Most probably they are not required, but it is safer to keep then on


== Data flow ==
== Data flow ==
[[File:Proton flow.png]]
[[File:Proton flow.png]]
[[Category:Services]]
[[Category:Services]]

Latest revision as of 19:01, 4 September 2021

Proton is a service that converts Wikipedia articles into PDF. It uses Puppeteer to fetch the Wikipedia page, render it in headless Chromium, and then calls Puppeteer's page.pdf() method to return a PDF version of the article.

Source Code

Monitoring

Deploying changes

Note: this section is outdated, as Proton is running on Kubernetes now. Under the scap3 process described here, you run scap deploy, which pushes the new state to all backends and restarts them. You should have deploy access and be a member of the proton-admins Puppet group.

Pre-deploy checks

Prepare the deploy patch

Proton is based on the Services template. For more detailed information, please refer to Services/Deployment.

  • Create a short deployment summary on mw:Proton/Deployments from git log --cherry-pick {from}...{to}. Don't include all commits, but only notable fixes and changes (ignore rt-test fixes, code cleanup updates, test updates, etc). (The above command will do the right thing if {from} was on a branch and had patches cherry-picked from {to}, although if there were conflicts during the cherry-pick to {from} the patch will still appear in the log for {to}.)
  • Prepare a commit in the chromium-render/deploy repo and push it for +2
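The summary step above can be sketched as a small shell helper. This is only an illustration, not part of the Proton tooling: the function name summary_log and the --no-merges and --pretty options are additions made here, while --cherry-pick and the {from}...{to} range come from the instructions above.

```shell
# Hypothetical helper: list commits that differ between two refs, skipping
# patches that were already cherry-picked onto the other side.
# Usage: summary_log <from> <to>
summary_log() {
  # --cherry-pick omits commits whose change also exists on the other side
  # of the symmetric difference from...to; --no-merges hides merge commits.
  git log --cherry-pick --no-merges --pretty=format:'%h %s' "$1...$2"
}
```

Because the range is a symmetric difference, commits that exist only on the {from} side are listed as well. Adding git's --right-only option would restrict the output to the {to} side, if that is what the summary needs.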

Verify deployment version on beta after the deploy patch is merged

  • Deploy code (if not already there) to the beta cluster (same instructions as below but ssh to deployment-deploy01.deployment-prep.eqiad.wmflabs)
  • You can use the new_pdf=1 query parameter with the RESTBase PDF URL to make it route the request to Proton.

Be around on IRC

  • Add yourself to the "deployer" field of Deployments if you're not already there
  • Be online in the libera.chat IRC channel #wikimedia-operations (and stay online through the deployment window)

Deploying the latest version of Proton

Now to do the deploy:

ssh deployment.eqiad.wmnet
cd /srv/deployment/proton/deploy
git pull
git submodule update --init
scap deploy "`git log --pretty=format:'%s' -n 1` (T<bug number>, T<bug number>)"
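The last command above builds the deploy message from the subject of the most recent commit plus a list of task IDs. A minimal sketch of that composition follows, using a hypothetical helper named deploy_message; the commit subject and task IDs in the example are made up.

```shell
# Hypothetical helper: build a scap deploy message from a commit subject
# followed by any number of task IDs.
deploy_message() {
  subject="$1"
  shift
  tasks=""
  for task in "$@"; do
    # Join task IDs with ", " (no leading separator before the first one).
    tasks="${tasks:+$tasks, }$task"
  done
  printf '%s (%s)\n' "$subject" "$tasks"
}

# Example with made-up values:
deploy_message 'Update chromium-render to abc1234' T100001 T100002
# prints: Update chromium-render to abc1234 (T100001, T100002)
```

Inside the deploy repo, the subject could be taken from git as in the original command: scap deploy "$(deploy_message "$(git log --pretty=format:'%s' -n 1)" T100001)". The $(...) form is equivalent to the backticks used above and nests more readably.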

Scap will log the completion of the deploy, but if you want to add additional information to the SAL, make a comment in #wikimedia-operations with something like

Post-deploy checks

Restarting

In case you need to restart Proton without any deployments (for example, to reload MediaWiki configs from config or other deployments):

  • Restart the Proton hosts from deployment.eqiad.wmnet (production) or deployment-deploy01.deployment-prep.eqiad.wmflabs (beta)
    cd /srv/deployment/proton/deploy && scap deploy --service-restart

When something goes wrong

Reverting a Proton deployment

Code

ssh deployment.eqiad.wmnet
cd /srv/deployment/proton/deploy
scap deploy --rev <sha>

Target machines

If you need to check something on the target machines:

  • Beta Cluster:
    • deployment-chromium01.deployment-prep.eqiad.wmflabs
  • Production:
    • proton1001.eqiad.wmnet
    • proton1002.eqiad.wmnet
    • proton2001.codfw.wmnet
    • proton2002.codfw.wmnet

Proton uses port 24766 in Beta and production.
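As a rough sketch of how the host names and port above combine into a URL, here is a hypothetical helper named proton_url. Two things are assumptions rather than facts from this page: that the port speaks plain HTTP, and that the service exposes the /_info endpoint that the Node.js service template provides.

```shell
# Hypothetical helper: print a URL for a Proton host.
# 24766 is the port stated above; plain HTTP is an assumption.
# Usage: proton_url <host> [path]
proton_url() {
  printf 'http://%s:24766%s\n' "$1" "${2:-/}"
}

# Assumed endpoint: /_info, as exposed by the Node.js service template.
proton_url proton1001.eqiad.wmnet /_info
# From a machine that can reach the host, one could then try:
#   curl -s "$(proton_url proton1001.eqiad.wmnet /_info)"
```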

Updating Puppeteer & Chromium

We use a very small set of Puppeteer features, so it is usually safe to update both the Puppeteer library and the Chromium browser. Before you start updating Puppeteer and Chromium, please keep in mind that the versions of Puppeteer and Chromium are tightly coupled:

Puppeteer acts as an indivisible entity with Chromium. Each version of Puppeteer bundles a specific version of Chromium – the only version it is guaranteed to work with. This is not an artificial constraint: A lot of work on Puppeteer is actually taking place in the Chromium repository.

For more information please refer to Why doesn’t Puppeteer v.XXX work with Chromium v.YYY?

Puppeteer updates

We pin Puppeteer to a specific version in the package.json file. The latest Puppeteer version can be found on the Puppeteer releases page. The update process is simple: bump the puppeteer version in the package.json file, run npm install to fetch the new version, and test that the service renders PDFs correctly.

Puppeteer usually ships with a not-yet-stable version of Chrome, so there is no need to update Puppeteer with every release. Because Puppeteer is coupled with a specific Chromium version, Puppeteer updates should be performed only when the new version provides useful features or fixes issues related to HTML/PDF rendering.
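Since the version is meant to be pinned, a quick check before and after bumping it can confirm that package.json still holds an exact version rather than a range. The following is a minimal sketch with two hypothetical helpers; it uses a naive text match instead of a JSON parser and assumes the dependency key is exactly "puppeteer".

```shell
# Print the puppeteer version declared in a package.json file.
# Naive text match, not a real JSON parser.
puppeteer_version() {
  sed -n 's/.*"puppeteer"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed only for an exact pin such as 1.11.0 (no ^, ~, or ranges).
is_exact_pin() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}
```

For example, run from the repository root: is_exact_pin "$(puppeteer_version package.json)" || echo "puppeteer is not pinned to an exact version".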

Chromium updates

We decided to use the Chromium bundled with the operating system, as this approach seemed the most reasonable solution. The Chromium packages in Debian (the OS we're using) are verified by Debian maintainers and are guaranteed to work and not have any destructive behavior.

We analyzed other ways to ship Chromium, but they were rejected:

  • Using the Chromium version bundled with Puppeteer - This was rejected because Puppeteer downloads Chromium from servers we do not control. There was no safe way to verify that the downloaded version is safe to use in the WMF environment.
  • Storing the Chromium executable in the Proton repository - This was rejected due to the size of the Chromium executable, which is over 100MB. The chromium-render repository would grow too fast and would become difficult to maintain in the near future.
  • Installing Chromium manually (or via some script) - This was rejected due to the higher maintenance cost. The Chromium version shipped with Debian is proven to work properly with the Puppeteer version we're currently using.

When you decide to update either Puppeteer or the Chromium browser, pick a Puppeteer version whose bundled Chromium version is close enough to the Chromium version in use (or vice versa). We cannot update Chromium very often, as the Debian release cycle is a bit slow and the Chromium version it ships is not the latest stable.

Puppeteer configuration

The Wikimedia environment is very specific and requires special Puppeteer configuration. We need to pass additional config options whose rationale needs explaining, as they can look like security loopholes:

  • The --ignoreHTTPSErrors flag was introduced because we use a self-signed certificate for our internal wiki domains (since the CA is our Puppet), and using internal domains is the standard way of accessing MediaWiki appservers from REST services. Proton cannot communicate with the outside world, and even if it receives malicious HTML it should be able to handle it safely, so it is safe to use the `ignoreHTTPSErrors` configuration flag. This config is set only in the production environment (in the deploy repo). The chromium-render repository doesn't have that option set.
  • The --no-sandbox and --disable-setuid-sandbox flags are required to properly execute Chromium inside a Docker environment. Chromium sandboxing requires kernel user namespaces to be set up properly. You can find more information about the issue in "Chrome won't work without --no-sandbox option issue". The Chrome process is firejailed, which means it is already sandboxed by us, so there is no need to use the built-in Chrome sandboxing.
  • The --font-rendering-hinting=medium, --enable-font-antialiasing, and --disable-gpu flags are used to tune font rendering. We want consistent font rendering across all production/staging/beta and development platforms.
  • The --hide-scrollbars and --no-first-run flags are used to improve PDF page rendering. Most probably they are not required, but it is safer to keep them on.

Data flow

Proton flow.png