Thumbor: Difference between revisions

imported>Filippo Giunchedi
imported>Gilles

Revision as of 16:50, 28 September 2017

As of June 2017, the Wikimedia production (and beta) media thumbnailing infrastructure is based on Thumbor. All thumbnail traffic for public wikis is served by it.

Migrating private WMF wikis to Thumbor is the subject of a follow-up project.

Rationale

  • Better support: Thumbor has a lively community of its own and is a healthy open-source project. In contrast, the media-handling code in MediaWiki is supported on a best-effort basis by very few people.
  • Better security isolation: Thumbor is stateless and connects to Swift, Poolcounter and DC-local Thumbor-specific Memcached instances (see "Throttling" below). In contrast, MediaWiki is connected to many more services, as well as user data and sessions. Considering how common security vulnerability discoveries are in media-processing software, it makes sense to isolate media thumbnailing as much as possible.
  • Better separation of concerns: Thumbor only concerns itself with thumbnail generation. This is desirable in a service-oriented architecture.
  • Easier operations: Thumbor is a simple service and should be easy to operate.

Supported file types

We have written Thumbor engines for all the file formats used on Wikimedia wikis (JPG, PNG, GIF, TIFF, XCF, SVG, PDF, DJVU, WEBM, OGV, STL).

These engines reuse the same logic as MediaWiki to render those images, often leveraging the same underlying open-source libraries or executables. Whenever possible, reference images generated with MediaWiki are used for the Thumbor integration tests.

Broader ecosystem

To understand Thumbor's role in our software stack, one has to understand how Wikimedia production currently serves thumbnails.

The edge, where user requests first land, is Varnish. Most requests for a thumbnail are a hit on the Varnish frontend or backend caches.

When Varnish can't find a copy of the requested thumbnail, whether because it has never been requested before or because it fell out of the Varnish cache, Varnish hits the Swift proxies. We run a custom plugin on our Swift proxies, which is responsible for parsing the thumbnail URL and determining whether a copy of that thumbnail is already stored in Swift. If so, it serves that copy; otherwise, it asks Thumbor to generate it.

When Thumbor receives a request, it tries to fetch the original media from Swift. If it can't, it responds with a 404. Otherwise, it generates the requested thumbnail for that media. Once it's done, it serves the resulting image, which the Swift proxy forwards to Varnish, which in turn serves it to the client. Varnish saves a copy in its own cache, and Thumbor saves a copy in Swift.
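
The cache-miss path described above can be sketched roughly as follows. This is a minimal illustration, not the actual Swift proxy plugin or Thumbor code: the function name, the key strings and the use of a plain dict in place of Swift are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple


def handle_cache_miss(
    thumb_key: str,
    original_key: str,
    swift: Dict[str, bytes],
    render: Callable[[bytes], bytes],
) -> Tuple[int, Optional[bytes]]:
    """Serve a thumbnail that the edge cache could not provide.

    `swift` is a plain dict standing in for the object store (assumption).
    Returns an (HTTP status, body) pair.
    """
    # Swift proxy step: serve the stored thumbnail if one exists.
    if thumb_key in swift:
        return 200, swift[thumb_key]

    # Thumbor step: fetch the original, or answer 404 if it is missing.
    original = swift.get(original_key)
    if original is None:
        return 404, None

    # Render, store a copy for future cache misses, then serve.
    thumbnail = render(original)
    swift[thumb_key] = thumbnail
    return 200, thumbnail
```

In this sketch a second request for the same thumbnail never reaches the renderer, which mirrors the role the stored copy in Swift plays once a thumbnail falls out of the Varnish cache.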

Thumbnail quality

Thumbor comes with a few settings defining the quality of thumbnails. We have a few Wikimedia-specific ones on top of that, which mimic settings found in MediaWiki.

Chroma subsampling

JPG thumbnails generated by Thumbor use a specific chroma subsampling value defined in the CHROMA_SUBSAMPLING Thumbor config variable, found in Puppet.

qlow

Wikimedia thumbnails can be requested with a special parameter that deliberately lowers the compression quality (typically to serve clients with low bandwidth). The compression quality used for those thumbnails is defined in the QUALITY_LOW Thumbor config variable, found in Puppet.

Conditional sharpening

Historically, since Wikimedia wikis have been consistent in making sure that JPGs are photographs and diagrams are uploaded as other file types, we have been able to visually optimize JPGs for photographs. This takes the form of conditional sharpening logic, supported by a custom Thumbor plugin. The plugin can be applied to any file type (it just passes the information to the engine, which has to apply it), and we apply it to JPG originals by default via the DEFAULT_FILTERS_JPEG Thumbor config variable, found in Puppet. That variable defines the sharpening value to be applied, as well as the resize ratio that acts as the threshold deciding whether sharpening is applied.

This technique allows resized JPGs to be more visually pleasing, with the edge details being more pronounced when JPGs are drastically resized.
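
The threshold decision can be illustrated with a small sketch. It assumes the resize ratio is the thumbnail width divided by the original width and that sharpening applies when that ratio falls below the threshold; the real plugin may compute the ratio differently, and the function name is hypothetical.

```python
def should_sharpen(
    original_width: int, thumbnail_width: int, ratio_threshold: float
) -> bool:
    """Return True when the downscale is drastic enough to warrant sharpening.

    Assumption: resize ratio = thumbnail width / original width, and
    sharpening is applied only when that ratio is below the threshold.
    """
    if original_width <= 0:
        raise ValueError("original_width must be positive")
    resize_ratio = thumbnail_width / original_width
    return resize_ratio < ratio_threshold
```

With an illustrative threshold of 0.85, a 4000px original scaled to 400px would be sharpened, while a 1000px original scaled to 900px would not.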

EXIF processing

In order to make JPG thumbnails lighter, we reduce the size of the EXIF payload included in thumbnail images.

EXIF field filtering

We strip EXIF data, but in order to preserve attribution information we keep a few fields in thumbnails. The list of fields to keep is defined by the EXIF_FIELDS_TO_KEEP Thumbor config variable, found in Puppet.
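
As a rough sketch of this kind of allow-list filtering, assuming the EXIF data has already been parsed into a name-to-value mapping (the function name and the field names used below are illustrative, not the production value of EXIF_FIELDS_TO_KEEP):

```python
from typing import Dict, Iterable


def filter_exif(exif: Dict[str, str], fields_to_keep: Iterable[str]) -> Dict[str, str]:
    """Return only the EXIF fields whose names appear in the allow-list."""
    keep = set(fields_to_keep)
    return {name: value for name, value in exif.items() if name in keep}
```

For example, filtering a mapping that contains Artist, Copyright and GPSLatitude against an allow-list of Artist and Copyright would drop the GPS field and keep the two attribution fields.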

ICC profile substitution

We replace sRGB ICC profiles with Facebook's TinyRGB profile, which achieves the same visual results with a much smaller payload. This mechanism is governed by the EXIF_TINYRGB_PATH and EXIF_TINYRGB_ICC_REPLACE Thumbor config variables, found in Puppet.

Throttling

In order to prevent abuse and to distribute server resources more fairly, Thumbor has a few throttling mechanisms in place. These happen as early as possible in the request handling, in order to avoid unnecessary work.

Memcached-based

Failure throttling requires a memory of past events. For this we use Memcached. In order to share the throttling information across Thumbor instances, we use a local nutcracker instance running on each Thumbor server, pointing to all the Thumbor servers in a given datacenter. This is configured in Puppet, with the list of servers in hiera under the thumbor_memcached_servers and thumbor_memcached_servers_nutcracker config variables.

In Thumbor's configuration, the memcached settings used for this are defined in FAILURE_THROTTLING_MEMCACHE and FAILURE_THROTTLING_PREFIX, found in Puppet.

Failure

The failure throttling logic itself is governed by the FAILURE_THROTTLING_MAX and FAILURE_THROTTLING_DURATION Thumbor config variables. This throttling limits retries on failing thumbnails. Some originals are broken or can't be rendered by our thumbnailing software, and there would be no point retrying them every time we encounter them. This limit allows us to avoid rendering problematic originals for a while. We don't want to blacklist them permanently, however, as upgrading media-handling software might suddenly make originals that previously couldn't be rendered start working. Because the limit expires, the benefits of upgrades apply naturally to problematic files, without requiring us to clear a permanent blacklist whenever software is upgraded on the Thumbor hosts.
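
The idea of a failure counter that expires can be sketched as below. This is not the actual plugin: an in-memory dict stands in for Memcached, the class and method names are hypothetical, and the choice to start the expiry window at the first failure (rather than refreshing it on every failure) is an assumption.

```python
import time
from typing import Callable, Dict, Tuple


class FailureThrottle:
    """Skip rendering of originals that failed too often recently."""

    def __init__(
        self,
        max_failures: int,
        duration: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.duration = duration
        self.clock = clock
        # key -> (failure count, expiry timestamp); stands in for Memcached.
        self._counters: Dict[str, Tuple[int, float]] = {}

    def _current(self, key: str) -> int:
        """Return the live failure count, dropping the entry if it expired."""
        entry = self._counters.get(key)
        if entry is None:
            return 0
        count, expires_at = entry
        if self.clock() >= expires_at:
            del self._counters[key]
            return 0
        return count

    def is_throttled(self, key: str) -> bool:
        return self._current(key) >= self.max_failures

    def record_failure(self, key: str) -> None:
        count = self._current(key)
        if count == 0:
            # First failure in a new window: start the expiry timer.
            expires_at = self.clock() + self.duration
        else:
            expires_at = self._counters[key][1]
        self._counters[key] = (count + 1, expires_at)
```

Once the window has elapsed the counter disappears on its own, so an original that becomes renderable after a software upgrade is simply retried without anyone having to clear state.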

Poolcounter-based

For other forms of throttling, we use Poolcounter, both to combat malicious or unintentional DDoS and to regulate resource consumption. The Poolcounter server configuration shared by the different throttling types is defined in the POOLCOUNTER_SERVER, POOLCOUNTER_PORT and POOLCOUNTER_RELEASE_TIMEOUT Thumbor config variables, found in Puppet.

Per-IP

We limit the number of concurrent thumbnail generation requests per client IP address. The configuration for that throttle is governed by the POOLCOUNTER_CONFIG_PER_IP Thumbor config variable, found in Puppet.

Per-original

We limit the number of concurrent thumbnail generation requests per original media. The configuration for that throttle is governed by the POOLCOUNTER_CONFIG_PER_ORIGINAL Thumbor config variable, found in Puppet.
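
Both the per-IP and per-original throttles boil down to capping concurrent work per key. The sketch below shows that idea in-process; it does not speak the Poolcounter protocol, it has no queueing or timeouts, and the class name and the "ip:" key prefix are assumptions made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict


class ConcurrencyLimiter:
    """Cap the number of in-flight requests sharing the same key."""

    def __init__(self, max_workers: int) -> None:
        self.max_workers = max_workers
        self._active: DefaultDict[str, int] = defaultdict(int)

    def acquire(self, key: str) -> bool:
        """Take a slot for `key`; return False if the cap is reached."""
        if self._active[key] >= self.max_workers:
            return False
        self._active[key] += 1
        return True

    def release(self, key: str) -> None:
        """Give a slot back and forget keys with no active requests."""
        if self._active[key] > 0:
            self._active[key] -= 1
        if self._active[key] == 0:
            del self._active[key]
```

A request handler would call acquire("ip:192.0.2.1") before rendering, reject the request when it returns False, and call release with the same key once rendering finishes. A per-original throttle would use the original's path as the key instead.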

Expensive

Some types of thumbnails are disproportionately expensive to render (mostly in terms of CPU time). Those expensive types are subject to an extra throttle, defined by the POOLCOUNTER_CONFIG_EXPENSIVE Thumbor config variable, found in Puppet.

Not per-user

Unlike MediaWiki, Thumbor doesn't implement a per-user Poolcounter throttle. First, Thumbor has greater isolation (on purpose) and doesn't have access to any user data, including sessions. Second, the per-IP throttle should cover the same ground, as logged-in users should have little IP address variance during a session.

Logging

Thumbor logs go to /srv/log/thumbor on the Thumbor servers. All the Thumbor instances on a given server write to the same files. Logs are rotated daily. The logging configuration is defined in Puppet, under the THUMBOR_LOG_CONFIG Thumbor config variable.

Thumbor logs also go to Logstash/Kibana.

Configuration

Thumbor consumes its configuration from the /etc/thumbor.d/ folder. The .conf files found in that folder are parsed in alphabetical order by Thumbor. The thumbor Debian package as well as our custom python-thumbor-wikimedia Debian package contain default configuration files, on top of which we add some defined in Puppet.

The rule of thumb here is that configuration that might depend on the instance or datacenter at hand should be defined in Puppet, while configuration that won't vary per machine can be defined in the python-thumbor-wikimedia Debian package.
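
To illustrate why the parsing order matters, here is a minimal sketch of layered configuration loading. It assumes the .conf files use Python assignment syntax and that a value set in a later file overrides the same name from an earlier one; the function name, and the file names mentioned afterwards, are hypothetical, and this is not Thumbor's actual loader.

```python
from pathlib import Path
from typing import Any, Dict


def load_config(directory: str) -> Dict[str, Any]:
    """Merge all *.conf files in `directory`, in alphabetical order."""
    settings: Dict[str, Any] = {}
    for path in sorted(Path(directory).glob("*.conf"), key=lambda p: p.name):
        namespace: Dict[str, Any] = {}
        # Each file is evaluated as Python; only UPPER_CASE names are kept.
        exec(compile(path.read_text(), str(path), "exec"), namespace)
        settings.update({k: v for k, v in namespace.items() if k.isupper()})
    return settings
```

Under these assumptions, a package-provided 10-defaults.conf could set QUALITY_LOW and a Puppet-managed 20-local.conf could override it simply by sorting later.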

Writing Thumbor plugins

Deploying changes

Operations

Dashboards

Grafana thumbor dashboard

Thumbor Prometheus Eqiad breakdown

Restarting

 cumin -b1 -s10 'thumbor1*' 'systemctl restart thumbor-instances'
 cumin -b1 -s10 'thumbor2*' 'systemctl restart thumbor-instances'

Debian package

Thumbor is deployed via Debian packages. Specifically, python-thumbor-wikimedia contains WMF extensions to process additional file types, talk to Swift and so on. The repository with the debian/ directory lives at operations/debs/python-thumbor-wikimedia, while the "upstream" repository is at https://phabricator.wikimedia.org/diffusion/THMBREXT/.

Assuming debian/changelog has been updated, a new package can be built by first tagging the relevant commit as upstream/VERSION, then running git-buildpackage -us -uc -S to create a .orig.tar.gz and dpkg-source -b . to create the source package. Once the source package is available, it can be built with BACKPORTS=yes DIST=jessie-wikimedia sudo -E cowbuilder --debbuildopts -sa --build DSC_FILE and uploaded to apt.wikimedia.org.

Using manhole

Thumbor runs with Python manhole for debugging and inspection purposes. See also T146143: Figure out a way to live-debug running production thumbor processes.

To invoke manhole, e.g. on the Thumbor instance listening on port 8827:

 sudo -u thumbor socat - unix-connect:/srv/thumbor/tmp/thumbor@8827/manhole-8827