This page covers the project to implement an OpenStreetMap TileServer hosted in WMF's production infrastructure, serving the same basic data as OSM itself (synced from OSM). It is orthogonal to (or, in some sense, a prerequisite for) other OSM efforts going on in Labs for related overlays and databases. Corrections to the information below are welcome; this is all to the best of my knowledge at this time.
2013-10-07 - Still working on deployment- and puppetization-related things
There seem to be multiple drivers for this project. The main one from WMF Engineering's perspective is that we want a reliable and robust tileserver to serve tiles for use in our mobile apps. Also, once this tileserver is available, it can be used by the Toolforge community as well.
The obvious alternative is using OSM directly from our pages/mobile. The principal rationales I've heard for this solution over sending the traffic straight to OSM seem to be:
- HTTPS Support required
- Load - our intended use (including mobile) of the map tiles amounts to excessive load for OSM to handle themselves
The primary goal for this project is to implement a Wikipedia-scale OSM tileserver. The data will be the normal public OSM dataset available at e.g. http://planet.openstreetmap.org/ . We'll additionally support HTTPS, adhere to our privacy policies, and scale the solution out to meet our demands. There will be a single set of standard tiles in a standard image format. Once we have that running in production, we can start gathering more data on scaling, user activity, community needs, etc., and develop further plans, some of which may entail significant upgrades to the architecture as warranted by need and allowed by the costs involved.
Specifics, where known or relevant:
- Capacity to handle requests from Wikipedia (OSM-Gadget, WikiMiniAtlas), Wikidata, Commons, Wikimaps, Wikivoyage and Mobile
- Expected load: growth from 200 tiles/sec (what we have now on Toolserver) to 30,000 tiles/sec
- hstore data for tagging? (not sure, this was left out of earlier scope discussion)
Timeline / Plan
Basic investigative work is mostly complete. The steps forward from here (with rough timeline guesses):
- Settle on a software stack and scaling arrangement that makes sense and has been basically tested to be functional (end of August, very early Sept?)
- Given current requirements, I expect this to avoid shared infrastructure (e.g. Ceph). We'll simply replicate the osm2pgsql-level updates to separate read-only PostGIS copies on each tileserver, and have them each run invalidation and disk-caching separately. Varnish will balance queries using a consistent-hash map for the tile URLs that takes into account how meta-tiles are structured to mostly avoid cache redundancy. Some of the lowest zoom levels that we pre-render might get pre-rendered everywhere so that those can be distributed evenly to all available tileservers. Still lots of details to sort out here...
- Puppetize the configuration and begin testing the deployment within the production infrastructure (roughly mid-late Sept, I imagine some of this will happen while we're all in SFO)
- Set up any special access and/or DB replication for labs
- Start using it in production
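The metatile-aware consistent hashing described in the plan above can be sketched as follows. This is a minimal illustration in Python of the routing logic only, not Varnish configuration; the backend names, the SHA-1 hash, and the number of ring replicas are assumptions chosen for the example. The one detail taken from the discussion on this page is that tiles are rendered in 8x8 metatiles, so all 64 tiles of a metatile should land on the same backend.

```python
import hashlib
from bisect import bisect_right

METATILE = 8  # tiles per metatile edge (8x8 blocks)


def metatile_key(z, x, y):
    """Collapse a tile address to the address of its enclosing metatile."""
    return f"{z}/{x // METATILE}/{y // METATILE}"


def _hash(value):
    """Stable 64-bit hash of a string (SHA-1 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes for smoother distribution."""

    def __init__(self, backends, replicas=64):
        # Each backend is placed on the ring at `replicas` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in backends
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def backend_for(self, z, x, y):
        """Pick the backend owning the metatile that contains tile z/x/y."""
        key = _hash(metatile_key(z, x, y))
        idx = bisect_right(self._points, key) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical backend names, for illustration only.
ring = HashRing(["tile1", "tile2", "tile3"])
print(ring.backend_for(12, 2048, 1360))
print(ring.backend_for(12, 2055, 1367))  # same metatile, so same backend
```

Because the ring is keyed on the metatile rather than the individual tile, each backend's disk cache only needs to hold the metatiles it owns, and removing a backend only remaps the metatiles that backend was serving.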
The software stack looks something like:
Varnish -> Apache -> mod_tile -> (renderd || Tirex) (-> FS-cache || -> Mapnik -> PostGIS)
- If you use renderd rather than Tirex, it has better support for scaling out already built in, if I am not mistaken (about Tirex). E.g. renderd has support for using ceph/rados as a tile storage and support to load balance rendering requests properly over multiple servers. apmon (talk) 18:49, 8 August 2013 (UTC)
- Updated that above (Tirex choice was arbitrary at the time). Been discussing with Faidon a bit about the FS cache, and I'm not (yet) convinced we need shared storage for that. I think we could do a consistent-hash map at the Varnish layer that splits the load geographically and do local FS caches per backend machine that have some locality-of-reference (only end up rendering a certain geographic subset at the deep zoom levels, mostly).
- A shared FS cache is not necessary, and leaving it out likely simplifies things. Indeed, osm.org has just gone that route by operating two entirely independent rendering servers. The main advantage of a shared FS cache is probably load balancing of the rendering requests: you can have a single "master queue" which then optimally distributes rendering requests among the available rendering servers. That way you don't need to worry about what load patterns you have and whether your hash function distributes them properly, which may even change throughout the day. However, if you have a fine enough interleaving, this might not be a big issue. While designing the hash function, do remember that things get rendered in "metatiles", i.e. 8x8 tile blocks. The second advantage is that you can independently scale IO load for tile serving and CPU load for rendering. But if your Varnish caching storage in front of the tileservers is big enough (i.e. has a very high hit rate), that again might be less of an issue. If you don't go the shared cache route, then Tirex might not be a bad choice. Through its plug-in system, it has greater flexibility. With it, it is easier to do things like geojson/vector tiles ( like http://a.www.toolserver.org/tiles/vtile/0/0/0.js ) which can be useful for client side rendering ( e.g. http://kothic.org/js/ ) or clickable features. apmon (talk) 02:22, 15 August 2013 (UTC)
- https://gerrit.wikimedia.org/r/#/c/36222/ (Current set of puppet scripts for single server setup. I have some patches somewhere to extend them to multi-server setups as well)
- https://github.com/apmon/OSM-rendering-stack-deplou/tree/master/wikipedia/debian ( Debian packages / packaging scripts for the necessary OSM Tileserver components. Needs updates and some cleaning up )
- Sumana's post to maps-l explaining where we are.
Community requirements and general wishlist items that won't be in the initial deployment, but are a consideration for the future:
- Support of different styles that are popular on Toolserver
- OSM-default, OSM without labels, hikebikemap
- Support for topo-data (hillshading),
- satellite imagery (Earth + NASA stuff)
- http://www.opengeoserver.at/ (supported by WMDE)
- Later support for Vector-Tiles for Mapnik and client-side
- Could help reduce load, bandwidth, etc, if sufficient clients support it
- Support of multilingual maps
- Support for OpenHistoricalMap database (for Wikimaps and WIWOSM)
- (? Some support for OSM notes (to report map bugs) and other means of helping users fix things. This is not strictly a tileserver feature but a general interface concern; it is good netizen behaviour, and we can help with that.)
Extended State of Things - 2013-11-27
Larger issues with the scope and constraints of this project
- Hardware is already bought and isn't right:
- The systems are configured in a traditional 3-tier setup - if we're really going to run postgresql queries in prod for on-demand render of high-zoom tiles, it probably makes more sense to run a local copy of the postgres database on each render node rather than splitting off a central db server or two that they query remotely.
- That aside, the storage on the database nodes, after RAID10 + XFS setup, leaves us with ~540GB of usable space for data. A basic import of current planet data consumes about 80% of this. Even if we don't add other features, this is problematic. We'll likely run out of space just processing updates and seeing normal database space growth, and we'll lack the free space to move data around in common administrative operations. Even if the space usage remained static, 80% allocation is probably bad for filesystem performance. We could buy more/bigger disks, or we could look at not mirroring and relying on a PostgreSQL failover adapter for the production read queries when a db node fails due to random disk issues.
- Even leaving those things aside, that brings us around to:
- Really, direct PostgreSQL queries for rendering aren't going to scale well
- It's pointless to deploy something just to say we've done so if it's going to fail to scale up for WP-level user loads when we actually try to use it - especially if fixing the scaling means re-thinking and re-architecting the whole thing anyways.
- The best way to scale would be to render the whole database to vector tiles. The software setup for this is an unknown to me (and I'm guessing to most who aren't directly involved), but the big problem is that it will probably involve a whole lot more disk space and a very different hardware layout than what we're doing for PostgreSQL-based queries. We'd still have a PG database, but its role in the stack would be to process the osmosis updates and serve as a source for a constant batch job that exports vector tiles; the image renderer for user queries would then render from vector tiles to image tiles. With sufficient disk space, we can scale out the vector tile accesses easily, whereas no amount of disk space is going to fix PostGIS queries. I expect that getting this set up will take some serious time and community involvement in hashing out the software stack.
- No matter what avenue we go down here, managing a tile server infrastructure is going to be a large ongoing megaproject.
- This isn't a small, well-understood piece of software we can deploy and forget until something breaks. It's a big complex thing that's going to need a lot of ongoing human involvement and micro-management in production. I don't think there's any way to just automate/puppetize this away into a corner and make it super-easy to manage.
- Arguably it needs a team behind it if you're going to scale it up and do it reliably, and handle all the exception cases. Part of this is that the software stack isn't mature enough to hide its own complexity and make it something that can be installed and run in simple-sysadmin-mode by anyone. It's great software, but it needs handholding. Part of this is also all the unknowns in keeping the pipeline of data updates and style/coast alterations flowing. Another big piece of the puzzle is doing data administration when it comes to data features (e.g. hikebikemap, etc). It's going to get very complicated...
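The storage concern raised above for the database nodes comes down to simple arithmetic, sketched here in Python. The ~540GB and 80% figures are from the text; the doubling for a non-mirrored layout is a rough assumption (RAID10 keeps two copies of everything, so dropping the mirror roughly doubles usable space, ignoring filesystem overhead).

```python
def headroom_gb(usable_gb, used_fraction):
    """Return (used, free) space in GB for a volume at the given fill level."""
    used = usable_gb * used_fraction
    return used, usable_gb - used


def fill_without_mirroring(usable_gb, used_fraction):
    """Fill level if the same data sat on a striped-only array.

    Assumption: dropping the RAID10 mirror doubles usable capacity.
    """
    used, _ = headroom_gb(usable_gb, used_fraction)
    return used / (2 * usable_gb)


used, free = headroom_gb(540, 0.80)
print(f"mirrored: ~{used:.0f} GB used, ~{free:.0f} GB free")
print(f"unmirrored fill level: ~{fill_without_mirroring(540, 0.80):.0%}")
```

With the figures from the text this leaves roughly 108 GB of free space on the mirrored layout, which has to absorb replication updates, table bloat, and any administrative copies.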
Basic technical notes from setting up a simple install
Setup for db/render nodes on WMF Ubuntu Precise (see also MaxSem's earlier single-node puppet commit, https://gerrit.wikimedia.org/r/#/c/36222/ , which is still further along than any of this and apparently didn't work out either):
- Precise has some packages available that work, and others have to come from Kai's PPA ultimately ( https://launchpad.net/~kakrueger/+archive/openstreetmap )
- We can rebuild the binaries for 3 base packages from the PPA to import into our repo on brewster (to avoid direct PPA dependency, control local versions, avoid needing proxy to install, etc):
- Changes: default to false on dl-coastlines in the .template file, remove suggest/recommend for libapache2-mod-tile and osm2pgsql
- osm2pgsql - no changes
- libapache2-mod-tile - remove suggest/recommend for osm2pgsql and postgis-db-setup
- (note, all the changes above can also be done through apt options and dselect, but since we're rebuilding anyways it just makes life easier to change the defaults)
- Render nodes:
- apt installs: libapache2-mod-tile, openstreetmap-mapnik-stylesheet-data, and mapnik-utils (from Ubuntu) + all their dependencies
- use http proxy on brewster (http_proxy="http://brewster.wikimedia.org:8080") for coastline downloading after package install
- Database nodes:
- set up raid via bios as RAID10 - gives us ~560GB usable.
- After basic OS stuff, bulk of space as XFS filesystem on /db
- apt installs: mapnik-utils, osm2pgsql, openstreetmap-postgis-db-setup (which will also pull in postgis, etc)
- relocate postgres data into /db (shut down, mv, symlink of /var/lib/postgresql/9.1/main to /db/main)
- Postgres DB config (current params differing from default):
- shared_buffers = 256MB
- checkpoint_segments = 20
- maintenance_work_mem = 512MB
- autovacuum = off
- kernel shmmax 536870912 (currently /etc/sysctl.d/99-temp-postgres.conf)
- pull down latest planet data from http://planet.openstreetmap.org/planet/ via wget, using env var http_proxy="http://brewster.wikimedia.org:8080"
- import into postgres via a command like:
- su www-data -c 'osm2pgsql --create -s -d gis -C 32768 --cache-strategy=dense --unlogged --number-processes=8 /db/planet-131106.osm.bz2'
- (this took approximately 20 hours last time, but that was with nprocs=4 - nprocs=8 may help - other tuning is possible I'm sure)
- osmosis setup for timely data sync:
- apt-get install osmosis
- mkdir /db/osmosis
- cd /db/osmosis
- osmosis --rrii (initializes)
- create state.txt to match planet import based on stamp + http://osm.personalwerk.de/replicate-sequences/?Y=2013&m=11&d=05&H=19&i=10&s=15&stream=minute#
- osmosis --rri workingDirectory=/db/osmosis/ --sc --wpc user=www-data database=gis
- osmosis is Java-based and needs the proxy set via env var: JAVACMD_OPTIONS="-Dhttp.proxyHost=brewster.wikimedia.org -Dhttp.proxyPort=8080"
- currently, crashes (probably a simple fix? this is where I last left off) with: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [SELECT version FROM schema_info]; nested exception is org.postgresql.util.PSQLException: ERROR: relation "schema_info" does not exist
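One plausible reading of the crash above (an assumption, not confirmed by the source) is a schema mismatch: the osmosis --wpc task (--write-pgsql-change) expects an osmosis-style database containing a schema_info table, whereas osm2pgsql creates its own rendering tables. A commonly used alternative is to have osmosis write the replication diff to a change file and then apply that file with osm2pgsql in append mode. The Python sketch below only builds and prints the two command lines; the change-file path is hypothetical, and the osm2pgsql options would likely need to match those used for the original import.

```python
import shlex


def osmosis_fetch_changes(working_dir, out_file):
    """Command to download pending replication diffs into one change file."""
    return [
        "osmosis",
        "--read-replication-interval", f"workingDirectory={working_dir}",
        "--simplify-change",
        "--write-xml-change", out_file,
    ]


def osm2pgsql_apply_changes(database, change_file):
    """Command to apply a change file to a slim-mode osm2pgsql database."""
    return ["osm2pgsql", "--append", "--slim", "-d", database, change_file]


# Hypothetical change-file location; the working directory and database
# name are the ones used in the notes above.
change_file = "/db/osmosis/changes.osc.gz"
for argv in (
    osmosis_fetch_changes("/db/osmosis/", change_file),
    osm2pgsql_apply_changes("gis", change_file),
):
    print(shlex.join(argv))
```

The commands are printed rather than executed so the sketch can be reviewed before anything touches the database; in practice they would run as the same user that owns the import (www-data in the notes above).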
Ok, Brandon, thanks for the report. I'm going to try and be as polite as I can here, forgive me if I fail. We have a running setup on the toolserver. It is in production on WMF sites. You may not be familiar with the extent of current mapping tool deployment. You may want to use the holidays to catch up on the OSM gadget and the meta:WikiMiniAtlas. I have a hard time believing that you or the WMF cannot do better than the toolserver. That admission would be nothing short of an embarrassment. I as a mapping data user (WMA) frankly feel f%^&&%ed over in the a&&. There is no other way to put it. If the toolserver gets shut down because of labs, the way I see it is that the WMF has a duty to provide adequate replacement. TS is going down in about 7 months. This is not much time, especially in hindsight if we look at what was accomplished on labs for the maps projects in the last 7 months ("nothing" would be hyperbole, but it is surprisingly little). I'm not posting here to blame anyone, I'm posting with a plea for an attitude change! It is very frustrating to read above a statement that essentially says "this won't scale up, let's forget about it". This is not in the spirit of us toolserver hackers. We have spent a considerable amount of our free unpaid time to build tools that we are now about to see going down the toilet. Please shift down a gear and forget about your scalability ideas for a month or two, and get us a system that will at least allow us to move our stuff over before it is dead! F&^%$ high performance for now. We need the bare minimum to migrate and have anything. The alternative is that we will have no mapping tools at all! Is that not clear? There is a saying in German that goes: "perfection" is the enemy of "good enough". That's what I'm seeing here.
Brandon, I know that you jumped on this train after it was already running (actually after it was already promised to have reached its destination) so please don't take this as a completely personal criticism (just take an appropriate amount of responsibility ;-). This is a very frustrating situation for me as a volunteer tool developer. --Dschwen (talk) 17:37, 28 November 2013 (UTC)
Let's try and keep this as much on the technical level as possible. I too think that the worries are a bit overblown. Looking at the various other tile server setups and at how the current "in production system" on the toolserver works, which is currently deployed on a good fraction of the Wikipedia sites, it is going to be much less of an issue than you think. However, we need to find a way forward that can demonstrate that the scaling issues indeed aren't going to be that much of an issue or, if Brandon turns out to be correct, to get some solid empirical evidence of exactly where the scaling bottlenecks are going to be and how they manifest themselves. Without that solid data, it will be hard for me or other developers of the tool stack to improve the software stack where it actually matters. I will try and write a more detailed point-by-point response to the issues you have raised in the next couple of days in order to try and find a way forward. But first of all, happy Thanksgiving to all (in the US). apmon (talk) 18:10, 28 November 2013 (UTC)