This page describes the technical aspects of deploying [[mw:Maps|Maps service]] on Wikimedia Foundation infrastructure.
 
'''The service is being actively redesigned; documentation can be found under [[Maps/v2]].'''


== Intro ==
[[File:Maps-components.png|thumb|Maps service component diagram]]
[[File:Maps-deployment.png|thumb|Maps service deployment diagram]]
[[File:Maps_@_PI@2x.png|thumb|Kartotherian internals focus]]
 
The maps service consists of [https://github.com/kartotherian/kartotherian/blob/master/README.md Kartotherian], a Node.js service that serves map tiles; [https://github.com/kartotherian/tilerator/blob/master/README.md Tilerator], a non-public service that prepares vector tiles (data blobs) from the OSM database and stores them in Cassandra; and TileratorUI, an interface for managing Tilerator jobs. There are 20 servers in the <code>maps</code> group, <code>maps20[01-10].codfw.wmnet</code> and <code>maps10[01-10].eqiad.wmnet</code>, each running Kartotherian (port 6533, NCPU instances), Tilerator (port 6534, NCPU/2 instances), and TileratorUI (port 6535, 1 instance). There are also four Varnish servers per datacenter in the <code>cache_maps</code> group.
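For a quick liveness check of the three services on a single host, each port can be queried directly through an ssh tunnel. The sketch below is only an illustration: the hostname is an example, and the <code>/_info</code> endpoint is assumed from the standard WMF service-runner template, so substitute whatever health endpoint the deployed services actually expose.<syntaxhighlight lang="bash">
# Forward the three service ports from one maps host (hostname is an example)
ssh -L 6533:localhost:6533 -L 6534:localhost:6534 -L 6535:localhost:6535 maps1001.eqiad.wmnet

# In another terminal: query each service. /_info is assumed from the
# service-runner template; adjust if the deployed services differ.
curl -s http://localhost:6533/_info   # Kartotherian
curl -s http://localhost:6534/_info   # Tilerator
curl -s http://localhost:6535/_info   # TileratorUI
</syntaxhighlight>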
 
== The infrastructure ==
* [[Maps/OSM Database|OSM Database]]
* [[Maps/Tile_storage|Tile storage]]
* [[Maps/Kartotherian|Kartotherian]]
* [[Maps/Tilerator|Tilerator]]
* [[Maps/Maintenance|Maintenance]]
 
== Miscellaneous ==
* [[Maps/Dynamic tile sources|Dynamic tile sources]]
* [[Maps/Debugging]]
 
== Development processes ==
* [[Maps/Services_deployment|Kartotherian/Tilerator deployment]]
* [[Maps/Kartotherian packages|Kartotherian/Packages]]
 
== Puppetization and Automation ==
=== Prerequisites ===
* passwords and postgres replication configuration are set in the Ops private repo (<code>root@puppetmaster1001:/srv/private/hieradata/role/(codfw|eqiad)/maps/server.yaml</code>)
* other configuration lives in <code>puppet/hieradata/role/(codfw|common|eqiad)/maps/*.yaml</code>
* <code>cassandra::rack</code> is defined in <code>puppet/hieradata/hosts/maps*.yaml</code> (a hypothetical sketch is shown below)
* the <code>role::maps::master</code> / <code>role::maps::slave</code> roles are associated with the maps nodes (site.pp)
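As an illustration only, the per-host hiera file might look like the following; the key comes from the list above, but the value is made up, and the authoritative settings live in the puppet and private repos.<syntaxhighlight lang="yaml">
# Hypothetical sketch of puppet/hieradata/hosts/maps1001.yaml
# (the rack value is illustrative; see the puppet repo for real assignments)
cassandra::rack: rack1
</syntaxhighlight>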


== Monitoring ==
* [https://grafana.wikimedia.org/dashboard/db/interactive-team-kpi KPI dashboard]
* [http://discovery.wmflabs.org/maps/ Usage dashboard]
* [https://grafana.wikimedia.org/dashboard/db/service-maps-varnish Usage - varnish]
* [https://grafana.wikimedia.org/dashboard/db/service-tilerator Tilerator - Grafana]
* [https://logstash.wikimedia.org/#/dashboard/elasticsearch/tilerator Tilerator - Logstash]
* [https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Maps%2520caches%2520eqiad&tab=m&vn=&hide-hf=false Ganglia - Maps Varnish cluster] (eqiad)
* [https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Maps+Cluster+codfw&tab=m&vn=&hide-hf=false&sh=1 Ganglia - Maps cluster] (codfw)
* [https://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=Maps+Cluster+codfw&h=&tab=m&vn=&hide-hf=false&m=disk_free&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name Free disk space]
== Importing the database ==
'''maps2001 is actually not the best server for this - we should switch it around with maps2002, as it has 12 cores and 96GB RAM.'''
* From http://planet.osm.org/pbf/, find the file with the latest available date, but do NOT use "latest", as that file can change at any moment.
* <tt><nowiki>curl -x webproxy.eqiad.wmnet:8080 -O http://planet.osm.org/pbf/planet-151214.osm.pbf.md5</nowiki></tt>
* <tt><nowiki>curl -x webproxy.eqiad.wmnet:8080 -O http://planet.osm.org/pbf/planet-151214.osm.pbf</nowiki></tt>
* <tt>md5sum -c planet-151214.osm.pbf.md5</tt>
* <tt>PGPASSWORD="$(< ~/osmimporter_pass)" osm2pgsql --create --slim --flat-nodes nodes.bin -C 40000 --number-processes 8 --hstore planet-151214.osm.pbf -H maps-test2001 -U osmimporter -d gis</tt>
=== Notes ===
* Tables are created by osm2pgsql, no need for an initial DDL script.
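Once osm2pgsql finishes, a quick sanity check is to look at the approximate row counts of the imported tables; this sketch assumes the default <code>planet_osm_*</code> table names and the same credentials used for the import.<syntaxhighlight lang="bash">
# Approximate row counts of the osm2pgsql tables (planet_osm_* are the defaults)
PGPASSWORD="$(< ~/osmimporter_pass)" psql -h maps-test2001 -U osmimporter -d gis \
    -c "SELECT relname, n_live_tup FROM pg_stat_user_tables WHERE relname LIKE 'planet_osm_%';"
</syntaxhighlight>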
== [https://github.com/kartotherian/kartotherian/blob/master/README.md Kartotherian] ==
Kartotherian serves map tiles by getting vector data from Cassandra, applying the [https://github.com/kartotherian/osm-bright.tm2/blob/master/README.md style] to it, and returning raster images. It is also capable of serving a "static image" (a map with a given width/height/scaling/zoom) and can serve vector tiles directly for client-side rendering (WebGL maps).
* [https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/kartotherian/templates/config.yaml.erb Service configuration template] (extends [[phab:diffusion/OPUP/browse/production/modules/service/templates/node/config.yaml.erb|base template]])
* [http://git.wikimedia.org/blob/maps%2Fkartotherian.git/HEAD/sources.prod.yaml Sources configuration]
To see the tiles without Varnish cache, connect to Kartotherian using an ssh tunnel, e.g. <code>ssh -L 6533:localhost:6533 maps-test2001.codfw.wmnet</code> and browse to http://localhost:6533
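For example, a single tile can be fetched through the tunnel with curl. The <code>osm-intl</code> source name and the <code>/{source}/{z}/{x}/{y}.png</code> URL pattern below are assumptions based on the public maps service; check the sources configuration for what is actually available.<syntaxhighlight lang="bash">
# With the ssh tunnel above in place, fetch the zoom-0 world tile.
# 'osm-intl' and the URL layout are assumptions; see sources.prod.yaml.
curl -o tile.png http://localhost:6533/osm-intl/0/0/0.png
</syntaxhighlight>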
== [https://github.com/kartotherian/tilerator/blob/master/README.md Tilerator] ==
Tilerator is a backend vector tile pre-generation service: it picks up jobs from a Redis job queue and converts data from a Postgres DB, using [https://github.com/kartotherian/osm-bright.tm2source/blob/master/README.md sql queries], into vector tiles stored in Cassandra. Postgres DBs are set up on each of the maps hosts, one master and 3 slaves. Technically, Tilerator is not even a generator, but rather a "batch copying" service: it takes tiles from one configured source (e.g. a tile generator from SQL) and puts them into another source (e.g. the Cassandra tile store).
* [https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/tilerator/templates/config.yaml.erb Service configuration template] (extends [[phab:diffusion/OPUP/browse/production/modules/service/templates/node/config.yaml.erb|base template]])
* [http://git.wikimedia.org/blob/maps%2Ftilerator.git/HEAD/sources.prod.yaml Sources configuration]
== TileratorUI ==
TileratorUI is used to add jobs to the Tilerator job queue. TileratorUI is actually the same code as Tilerator, but started with a different configuration. Connect to TileratorUI using an ssh tunnel, e.g. <code>ssh -L 6535:localhost:6535 maps-test2001.codfw.wmnet</code>, and navigate to http://localhost:6535. There you can view any style (use "set style" to change it), or schedule a job by setting all relevant fields and Ctrl+Clicking the tile you want to schedule.
See full [https://github.com/kartotherian/tilerator/blob/master/README.md Tilerator documentation] for all commands & parameters.
=== Dynamic Tile Sources ===
==== Cassandra ====
To create a new Cassandra data source, POST something like this to <code>/sources</code> as a text body. The default table name is <code>tiles</code>. If the table or keyspace does not exist yet, you must pass the <code>createIfMissing</code> parameter.<syntaxhighlight lang="yaml">
v2a:
  uri: cassandra://
  params:
    keyspace: v2
    table: tiles2a
    cp: [maps-test2001.codfw.wmnet, maps-test2002.codfw.wmnet, maps-test2003.codfw.wmnet, maps-test2004.codfw.wmnet]
    username: {var: cassandra-user}
    password: {var: cassandra-pswd}
#    repfactor: 4
#    durablewrite: 0
#    createIfMissing: true
</syntaxhighlight>
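Assuming the definition above is saved to a file and an ssh tunnel to TileratorUI (port 6535, see above) is open, it could be submitted roughly like this; the file name and content type are illustrative, not documented requirements.<syntaxhighlight lang="bash">
# POST the YAML source definition as a text body to /sources on TileratorUI
# (file name and content type are illustrative)
curl -X POST --data-binary @newsource.yaml \
     -H 'Content-Type: text/plain' \
     http://localhost:6535/sources
</syntaxhighlight>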
==== Dynamic Layer Generator ====
To generate just a few layers from the database, create a layer filter and a layer mixer:<syntaxhighlight lang="yaml">
gentmp:
  uri: bridge://
  xml:
    npm: ["osm-bright-source", "data.xml"]
  xmlSetDataSource:
    if:
      dbname: gis
      host: ""
      type: postgis
    set:
      host: localhost
      user: {var: osmdb-user}
      password: {var: osmdb-pswd}
  xmlLayers: [admin, road]
mixtmp:
  uri: layermixer://
  params:
    sources: [{ref: v2}, {ref: gentmp}]
</syntaxhighlight>Once set, POST a job to copy <code>mixtmp</code> into the <code>v2</code> storage, e.g.
'''<code>src=mixtmp dst=v2 baseZoom=0 fromZoom=5 beforeZoom=6 parts=10</code>'''
=== Generating Tiles ===
Generate all tiles for zooms <code>0..7</code>, using generator <code>gen</code>, saving everything, including the solid tiles, into <code>v3</code>, with up to 4 jobs per zoom.
'''<code>src=gen dst=v3 parts=4 baseZoom=0 fromZoom=0 beforeZoom=8 saveSolid=1</code>'''
Generate tiles only if they already exist in the <code>v2</code> source, and save them into <code>v3</code>, for zooms <code>8..15</code>, 60 jobs per zoom.
'''<code>src=gen dst=v3 parts=60 baseZoom=0 fromZoom=8 beforeZoom=16 sourceId=v2</code>'''
=== Bulk Copying ===
The fastest way to copy a large number of tiles from one source to another is to use a large number of parts and specify <code>saveSolid=true</code> (skips solid tile detection). E.g. to copy all z16 tiles from v2 to v3, use<blockquote>'''<code>src=v2 dst=v3 zoom=16 parts=60 saveSolid=true</code>''' </blockquote>


== Postgres ==
* Clear the Postgres data directory and init the database from backup (replace <code>maps2001.codfw.wmnet</code> with the postgres master):
<code>rm -rf /srv/postgresql/9.4/main/* && sudo -u postgres pg_basebackup -X stream -D /srv/postgresql/9.4/main/ -h maps2001.codfw.wmnet -U replication -W</code>
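After the base backup completes and postgres is restarted, replication can be confirmed with standard PostgreSQL queries (these are stock PostgreSQL views and functions, not maps-specific tooling):<syntaxhighlight lang="bash">
# On the master: list connected streaming replicas
sudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"

# On the slave: returns 't' while the server is running as a replica
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
</syntaxhighlight>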

== Subpages ==
{{Special:PrefixIndex/{{PAGENAME}}/|hideredirects=1|stripprefix=1}}