
Maps/Runbook: Difference between revisions

imported>Krinkle
 
imported>Gehel
(→‎Known issues: added note about current OSM reimport)
 

Latest revision as of 08:22, 4 May 2020

Intro

The maps service consists of Kartotherian, a Node.js service that serves map tiles; Tilerator, a non-public service that prepares vector tiles (data blobs) from the OSM database and stores them in Cassandra; and TileratorUI, an interface for managing Tilerator jobs. For a more detailed description, see the Maps service page. The goal of this runbook is to provide short instructions on what to do when there is an issue with the service.

Components

Maps service component diagram
Maps service deployment diagram

The maps service consists of:

  • Kartotherian - a Node.js service to serve map tiles
  • Tilerator - a non-public service to prepare vector tiles (data blobs) from the OSM database into Cassandra storage
  • TileratorUI - an interface to manage Tilerator jobs (Note: maintained as part of the Tilerator repo)

There are eight servers in the maps group: maps100{1,2,3,4}.eqiad.wmnet and maps200{1,2,3,4}.codfw.wmnet. Each of these runs Kartotherian (port 6533, NCPU instances), Tilerator (port 6534, half of NCPU instances), and TileratorUI (port 6535, 1 instance). Maps traffic is routed through the cache misc varnish clusters.
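As a rough illustration of the layout above, the following Python sketch expands the host names and probes each service port with a plain TCP connect. The host names and ports come from this page; the function names, the timeout, and the choice of a TCP connect as the health signal are assumptions made for the example, not part of the maps tooling.

```python
import socket

# Ports as listed in this runbook.
SERVICE_PORTS = {"kartotherian": 6533, "tilerator": 6534, "tileratorui": 6535}


def maps_hosts():
    """Expand maps100{1,2,3,4}.eqiad.wmnet and maps200{1,2,3,4}.codfw.wmnet."""
    hosts = []
    for prefix, site in (("maps100", "eqiad"), ("maps200", "codfw")):
        for n in range(1, 5):
            hosts.append(f"{prefix}{n}.{site}.wmnet")
    return hosts


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_all(probe=port_open):
    """Map (host, service) to the probe result for every host and service."""
    return {
        (host, name): probe(host, port)
        for host in maps_hosts()
        for name, port in SERVICE_PORTS.items()
    }


if __name__ == "__main__":
    for (host, name), ok in sorted(check_all().items()):
        print(f"{host:28} {name:13} {'OK' if ok else 'UNREACHABLE'}")
```

The probe is passed in as a parameter so the enumeration can be exercised without network access. A successful connect only shows that something is listening on the port, not that the service is healthy.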

Main dashboards

Known limitations

Known issues

Reimport in progress

OSM replication currently has issues, which are under investigation. In the meantime, replication is disabled and the servers are lagging behind. You might see expired Icinga downtimes related to this ("Maps - OSM synchronization lag"). See task T249086.

Tilerator Crashed

The Tilerator endpoint might fail to respond for a number of reasons. This usually shows up as Icinga alerts like the following:

1:19 AM PROBLEM - tilerator on maps1003 is CRITICAL: connect to address 10.64.32.117 and port 6534: Connection refused
1:19 AM PROBLEM - tilerator on maps1002 is CRITICAL: connect to address 10.64.16.42 and port 6534: Connection refused
1:19 AM PROBLEM - tilerator on maps1001 is CRITICAL: connect to address 10.64.0.79 and port 6534: Connection refused
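When several of these alerts arrive at once, it can help to pull out the affected hosts mechanically. The sketch below parses alert lines of the shape shown above. The regular expression is derived only from these three sample lines, and the function names are hypothetical, so other Icinga message formats will not match.

```python
import re

# Pattern inferred from the sample alerts on this page (an assumption,
# not an official Icinga message grammar).
ALERT_RE = re.compile(
    r"PROBLEM - (?P<service>\S+) on (?P<host>\S+) is (?P<state>[A-Z]+): "
    r"connect to address (?P<address>\S+) and port (?P<port>\d+): (?P<reason>.+)$"
)


def parse_alert(line):
    """Return the alert fields as a dict, or None if the line does not match."""
    match = ALERT_RE.search(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["port"] = int(info["port"])
    return info


def affected_hosts(lines, service="tilerator"):
    """List hosts with a CRITICAL alert for the service, in order of appearance."""
    hosts = []
    for line in lines:
        info = parse_alert(line)
        if info and info["service"] == service and info["state"] == "CRITICAL":
            if info["host"] not in hosts:
                hosts.append(info["host"])
    return hosts


ALERTS = [
    "1:19 AM PROBLEM - tilerator on maps1003 is CRITICAL: connect to address 10.64.32.117 and port 6534: Connection refused",
    "1:19 AM PROBLEM - tilerator on maps1002 is CRITICAL: connect to address 10.64.16.42 and port 6534: Connection refused",
    "1:19 AM PROBLEM - tilerator on maps1001 is CRITICAL: connect to address 10.64.0.79 and port 6534: Connection refused",
]

if __name__ == "__main__":
    print(affected_hosts(ALERTS))
```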

Remediation

  • Restart tilerator on the affected servers.

Services error at startup

Tilerator and Kartotherian might fail during startup with the error "Cannot read property 'length' of undefined". See the following stack trace:

{"name":"kartotherian","hostname":"deployment-maps04","pid":38,"level":60,"err":{"message":"","name":"TypeError","stack":"Type
Error: Cannot read property 'length' of undefined\n    at /srv/deployment/kartotherian/deploy-cache/revs/27062b4e8672dd90c8265
cac3229d321b3823c25/node_modules/tilelive-tmstyle/index.js:180:56\n    at Array.map (native)\n    at /srv/deployment/kartother
ian/deploy-cache/revs/27062b4e8672dd90c8265cac3229d321b3823c25/node_modules/tilelive-tmstyle/index.js:179:34\n    at Babel.han
dler.getInfo (/srv/deployment/kartotherian/deploy-cache/revs/27062b4e8672dd90c8265cac3229d321b3823c25/node_modules/@kartotheri
an/core/lib/sources.js:260:7)\n    at /srv/deployment/kartotherian/deploy-cache/revs/27062b4e8672dd90c8265cac3229d321b3823c25/
node_modules/tilelive-tmstyle/index.js:118:20\n    at tryCatcher (/srv/deployment/kartotherian/deploy-cache/revs/27062b4e8672d
d90c8265cac3229d321b3823c25/node_modules/bluebird/js/release/util.js:16:23)\n    at Promise.successAdapter (/srv/deployment/ka
rtotherian/deploy-cache/revs/27062b4e8672dd90c8265cac3229d321b3823c25/node_modules/bluebird/js/release/nodeify.js:23:30)\n   
at Promise._settlePromise (/srv/deployment/kartotherian/deploy-cache/revs/27062b4e8672dd90c8265cac3229d321b3823c25/node_module
s/bluebird/js/release/promise.js:566:21)\n    at Promise._settlePromiseCtx (/srv/deployment/kartotherian/deploy-cache/revs/270
62b4e8672dd90c8265cac3229d321b3823c25/node_modules/bluebird/js/release/promise.js:606:10)\n    at _drainQueueStep (/srv/deploy
ment/kartotherian/deploy-cache/revs/27062b4e8672dd90c8265cac3229d321b3823c25/node_modules/bluebird/js/release/async.js:142:12)
\n    at _drainQueue (/srv/deployment/kartotherian/deploy-cache/revs/27062b4e8672dd90c8265cac3229d321b3823c25/node_modules/blu
ebird/js/release/async.js:131:9)\n    at Async._drainQueues (/srv/deployment/kartotherian/deploy-cache/revs/27062b4e8672dd90c8
265cac3229d321b3823c25/node_modules/bluebird/js/release/async.js:147:5)\n    at Immediate.Async.drainQueues (/srv/deployment/k
artotherian/deploy-cache/revs/27062b4e8672dd90c8265cac3229d321b3823c25/node_modules/bluebird/js/release/async.js:17:14)\n    a
t runCallback (timers.js:672:20)\n    at tryOnImmediate (timers.js:645:5)\n    at processImmediate [as _immediateCallback] (ti
mers.js:617:5)","levelPath":"fatal/service-runner/unhandled"},"msg":"Cannot read property 'length' of undefined","time":"2018-
12-13T19:02:20.635Z","v":0}
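The log entry above is a single JSON object, so it can be summarized with a few lines of Python. The sketch below assumes the record follows the bunyan-style layout seen in this sample (numeric `level`, nested `err.stack`), and the level names are the conventional bunyan mapping. The `summarize` function and the abbreviated `SAMPLE` record (only the first two stack frames are kept) are illustrative only.

```python
import json

# Conventional bunyan level numbers; assumed to apply to these logs.
BUNYAN_LEVELS = {10: "trace", 20: "debug", 30: "info", 40: "warn", 50: "error", 60: "fatal"}


def summarize(line):
    """Extract the key fields from one JSON log line (hypothetical helper)."""
    record = json.loads(line)
    err = record.get("err") or {}
    stack_lines = err.get("stack", "").split("\n")
    # Stack frames are the lines that start with "at " once indentation is removed.
    frames = [s.strip() for s in stack_lines if s.strip().startswith("at ")]
    level = record.get("level")
    return {
        "service": record.get("name"),
        "hostname": record.get("hostname"),
        "level": BUNYAN_LEVELS.get(level, str(level)),
        "error": err.get("name"),
        "msg": record.get("msg"),
        "time": record.get("time"),
        "first_frame": frames[0] if frames else None,
    }


# Abbreviated copy of the record above: the stack is cut after two frames.
SAMPLE = json.dumps({
    "name": "kartotherian",
    "hostname": "deployment-maps04",
    "pid": 38,
    "level": 60,
    "err": {
        "message": "",
        "name": "TypeError",
        "stack": (
            "TypeError: Cannot read property 'length' of undefined\n"
            "    at /srv/deployment/kartotherian/deploy-cache/revs/"
            "27062b4e8672dd90c8265cac3229d321b3823c25/node_modules/"
            "tilelive-tmstyle/index.js:180:56\n"
            "    at Array.map (native)"
        ),
        "levelPath": "fatal/service-runner/unhandled",
    },
    "msg": "Cannot read property 'length' of undefined",
    "time": "2018-12-13T19:02:20.635Z",
    "v": 0,
})

if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

In this sample the first frame points into the tilelive-tmstyle module, which is where the TypeError is raised.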

Remediation

See also