You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Incident documentation/2018-11-06 maps

From Wikitech-static
Jump to navigation Jump to search

This is a template for an Incident Report. Replace notes with your own description. Save the incident report as a subpage of the Incident documentation, named in this style: YYYYMMDD-$NameOfService.

Summary

Tilerator failed on maps100[1-3]. Tilerator is a non-public service to prepare vector tiles (data blobs) from OSM database into Cassandra storage. This happened on the 6th November 2018. Icinga first reported this failure around 00:18 UTC.

Timeline

This is a step by step outline of what happened to cause the incident and how it was remedied.

00:18 UTC: Icinga reported failure of Tilerator ports :

PROBLEM - tilerator on maps1003 is CRITICAL: connect to address 10.64.32.117 and port 6534: Connection refused
1:19 AM PROBLEM - tilerator on maps1002 is CRITICAL: connect to address 10.64.16.42 and port 6534: Connection refused
1:19 AM PROBLEM - tilerator on maps1001 is CRITICAL: connect to address 10.64.0.79 and port 6534: Connection refused

07:25 UTC: Tilerator Service was restarted on maps100[1-3]

07:26 UTC: Tilerator Service came back up.

Conclusions

Links to relevant documentation

Maps Runbook: Maps/RunBook

Actionables

Explicit next steps to prevent this from happening again as much as possible, with Phabricator tasks linked for every step.

NOTE: Please add the #wikimedia-incident Phabricator project to these follow-up tasks and move them to the "follow-up/actionable" column.