You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Revision as of 08:34, 25 October 2016
Between 18:50 UTC and 19:20 UTC on October 21st, maps.wikimedia.org stopped rendering tiles because the Cassandra backend was unavailable.
- 18:50 UTC: Cassandra is wrongly reinitialized on maps2004.codfw.wmnet, deleting all Cassandra data on maps2004. Kartotherian starts failing with
org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_ONE.
- 19:20 UTC: traffic is redirected to the maps eqiad cluster; user traffic is served again without errors
- 19:40 UTC: the new traffic configuration is fully deployed
- 21:13 UTC: permissions are reset on the maps/cassandra codfw cluster; Kartotherian starts working again on the codfw cluster
- The main trigger for this is human error.
- maps/cassandra has a replication factor of 1 on the "system_auth" keyspace, so each piece of authentication data is stored on a single node. This means that losing one node potentially breaks authentication.
- Increase the replication factor on the system_auth keyspace (task T149074)
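
The replication-factor point can be illustrated with a small sketch. It is a toy model, not Cassandra's real placement logic: rows are assumed to sit on consecutive nodes of a ring. The four-node cluster, the node names other than maps2004, the datacenter name 'codfw' in the CQL statement, and the target replication factor of 3 are all assumptions made for illustration; the report does not state them.

```python
# Toy model, not Cassandra's real placement logic: each row of a keyspace
# lives on `rf` consecutive nodes of a ring, starting at position `token`.

def replicas_for(token: int, nodes: list[str], rf: int) -> list[str]:
    """Return the rf nodes that hold the row at ring position `token`."""
    start = token % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


def readable_at_local_one(token: int, nodes: list[str], rf: int, down: set[str]) -> bool:
    """LOCAL_ONE needs at least one live replica in the local datacenter."""
    return any(node not in down for node in replicas_for(token, nodes, rf))


def alter_keyspace_cql(keyspace: str, dc_replicas: dict[str, int]) -> str:
    """Build a CQL statement that sets per-datacenter replication factors."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in dc_replicas.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical four-node cluster; only maps2004 is named in the report.
nodes = ["maps2001", "maps2002", "maps2003", "maps2004"]
down = {"maps2004"}

# With RF 1, the row stored only on maps2004 cannot be read once it is wiped.
print([readable_at_local_one(t, nodes, 1, down) for t in range(4)])
# prints [True, True, True, False]

# With RF 3, every row still has a live replica after losing one node.
print([readable_at_local_one(t, nodes, 3, down) for t in range(4)])
# prints [True, True, True, True]

# 'codfw' as the Cassandra datacenter name and RF 3 are assumptions.
print(alter_keyspace_cql("system_auth", {"codfw": 3}))
```

After a change of this kind, the existing data still has to be copied to the new replicas, typically by running a repair on the keyspace (for example `nodetool repair system_auth`) on each node.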