
Incident documentation/2021-06-15 Eqsin network

Revision as of 12:17, 13 July 2021 by Ema

document status: in-review

Summary

At 09:23 UTC, alerts indicated connectivity issues to the Eqsin cluster in Singapore. At 09:31 UTC, @Ema deployed a DNS change to depool the Eqsin cluster, which diverted most of its assigned traffic to Ulsfo and some to Esams. Traffic began recovering at 09:35 UTC and was back to regular levels by 09:45 UTC. The roughly 15-minute delay between the DNS change and full recovery is attributed to DNS caches expiring (e.g. at ISPs and on client devices). The connectivity issues were resolved later that day; at 18:50 UTC @CMooney repooled the Eqsin cluster, and traffic in Eqsin was back to regular levels by 19:00 UTC.

Impact: For about 25 minutes, from 09:20 to 09:45 UTC, the wikis were largely unreachable from countries normally served by the Singapore DC (including India, Hong Kong, and Japan).

Documentation:

Actionables