Monitoring/strongswan

IPsec connections are currently monitored using a combination of prometheus-ipsec-exporter and Icinga.

A per-site alert fires via Icinga check_prometheus when one or more defined IPsec tunnels enter a disconnected, unknown, or other non-connected state.

Troubleshooting an issue

  1. Review the alert description text carefully; it should hint at where the problem lies.
    1. For example, the alert text "instance=cp1081:9536 site=eqiad tunnel={cp3060_v4,cp3060_v6}" indicates that host cp1081 is reporting a problem with its tunnels to cp3060.
  2. Check the IPsec Grafana dashboard
    1. Investigate any ongoing non-zero values.
    2. Look for commonality, e.g. do multiple instances report problems with the same tunnel?
  3. Document what you did to troubleshoot the issue by adding it to this runbook :)
  4. Tell the Traffic team about it.
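
As an illustration of steps 1 and 2, the following is a minimal Python sketch that splits alert text into its labels and counts how many instances report each tunnel. It is not part of the monitoring stack. It assumes the alert text always follows the "key=value" shape shown in the example above, and that tunnel names take the form "<host>_<ip version>", which is inferred from that single example. The function names and the second host in the usage note (cp1082) are hypothetical.

```python
import re
from collections import Counter

# Assumption: labels look like key=value, where a value is either a bare
# token or a brace-enclosed, comma-separated list (as in the example alert).
LABEL_RE = re.compile(r"(\w+)=(\{[^}]*\}|\S+)")


def parse_alert(text):
    """Return a dict of labels; 'tunnel' is always returned as a list."""
    labels = {}
    for key, value in LABEL_RE.findall(text):
        if value.startswith("{") and value.endswith("}"):
            value = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        labels[key] = value
    tunnels = labels.get("tunnel", [])
    if isinstance(tunnels, str):
        tunnels = [tunnels]
    labels["tunnel"] = tunnels
    return labels


def reporting_host(labels):
    """Host that reported the problem: the instance label minus its port."""
    return labels["instance"].split(":", 1)[0]


def remote_hosts(labels):
    """Remote ends of the affected tunnels (assumes '<host>_<ip version>')."""
    return sorted({t.rsplit("_", 1)[0] for t in labels["tunnel"]})


def tunnel_report_counts(alert_texts):
    """Count how many alerts mention each tunnel, to spot commonality."""
    counts = Counter()
    for text in alert_texts:
        counts.update(set(parse_alert(text)["tunnel"]))
    return counts


if __name__ == "__main__":
    alert = "instance=cp1081:9536 site=eqiad tunnel={cp3060_v4,cp3060_v6}"
    labels = parse_alert(alert)
    print(reporting_host(labels), "reports problems with", remote_hosts(labels))
```

Passing several alert texts to tunnel_report_counts, for example the one above plus a made-up "instance=cp1082:9536 site=eqiad tunnel={cp3060_v4}", gives a per-tunnel count. A tunnel reported by many instances suggests the problem is on the remote host rather than on any single reporter.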