Monitoring/strongswan: Difference between revisions

From Wikitech-static
imported>Krinkle
No edit summary
imported>BCornwall
m (Pedantic ipsec spelling)
 
Line 1: Line 1:
[[Wikipedia:IPSec|IPSec]] connections are monitored today using a combination of prometheus-ipsec-exporter and icinga.
[[Wikipedia:IPsec|IPsec]] connections are monitored today using a combination of prometheus-ipsec-exporter and icinga.


A per-site alert will fire via icinga check_prometheus should one or more defined IPsec tunnels change  to disconnected, unknown, or other non-connected state.
A per-site alert will fire via Icinga check_prometheus should one or more defined IPsec tunnels change  to disconnected, unknown, or other non-connected state.


# troubleshoot the issue
== Troubleshooting an issue ==
## Review the alert description text carefully.  You should see some hint about where the problem lies here.
# Review the alert description text carefully.  You should see some hint about where the problem lies here.
### e.g. alert text of "instance=cp1081:9536 site=eqiad tunnel={cp3060_v4,cp3060_v6}" indicates that there is a problem with the tunnels to cp3060 being reported by host cp1081.
## e.g. alert text of "instance=cp1081:9536 site=eqiad tunnel={cp3060_v4,cp3060_v6}" indicates that there is a problem with the tunnels to cp3060 being reported by host cp1081.
## Check the ipsec dashboard at https://grafana.wikimedia.org/d/B9JpocKZz/ipsec-tunnel-status?orgId=1
# Check the [https://grafana.wikimedia.org/d/B9JpocKZz/ipsec-tunnel-status?orgId=1 IPsec Grafana dashboard]
### investigate any ongoing non-zero values.
## Investigate any ongoing non-zero values.
### Look for commonality, e.g. do multiple instances report problems with the same tunnel?   
## Look for commonality, e.g. do multiple instances report problems with the same tunnel?   
# Write more about what you did to troubleshot the issue in this runbook :)
# Write more about what you did to troubleshoot the issue in this runbook :)
# Tell the traffic team about it
# Tell the traffic team about it


[[Category:Runbooks|Strongswan]]
[[Category:Runbooks|Strongswan]]

Latest revision as of 17:56, 15 June 2022

IPsec connections are currently monitored using a combination of prometheus-ipsec-exporter and Icinga.

A per-site alert fires via Icinga check_prometheus when one or more defined IPsec tunnels change to a disconnected, unknown, or other non-connected state.

Troubleshooting an issue

  1. Review the alert description text carefully. It should give some hint about where the problem lies.
    1. For example, the alert text "instance=cp1081:9536 site=eqiad tunnel={cp3060_v4,cp3060_v6}" indicates that host cp1081 is reporting a problem with its tunnels to cp3060.
  2. Check the IPsec Grafana dashboard at https://grafana.wikimedia.org/d/B9JpocKZz/ipsec-tunnel-status?orgId=1
    1. Investigate any ongoing non-zero values.
    2. Look for commonality, e.g. do multiple instances report problems with the same tunnel?
  3. Write up what you did to troubleshoot the issue in this runbook :)
  4. Tell the Traffic team about it.
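
Steps 1 and 2 can be partly scripted when several alerts arrive at once. The following is a minimal Python sketch, not part of the monitoring setup itself. It assumes the alert text always follows the shape of the example above (instance=HOST:PORT site=SITE tunnel={A,B}) and that tunnel names follow the PEER_v4 / PEER_v6 pattern seen there. The function names parse_alert, peer_of and reporters_by_peer are hypothetical helpers introduced only for illustration.

```python
import re
from collections import defaultdict

# Assumed alert shape, taken from the example in step 1:
#   instance=cp1081:9536 site=eqiad tunnel={cp3060_v4,cp3060_v6}
ALERT_RE = re.compile(
    r"instance=(?P<host>[^:\s]+):(?P<port>\d+)\s+"
    r"site=(?P<site>\S+)\s+"
    r"tunnel=\{(?P<tunnels>[^}]*)\}"
)


def parse_alert(text):
    """Split an alert description into reporting host, site and tunnel names."""
    match = ALERT_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised alert text: {text!r}")
    tunnels = [t.strip() for t in match.group("tunnels").split(",") if t.strip()]
    return {
        "reporter": match.group("host"),
        "site": match.group("site"),
        "tunnels": tunnels,
    }


def peer_of(tunnel):
    """Return the remote host of a tunnel name, e.g. 'cp3060_v4' -> 'cp3060'.

    Assumes the <peer>_<v4|v6> naming seen in the example alert.
    """
    return tunnel.rsplit("_", 1)[0]


def reporters_by_peer(alert_texts):
    """Group reporting hosts by the peer whose tunnels they flag as broken."""
    grouped = defaultdict(set)
    for text in alert_texts:
        alert = parse_alert(text)
        for tunnel in alert["tunnels"]:
            grouped[peer_of(tunnel)].add(alert["reporter"])
    return dict(grouped)


if __name__ == "__main__":
    alerts = [
        "instance=cp1081:9536 site=eqiad tunnel={cp3060_v4,cp3060_v6}",
        # Hypothetical second alert, added only to show the grouping.
        "instance=cp1082:9536 site=eqiad tunnel={cp3060_v4}",
    ]
    for peer, reporters in sorted(reporters_by_peer(alerts).items()):
        print(peer, "reported down by", sorted(reporters))
```

If several reporting hosts end up grouped under the same peer, the problem is more likely on that peer's side than on any single reporter, which is the commonality step 2 asks you to look for.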