Alertmanager: Difference between revisions
Filippo Giunchedi (Add production deployment diagram)
Filippo Giunchedi (Sketch notifications section)
Revision as of 13:14, 26 January 2021
What is it?
Alertmanager is the service (and software) in charge of collecting, de-duplicating and sending notifications for alerts across WMF infrastructure. It is part of the Prometheus ecosystem, and therefore Prometheus itself has native support for acting as an Alertmanager client. The alerts dashboard, implemented by Karma, can be reached at https://alerts.wikimedia.org/. As of Jan 2021 the dashboard is available to SSO users only; however, a read-only version is possible as well.
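Because Prometheus acts as an Alertmanager client, any other client has to submit alerts in the same shape. The following is a minimal sketch, assuming Alertmanager's v2 HTTP API (`POST /api/v2/alerts`): it builds one alert and posts it. The helper names, label values and the Alertmanager URL in the usage note are hypothetical. The comments also show where de-duplication comes from: alerts are identified by their label set, so re-sending the same labels updates the existing alert instead of creating a new one.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def _rfc3339(dt):
    """Format a UTC datetime as an RFC 3339 timestamp, e.g. 2021-01-26T13:14:00Z."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_alert(alertname, team, severity, summary, duration_minutes=5):
    """Build one alert in the JSON shape accepted by POST /api/v2/alerts."""
    now = datetime.now(timezone.utc)
    return {
        # Labels identify the alert: alerts with the same label set are merged.
        "labels": {"alertname": alertname, "team": team, "severity": severity},
        # Annotations carry human-readable details and do not affect identity.
        "annotations": {"summary": summary},
        "startsAt": _rfc3339(now),
        # Once endsAt is in the past the alert is considered resolved, so
        # clients re-send firing alerts periodically to keep them active.
        "endsAt": _rfc3339(now + timedelta(minutes=duration_minutes)),
    }


def post_alerts(base_url, alerts):
    """POST a list of alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Usage, against a hypothetical instance: `post_alerts("http://alertmanager.example.org:9093", [build_alert("HostDown", "sre", "critical", "example1001 is unreachable")])`. The `team` and `severity` labels set here are the same ones used later for notification preferences.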
Alertmanager is being progressively rolled out as the central place where all alerts are sent; the implementation is done in phases according to the alerting infrastructure roadmap. As of Jan 2021 LibreNMS has been fully migrated, with more services to come.
Software stack
When talking about the Alertmanager stack as a whole, it is useful to list its components as deployed at the Wikimedia Foundation:
- Alertmanager: the daemon actually in charge of handling alerts and sending out notifications
- Karma: the dashboard/UI for Alertmanager alerts; it powers https://alerts.wikimedia.org
- kthxbye: implements the "acknowledgement" functionality for alerts
- alertmanager-irc-relay: forwards alerts to IRC channels
- prometheus-icinga-exporter: a compatibility shim that forwards active Icinga alerts to Alertmanager; it also provides Prometheus-style metrics for Icinga
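Alertmanager itself has no "acknowledged" state, so acknowledgement is built on top of silences. The following is a hypothetical sketch, not kthxbye's actual code: it builds a silence in the shape of Alertmanager's v2 API (`POST /api/v2/silences`) and decides whether an acknowledgement silence that is about to expire should be extended while its alert keeps firing. The `ACK!` comment prefix, the 15-minute window and all helper names are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: acknowledgement silences are recognised by this comment prefix.
ACK_PREFIX = "ACK!"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_ack_silence(alertname, instance, author, reason, hours=1):
    """Build a silence in the JSON shape accepted by POST /api/v2/silences."""
    now = datetime.now(timezone.utc)
    return {
        # Exact-match matchers: the silence only covers this alert on this instance.
        "matchers": [
            {"name": "alertname", "value": alertname, "isRegex": False},
            {"name": "instance", "value": instance, "isRegex": False},
        ],
        "startsAt": now.strftime(TIME_FORMAT),
        "endsAt": (now + timedelta(hours=hours)).strftime(TIME_FORMAT),
        "createdBy": author,
        "comment": f"{ACK_PREFIX} {reason}",
    }


def should_extend(silence, now, still_firing, window=timedelta(minutes=15)):
    """Hypothetical ack logic: extend an ACK silence that has not expired yet,
    will expire within `window`, and still covers a firing alert."""
    ends_at = datetime.strptime(silence["endsAt"], TIME_FORMAT).replace(tzinfo=timezone.utc)
    is_ack = silence["comment"].startswith(ACK_PREFIX)
    return is_ack and still_firing and now < ends_at <= now + window
```

The design choice in this sketch is that an acknowledgement never needs an explicit "un-ack": once the alert stops firing, the silence is simply no longer extended and expires on its own.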
Notifications
As of Jan 2021, Alertmanager supports the following notification methods:
- email - sent by Alertmanager itself
- IRC - via the jinxer-wm bot on Freenode
- Phabricator - through the @phaultfinder user
- pages - sent via Splunk Oncall (formerly known as VictorOps)
Notification preferences are set per team and depend on the alert's severity; both are read from labels attached to the alert (team and severity, respectively).
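Alertmanager picks a receiver by walking a tree of routes that match on alert labels: the first matching child route wins, and an alert that matches no child stays with its parent's receiver. The sketch below reimplements that lookup in Python to show how per-team, per-severity preferences could be expressed. It ignores `continue`, regex matchers and grouping, and the receiver names, team name and severity values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """One node of a hypothetical routing tree, modelled on Alertmanager's."""
    receiver: str
    match: dict = field(default_factory=dict)   # label -> required exact value
    routes: list = field(default_factory=list)  # child routes, checked in order

    def matches(self, labels):
        return all(labels.get(name) == value for name, value in self.match.items())


def find_receiver(route, labels):
    """Descend into the first matching child; fall back to this route's receiver."""
    for child in route.routes:
        if child.matches(labels):
            return find_receiver(child, labels)
    return route.receiver


# Hypothetical per-team preferences: alerts for team "sre" go to IRC,
# except critical alerts (paged) and "task" alerts (filed in Phabricator).
ROOT = Route(
    receiver="default-email",
    routes=[
        Route(
            receiver="sre-irc",
            match={"team": "sre"},
            routes=[
                Route(receiver="sre-pages", match={"severity": "critical"}),
                Route(receiver="sre-phabricator", match={"severity": "task"}),
            ],
        ),
    ],
)
```

For example, `find_receiver(ROOT, {"team": "sre", "severity": "critical"})` returns `"sre-pages"`, while a warning for the same team falls back to `"sre-irc"`. In a real deployment the equivalent tree lives under the `route` key of Alertmanager's configuration file.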