You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

SRE/Observability/Ownership: Difference between revisions

imported>Jobo
mNo edit summary
imported>LMata
m (Added primary categories and roadmaps)
 

Latest revision as of 12:51, 15 September 2022

{| class="wikitable" width=100%
!Backlog
!<big>Services</big>
!<big>Description</big>
!<big>Phabricator tag</big>
|-
|Alerting
|
|All things alerting, including Alertmanager, Icinga, and Splunk-On-Call.
|[[phab:project/view/5394/|#observability-alerting]]
|-
|Metrics
|
|Aggregatable metrics systems and their interfaces, such as Prometheus, Thanos, Grafana, and Graphite.
|[[phab:project/view/5392/|#observability-metrics]]
|-
|Logging
|
|The log pipeline and the Logstash and OpenSearch ecosystem.
|[[phab:project/view/5393/|#observability-logging]]
|-
|Incident Tooling
|
|Incident workflow tooling, such as dispatch and related systems.
|[[phab:project/view/6098/|#incident_tooling, #observability-ir-tools]]
|-
|Tracing
|
|Distributed tracing support. Not yet developed, but planned.
|[[phab:project/view/5395/|#observability-tracing]]
|-
|
|[[Prometheus]]
|Prometheus is a free software application used for event monitoring and alerting. It records real-time metrics in a time series database, collected over an HTTP pull model, and offers flexible queries and real-time alerting.
|
|-
|
|[[Graphite]]
|Graphite is a real-time time series data store and graph renderer.
|https://phabricator.wikimedia.org/tag/graphite/
|-
|
|[[Alertmanager]]
|Alertmanager is the service (and software) in charge of collecting, de-duplicating, and sending notifications for alerts across WMF infrastructure. It is part of the Prometheus ecosystem, so Prometheus natively supports acting as an Alertmanager client. The alerts dashboard, implemented by Karma, can be reached at https://alerts.wikimedia.org/
|
|}
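
To make the "de-duplicating" part of the Alertmanager description more concrete, here is a minimal sketch of the general idea: two alerts count as the same alert when they carry the same set of labels, regardless of label order. This is an illustration only, not Alertmanager's actual implementation. The names <code>fingerprint</code> and <code>deduplicate</code> and the sample label values are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption for this sketch: an alert is represented only by its label set.
Alert = Dict[str, str]


def fingerprint(labels: Alert) -> Tuple[Tuple[str, str], ...]:
    """Return an order-independent identity for a label set."""
    return tuple(sorted(labels.items()))


def deduplicate(alerts: Iterable[Alert]) -> List[Alert]:
    """Keep the first alert seen for each distinct label set."""
    seen = set()
    unique: List[Alert] = []
    for alert in alerts:
        key = fingerprint(alert)
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique


if __name__ == "__main__":
    # Hypothetical alerts: the first two differ only in label order.
    incoming = [
        {"alertname": "HostDown", "instance": "a"},
        {"instance": "a", "alertname": "HostDown"},
        {"alertname": "HostDown", "instance": "b"},
    ]
    for alert in deduplicate(incoming):
        print(alert)
```

In this sketch the first two alerts collapse into one, because their label sets are equal, while the third is kept because its <code>instance</code> label differs.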