SRE/Observability/Dashboard guidelines

Latest revision as of 20:02, 29 June 2022

Dashboard methods

Utilization Saturation Errors (USE)

This method is most effective for quickly diagnosing system performance issues. To quote Brendan Gregg's guide to USE:

 For every resource, check utilization, saturation, and errors.

The host overview dashboard shows an example of this method applied to inspect a single host's performance. Resources (CPU, network, etc.) are placed in rows: the left column shows the resource's utilization, while the right column displays saturation or errors, as applicable.
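The row-and-column arrangement described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual host overview dashboard: the resource list, panel titles, and row height are assumptions, and the only Grafana detail relied on is the 24-column grid expressed through `gridPos`.

```python
# Hypothetical resource list; titles are placeholders, not real dashboard panels.
RESOURCES = ["CPU", "Memory", "Network", "Disk"]

GRID_WIDTH = 24    # Grafana's dashboard grid is 24 columns wide
PANEL_HEIGHT = 8   # arbitrary row height chosen for this sketch


def use_layout(resources):
    """Return two panels per resource: utilization left, saturation/errors right."""
    panels = []
    half = GRID_WIDTH // 2
    for row, resource in enumerate(resources):
        y = row * PANEL_HEIGHT
        # Left column: how busy the resource is.
        panels.append({
            "title": f"{resource} utilization",
            "gridPos": {"x": 0, "y": y, "w": half, "h": PANEL_HEIGHT},
        })
        # Right column: queued work or error counts, whichever applies.
        panels.append({
            "title": f"{resource} saturation / errors",
            "gridPos": {"x": half, "y": y, "w": half, "h": PANEL_HEIGHT},
        })
    return panels


layout = use_layout(RESOURCES)
for panel in layout:
    pos = panel["gridPos"]
    print(f'{panel["title"]:<28} x={pos["x"]:>2} y={pos["y"]:>2}')
```

Keeping one resource per row means a reader can scan down the left column for busy resources and across to the right to see whether that load is causing queuing or errors.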

Four golden signals (4GS)

This method is described in detail in Google's SRE book and focuses on the system's user-impacting metrics: latency, traffic, errors, and saturation. Specifically, it can be used as a basis for alerting and for diagnosing ongoing problems.

Examples of this method in use include the swift and sessionstore dashboards, as well as any other service dashboard in the "Services" Grafana folder.
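As a rough illustration of what a service dashboard built on the four golden signals summarizes, the sketch below derives latency, traffic, errors, and saturation from a window of request records. Everything here is an assumption made for the example: the `Request` type, the choice of a nearest-rank 95th percentile for latency, counting HTTP 5xx responses as errors, and expressing saturation as in-flight requests over a hypothetical capacity. Real dashboards would obtain these values from metrics queries rather than raw records.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP status code


def golden_signals(requests, window_seconds, in_flight, capacity):
    """Summarize a window of requests as latency, traffic, errors, saturation."""
    durations = sorted(r.duration_ms for r in requests)
    if durations:
        # Nearest-rank 95th percentile.
        rank = max(1, math.ceil(0.95 * len(durations)))
        p95 = durations[rank - 1]
    else:
        p95 = 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": p95,
        "traffic_rps": len(requests) / window_seconds,
        "error_ratio": errors / len(requests) if requests else 0.0,
        # Assumed definition: share of a hypothetical concurrency limit in use.
        "saturation": in_flight / capacity,
    }


# Made-up sample data for a two-second window.
sample = [Request(20, 200), Request(35, 200), Request(50, 200), Request(400, 503)]
signals = golden_signals(sample, window_seconds=2, in_flight=30, capacity=100)
for name, value in signals.items():
    print(f"{name}: {value}")
```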

Data panel recommendations

  • Axes must be labeled
  • Y axis should be zero-based
  • Use fill zero, unless the graph is stacked
  • Ideally no more than four lines/metrics per panel
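The recommendations above lend themselves to an automated check. The following is a minimal sketch that works on a simplified, hypothetical panel description; the keys (`x_label`, `y_label`, `y_min`, `fill`, `stacked`, `series`) are invented for this example and do not mirror Grafana's panel JSON schema. It also assumes that "fill zero" means a fill setting of 0.

```python
def check_panel(panel):
    """Return a list of guideline violations for a simplified panel description."""
    problems = []
    # Axes must be labeled.
    for axis in ("x_label", "y_label"):
        if not panel.get(axis):
            problems.append(f"missing {axis}")
    # Y axis should be zero-based.
    if panel.get("y_min") != 0:
        problems.append("Y axis is not zero-based")
    # Fill should be zero unless the graph is stacked (assumed reading of "fill zero").
    if panel.get("fill", 0) != 0 and not panel.get("stacked", False):
        problems.append("fill should be zero unless the graph is stacked")
    # Ideally no more than four lines/metrics per panel.
    if len(panel.get("series", [])) > 4:
        problems.append("more than four series in one panel")
    return problems


good_panel = {
    "x_label": "Time", "y_label": "Requests/s",
    "y_min": 0, "fill": 0, "stacked": False,
    "series": ["2xx", "4xx", "5xx"],
}
bad_panel = {
    "y_label": "", "y_min": None, "fill": 1,
    "series": ["a", "b", "c", "d", "e"],
}

print(check_panel(good_panel))
print(check_panel(bad_panel))
```

Because the four-series limit is phrased as "ideally", a real check might report it as a warning rather than a failure.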

See also

  • Performance/Runbook/Grafana best practices