This page is currently a draft.
More information and discussion about changes to this draft can be found on the talk page.
Utilization, Saturation and Errors (USE)
This method is an effective way to quickly identify resource bottlenecks early in a system performance investigation. To quote Brendan Gregg's guide to the USE method:
For every resource, check utilization, saturation, and errors.
The host overview dashboard shows an example of this method applied to a single host's performance. Resources (CPU, network, etc.) are laid out in rows: the left column shows each resource's utilization, and the right column shows its saturation or errors, as applicable.
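The checklist behind such a dashboard can be sketched in a few lines of Python. This is a minimal illustration, not how the dashboard is built: `ResourceSample`, `use_check` and the 0.8 utilization threshold are hypothetical, and the metric values are made up. It only shows the "for every resource, check all three signals" loop.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One row of a hypothetical USE dashboard: a resource and its three signals."""
    name: str
    utilization: float  # fraction of time the resource was busy, 0.0 to 1.0
    saturation: float   # work queued beyond what the resource can service; 0 means no queueing
    errors: int         # error count over the same time window


def use_check(samples, util_threshold=0.8):
    """Return (resource, finding) pairs for every resource that looks suspicious.

    The default threshold is an illustrative assumption, not a standard value.
    """
    findings = []
    for s in samples:
        if s.utilization >= util_threshold:
            findings.append((s.name, "high utilization"))
        if s.saturation > 0:
            findings.append((s.name, "saturated"))
        if s.errors > 0:
            findings.append((s.name, "errors"))
    return findings


if __name__ == "__main__":
    host = [
        ResourceSample("cpu", utilization=0.95, saturation=2.0, errors=0),
        ResourceSample("network", utilization=0.30, saturation=0.0, errors=12),
        ResourceSample("disk", utilization=0.10, saturation=0.0, errors=0),
    ]
    for resource, finding in use_check(host):
        print(f"{resource}: {finding}")
    # prints:
    # cpu: high utilization
    # cpu: saturated
    # network: errors
```

Checking every signal for every resource, rather than stopping at the first anomaly, is what keeps the method from missing a bottleneck that sits behind a more obvious one.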
Four golden signals (4GS)
This method is described in detail in Google's SRE book. It focuses on four metrics of a user-facing system: latency, traffic, errors, and saturation. In particular, it can serve as a basis for alerting and for diagnosing ongoing problems.
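As a rough illustration, the four signals can be derived from one time window of request records. This sketch rests on several assumptions: the `Request` record is hypothetical, HTTP 5xx responses count as errors, latency is reported as a nearest-rank 99th percentile of successful requests only, and saturation is approximated as traffic divided by a separately measured `capacity_rps` value. A real service would take these from its own metrics.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical HTTP request record collected during one time window."""
    duration_s: float
    status: int


def percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def golden_signals(requests, window_s, capacity_rps):
    """Summarize one window of requests as latency, traffic, errors and saturation.

    capacity_rps is an assumed, separately measured request-rate limit.
    """
    # Treat HTTP 5xx as errors (an assumption; adapt to the service's definition).
    ok = [r.duration_s for r in requests if r.status < 500]
    traffic_rps = len(requests) / window_s
    return {
        # Latency of successful requests only, so fast failures do not hide slowness.
        "latency_p99_s": percentile(ok, 99) if ok else None,
        "traffic_rps": traffic_rps,
        "error_ratio": (len(requests) - len(ok)) / len(requests) if requests else 0.0,
        "saturation": traffic_rps / capacity_rps,
    }


if __name__ == "__main__":
    window = [Request(0.1, 200), Request(0.2, 200), Request(0.3, 200), Request(0.05, 503)]
    print(golden_signals(window, window_s=2, capacity_rps=4))
```

Keeping failed requests out of the latency figure follows the SRE book's advice to distinguish the latency of successful and failed requests; the error ratio still captures the failures.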
Data panel recommendations
- Axes must be labeled
- Y axis should be zero-based
- Set fill to zero unless the graph is stacked
- Ideally no more than four lines/metrics per panel
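These recommendations can be checked mechanically against a panel's JSON. The function below is a hypothetical sketch. It assumes the legacy Grafana Graph panel layout (`yaxes` entries with `label`, `min` and `show`, plus top-level `fill`, `stack` and `targets`), and it reads "fill" as the panel's area-fill setting. Newer panel types store these options elsewhere. Counting `targets` (queries) is only a proxy for the number of lines, because one query can return several series.

```python
def lint_graph_panel(panel, max_series=4):
    """Check a legacy Grafana Graph panel dict against the recommendations above.

    Only y-axes are checked; the x-axis of a time series graph is time.
    """
    problems = []
    for i, axis in enumerate(panel.get("yaxes", [])):
        if not axis.get("show", True):
            continue  # hidden axes need no label or range
        if not axis.get("label"):
            problems.append(f"y-axis {i} has no label")
        minimum = axis.get("min")  # may be stored as a number or a string such as "0"
        if minimum in (None, "") or float(minimum) != 0:
            problems.append(f"y-axis {i} is not zero-based")
    if not panel.get("stack", False) and panel.get("fill", 0) != 0:
        problems.append("fill should be 0 for an unstacked graph")
    # Each target is one query, which may itself return several lines.
    if len(panel.get("targets", [])) > max_series:
        problems.append(f"more than {max_series} queries in one panel")
    return problems


if __name__ == "__main__":
    example = {
        "stack": False,
        "fill": 1,
        "yaxes": [
            {"label": "requests/s", "min": "0", "show": True},
            {"label": None, "min": None, "show": False},
        ],
        "targets": [{"refId": ref} for ref in "ABCDE"],
    }
    for problem in lint_graph_panel(example):
        print(problem)
```

A check like this could run over exported dashboard JSON before review, but it remains a heuristic: the recommendations themselves, not the script, are the guideline.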