
Performance/Runbook/RUM/Alert: Difference between revisions

imported>Phedenskog
(In the making)
 
imported>Phedenskog
(Fix)
Line 4: Line 4:


*Issue tracker (Phabricator): [[phab:tag/NavigationTiming/|NavigationTiming]]
*Issue tracker (Phabricator): [[phab:tag/NavigationTiming/|NavigationTiming]]
*Documentation: [[Performance/?|?]]
*Documentation: [https://www.mediawiki.org/wiki/Extension:NavigationTiming Navigation Timing Extension]


=== RUM performance regression alert ===
===RUM performance regression alert===
Independent if what alert that fires (first paint, response start, load event end)
We alert on three different performance metrics: ''first paint'', ''response start'' and ''load event end''. Make sure that you look at the alerting metric in the dashboards when you try to find out more about the regression.

Use these steps to get more information about the regression:
 
#Identify when the regression started, so you can use that timestamp when looking at other dashboards. You should be able to see when it happened in the [https://grafana.wikimedia.org/d/000000326/navigation-timing-alerts navigation timing alert dashboard].
#Is the regression on desktop, mobile, or both? Check the [https://grafana.wikimedia.org/d/000000038/navigation-timing-by-platform navigation metrics by platform dashboard].
#Is the regression caused by one browser or by a specific browser version? Check the [https://grafana.wikimedia.org/d/000000218/navigation-timing-by-browser navigation timing by browser dashboard].
#Are we receiving more or fewer metrics than before? Check the [https://grafana.wikimedia.org/d/000000143/navigation-timing?viewPanel=12&orgId=1 report rate by metric dashboard].
#Can we see the regression using our synthetic tools? Look at the [https://grafana.wikimedia.org/d/000000057/webpagetest-drilldown WebPageTest dashboard] and the [https://grafana.wikimedia.org/d/000000282/webpagereplay-drilldown WebPageReplay dashboard]. If the issue is also visible in WebPageReplay, we know it is a front-end regression.
#Is the regression caused by a code change? Take the timestamp or time span in which you think the regression started and check the [[Server Admin Log|server admin log]].
Create a [https://phabricator.wikimedia.org Phabricator task] and include everything you know. Please take screenshots of the dashboards and include links. If you can identify the code change that caused the regression, please add the responsible team or person to the task.

Revision as of 07:56, 29 October 2021

This is the runbook for RUM alerts (real user measurements performance alerts).

Meta

RUM performance regression alert

We alert on three different performance metrics: first paint, response start and load event end. Make sure that you look at the alerting metric in the dashboards when you try to find out more about the regression.

Use these steps to get more information about the regression:

  1. Identify when the regression started, so you can use that timestamp when looking at other dashboards. You should be able to see when it happened in the navigation timing alert dashboard.
  2. Is the regression on desktop, mobile, or both? Check the navigation metrics by platform dashboard.
  3. Is the regression caused by one browser or by a specific browser version? Check the navigation timing by browser dashboard.
  4. Are we receiving more or fewer metrics than before? Check the report rate by metric dashboard.
  5. Can we see the regression using our synthetic tools? Look at the WebPageTest dashboard and the WebPageReplay dashboard. If the issue is also visible in WebPageReplay, we know it is a front-end regression.
  6. Is the regression caused by a code change? Take the timestamp or time span in which you think the regression started and check the server admin log.

Create a Phabricator task and include everything you know. Please take screenshots of the dashboards and include links. If you can identify the code change that caused the regression, please add the responsible team or person to the task.
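
Since the task should include dashboard links, it can help to pin every link to the same time window around the moment the regression started. The following is a minimal sketch, not an official tool: it assumes Grafana's `from` and `to` URL query parameters (epoch milliseconds), and the helper names, the six-hour window and the example timestamp are hypothetical. The dashboard URLs are the ones linked from the steps above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Dashboard URLs as linked from this runbook.
DASHBOARDS = {
    "alerts": "https://grafana.wikimedia.org/d/000000326/navigation-timing-alerts",
    "by platform": "https://grafana.wikimedia.org/d/000000038/navigation-timing-by-platform",
    "by browser": "https://grafana.wikimedia.org/d/000000218/navigation-timing-by-browser",
    "report rate": "https://grafana.wikimedia.org/d/000000143/navigation-timing?viewPanel=12&orgId=1",
    "WebPageTest": "https://grafana.wikimedia.org/d/000000057/webpagetest-drilldown",
    "WebPageReplay": "https://grafana.wikimedia.org/d/000000282/webpagereplay-drilldown",
}


def with_time_range(url, start, hours_before=6, hours_after=6):
    """Return url with from/to query parameters (epoch ms) around start.

    The window size is an arbitrary default chosen for illustration.
    """
    begin = start - timedelta(hours=hours_before)
    end = start + timedelta(hours=hours_after)
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keep existing parameters such as viewPanel
    query["from"] = str(int(begin.timestamp() * 1000))
    query["to"] = str(int(end.timestamp() * 1000))
    return urlunsplit(parts._replace(query=urlencode(query)))


def dashboard_links(start):
    """Build one time-pinned link per dashboard for the task description."""
    return {name: with_time_range(url, start) for name, url in DASHBOARDS.items()}


if __name__ == "__main__":
    # Hypothetical regression start; replace with the time found in step 1.
    regression_start = datetime(2021, 10, 28, 14, 0, tzinfo=timezone.utc)
    for name, link in dashboard_links(regression_start).items():
        print(f"{name}: {link}")
```

Using a timezone-aware UTC timestamp avoids ambiguity when comparing the links with entries in the server admin log.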