We have different tools to find performance regressions, and automated alerts that fire when they suspect a regression. When an alert fires, we need to find the cause of the regression. There are two types of performance alerts: synthetic testing and real user measurements. Synthetic testing can find smaller regressions by analyzing a video recording of the screen (but only tests a few use cases), while real user measurements find larger regressions that affect many users, reported via the browser's performance APIs.
You got a performance alert, what's the next step?
You want to understand what's causing the regression: Is it a code change, is it something in the environment, is it a new browser version or has something changed in the toolchain measuring performance?
The first thing to do is to find out whether the regression is across the board (all URLs, all browsers, all synthetic tools, both synthetic and RUM metrics). Knowing that puts you well on the way to finding the root cause of the problem.
Synthetic testing alerts typically reference WebPageTest or WebPageReplay. For example:
Notification Type: PROBLEM Service: https://grafana.wikimedia.org/dashboard/db/webpagereplay-mobile-alerts grafana alert Host: einsteinium Address: 18.104.22.168 State: CRITICAL Date/Time: Tue Sept 11 22:14:46 UTC 2018 Notes URLs: Additional Info: CRITICAL: https://grafana.wikimedia.org/dashboard/db/webpagereplay-mobile-alerts is alerting: Rendering Mobile enwiki CPU alert.
Notification Type: PROBLEM Service: https://grafana.wikimedia.org/dashboard/db/webpagetest-alerts grafana alert Host: einsteinium Address: 22.214.171.124 State: CRITICAL Date/Time: Thu Sept 13 04:12:19 UTC 2018 Notes URLs: https://phabricator.wikimedia.org/T203485 Additional Info: CRITICAL: https://grafana.wikimedia.org/dashboard/db/webpagetest-alerts is alerting: Start Render Chrome Desktop [ALERT] alert.
We run two different synthetic testing tools to find regressions: WebPageTest includes network/server time, Browsertime/WebPageReplay focuses exclusively on front end performance. We run WebPageTest for English Wikipedia (desktop and mobile) and Browsertime/WebPageReplay for English, Swedish, French, Dutch, German, Spanish, Japanese, Chinese, Russian, beta, group 0 and group 1 (desktop and mobile).
You can read more about the WebPageReplay alerts to get an understanding of what we test.
If the alert comes from WebPageTest, you can start by checking the generic WebPageTest dashboard: https://grafana.wikimedia.org/dashboard/db/webpagetest and then go down and check the metrics for the individual URL: https://grafana.wikimedia.org/dashboard/db/webpagetest-drilldown
If the alert is coming from WebPageReplay/Browsertime you should start with the generic dashboard: https://grafana.wikimedia.org/dashboard/db/webpagereplay and then check each URL https://grafana.wikimedia.org/dashboard/db/webpagereplay-drilldown
Where to start
A good starting point is to find out at what point in time the regression was introduced. If you can find that, then you can compare screenshots and HAR files (that describes what and when the browser downloads assets) before and after the regression.
To find specific runs in WebPageTest, you need to use the search page. It will show a lot of runs so make sure you pick the right ones!
A couple of things to know: Make sure you choose Show tests from all users and Do not limit the number of results (warning, WILL be slow). That way you are sure you will see all the tests. Also change the View to include enough days to go back to when the regression happened.
You can also use the URLs containing field to try to narrow down the results.
It's important that you get the run before and the run after the regression within the same search result, because you use the small checkbox to the left of the results to pick runs. It's usually a lot of work to find the right runs, so have patience. When you've picked two runs, click the (small) Compare button.
When you click "compare", you will see a comparison of the waterfall chart (using the HAR) and screenshots and videos for the selected runs.
Some things to look for:
- Are there assets that are being downloaded after the regression, that were not being downloaded before it?
- Are there specific assets that are downloading slowly?
- Has anything visible changed on the page? (For example, we frequently have alerts fire when fundraising campaigns start, and we sometimes see alerts when an edit is made to a page that changes it significantly.)
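The first two checks in the list above can also be done programmatically. Here is a minimal sketch that diffs two HAR structures (already parsed into Python dicts); the `slow_ms` threshold is an arbitrary illustrative value, not something we standardize on:

```python
def har_urls(har):
    """Return the set of request URLs recorded in a parsed HAR dict."""
    return {entry["request"]["url"] for entry in har["log"]["entries"]}


def diff_hars(before, after, slow_ms=500):
    """Compare two parsed HAR dicts from before and after a regression.

    Returns (new_assets, removed_assets, slow_assets):
      - assets only downloaded after the regression,
      - assets no longer downloaded,
      - assets in the 'after' run whose total time exceeds slow_ms.
    """
    new_assets = har_urls(after) - har_urls(before)
    removed_assets = har_urls(before) - har_urls(after)
    slow_assets = [
        (e["request"]["url"], e["time"])  # "time" is total entry time in ms
        for e in after["log"]["entries"]
        if e["time"] > slow_ms
    ]
    return new_assets, removed_assets, slow_assets
```

This only automates the first pass; the visual comparison of screenshots and videos still has to be done by eye.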
To find specific runs, you need to go to the storage where we keep all data for all the runs. The easiest way to do that is to use the Grafana dashboard.
In the drop downs, make sure you pick the wiki, device, browser, latency and page you want to compare.
When the page has refreshed itself, the links to the storage have been updated. Check to the right of the dashboard and you will see a screenshot of the page and two links: Latest run and Older runs. If the latest run includes the regression (the regression is still ongoing) you can click that link and a compare page will open with all the metrics from the latest run.
The next step is to find a run without the regression. You probably saw that already when looking at the graphs, so go back and remember the date and time just before the regression and then use the Older runs link.
There you will see date folders, scroll down to find your date and time and click on that folder. Then you will see a list of the data collected for that run: Screenshots, videos, HAR file (and if you use Chrome a list of trace logs that you can drag and drop into devtools). Choose the HAR file (browsertime.har.gz) and download it to your desktop.
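If you want to inspect the downloaded HAR outside the compare page, it is just gzipped JSON. A small sketch for loading it (the filename matches what Browsertime produces; the path is whatever you saved it as):

```python
import gzip
import json


def load_har(path):
    """Load a gzipped HAR file (e.g. browsertime.har.gz) into a dict."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Usage, assuming the file was downloaded to the current directory:
# har = load_har("browsertime.har.gz")
# print(len(har["log"]["entries"]), "requests in this run")
```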
The next step is to go back to the tab where you opened the compare page. Choose one of the upload buttons and upload your newly downloaded HAR file.
Now you will see both HARs (check the waterfalls), screenshots, videos and summaries of the two runs, which hopefully helps you spot the differences.
Tips and tricks
Check the screenshots (the easiest way is to go to https://grafana.wikimedia.org/dashboard/db/webpagereplay-drilldown). Look out for campaigns and try to correlate them to when they were activated. You can also find screenshots (and videos) at http://webpagereplay-wikimedia.s3-website-us-east-1.amazonaws.com/ for WebPageReplay (you will find direct links on the dashboard) or at http://wpt.wmftest.org/testlog.php?days=1&filter=&all=on&nolimit=on for WebPageTest.
Check if there has been any release of the tool (for WebPageReplay, click Show WebPageReplay changes; for WebPageTest, Show WebPageTest changes). If the performance team updates the tool (a new version of the tool or of the browser), there will be an annotation for that. It has happened that new browser versions have introduced a regression. WARNING: We still auto-update WebPageTest, so we can miss an annotation for a browser upgrade or change in the tool.
Check if there is a release that correlates to the change by choosing Show sync-wikiversions and check the server admin log.
Do you see any changes in the Navigation Timing metrics? It's always good to try to verify the change in both of our ways of collecting metrics.
If the tests are run by Chrome we collect the internal trace log (both on WebPageTest and Browsertime/WebPageReplay) that you can use to dig deeper into what happens. For WebPageTest, you find the log (to download) using the Trace link. For Browsertime/WebPageReplay, the log for each run is in the result directory. Download the files, unpack them and drag and drop them into Developer Tools/Performance in Chrome.
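The unpacking step can be scripted; a small sketch that decompresses every .gz trace log in a result directory so the files are ready to drag into DevTools (the directory layout and filenames depend on the run, so treat the glob pattern as an assumption):

```python
import gzip
import pathlib
import shutil


def unpack_traces(directory):
    """Decompress all .gz files in a directory and return the output paths.

    The decompressed files can then be dragged into Chrome DevTools'
    Performance panel.
    """
    unpacked = []
    for gz in pathlib.Path(directory).glob("*.gz"):
        target = gz.with_suffix("")  # strip the trailing .gz
        with gzip.open(gz, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        unpacked.append(target)
    return unpacked
```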
Real user measurement
The real user measurements are metrics that we collect from real users, using browser APIs. Historically these metrics have been more technical than those collected by synthetic testing, as we can't get visual measures from the user's browser.
Alerts that derive from Real User Measurement data will typically reference Navigation Timing in the alert. For example:
Notification Type: PROBLEM Service: https://grafana.wikimedia.org/dashboard/db/navigation-timing-alerts grafana alert Host: einsteinium Address: 126.96.36.199 State: CRITICAL Date/Time: Fri Aug 31 05:02:38 UTC 2018 Notes URLs: Additional Info: CRITICAL: https://grafana.wikimedia.org/dashboard/db/navigation-timing-alerts is alerting: Load event overall median.
The real user measurement metrics collect data from all browsers that support the Navigation Timing API. We also collect additional metrics like first paint (when something is first displayed on the screen) and the effective connection type, when the browser supports those additional APIs. We sample the data, using 1 out of 1000 requests by default; this can be overridden for specific geographies, pages, etc., where the sampling rate might be different.
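The 1-in-1000 sampling can be pictured as a per-pageview coin flip. This is an illustrative sketch only, not the actual client-side instrumentation code:

```python
import random


def in_sample(rate=1000):
    """Return True for roughly 1 out of every `rate` page views.

    Illustrative only: the real sampling decision is made client-side by
    the NavigationTiming instrumentation, and the rate can be overridden
    per geography, page, etc.
    """
    return random.randint(1, rate) == 1


# Over a million simulated page views, expect roughly 1000 to be sampled.
sampled = sum(in_sample() for _ in range(1_000_000))
```

The practical consequence is that a regression must affect enough users to show up clearly in a 0.1% sample, which is why RUM alerts tend to catch larger regressions than synthetic testing does.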
Where to start
Start with the alert dashboard to verify the alert. Then head over to https://grafana.wikimedia.org/dashboard/db/navigation-timing and check the metric that caused the alert (first paint, responseStart, loadEventEnd) and try to identify how big the issue is (is it causing other metrics to increase? check different percentiles and different metrics to try to understand what has changed).
At this level the metrics for mobile and desktop are collected and grouped together. Go to https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-platform to see the metrics grouped per type; there can be a big difference between the two.
Then check Show sync-wikiversions along with the server admin log to see if any change has been made at the time of the regression.
Tips and tricks
If you cannot find what caused the regression, you can try the Navigation Timing by browser dashboard. Check the report rate: has it changed? It could be that we did a release and accidentally changed how we collect the metrics, or that a new browser version rolled out that affects the metrics. You can see how many metrics we collect for specific browser versions.
Do you see any change in the synthetic metrics? Use both tools to try to nail down the regression. The synthetic tools can more easily show you what has changed (by comparing HARs from before and after the change).
It's possible that further drilling down is required, and you may need to slice the data by features other than platform, browser or geography. For this, it's best to use Hive and query the raw RUM data recorded under the NavigationTiming EventLogging schema. Remember to narrow down your Hive queries to the timespan around the regression, as the NavigationTiming table is huge (we record around 14 records per second on average).
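A query along these lines narrows the scan to the window around the regression. The table and column names below are illustrative assumptions (check the current schema before running; the real table is partitioned, so filtering on the partition columns is what actually limits the scan):

```python
def navtiming_query(start, end, extra_where=""):
    """Build a HiveQL query over the NavigationTiming EventLogging data,
    restricted to a time window around the regression.

    Table and column names are illustrative assumptions; verify them
    against the current schema before running the query.
    """
    where = f"WHERE dt >= '{start}' AND dt < '{end}'"
    if extra_where:
        where += f" AND {extra_where}"
    return (
        "SELECT event.responsestart, event.loadeventend, useragent.browser_family\n"
        "FROM event.navigationtiming\n"
        f"{where}\n"
        "LIMIT 1000;"
    )


# Example: two hours around a regression, restricted to one browser family.
print(navtiming_query(
    "2018-08-31T04:00", "2018-08-31T06:00",
    "useragent.browser_family = 'Chrome'",
))
```

Keeping the time window tight matters: at roughly 14 records per second, even a single day is over a million rows.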