Performance/NavigationTimingExtension/The future of the extension 2021
Early work in progress: I'm converting my document to a Wikitech page ...
The https://www.mediawiki.org/wiki/Extension:NavigationTiming extension is Wikimedia's own real user monitoring (RUM) solution for measuring performance as experienced by Wikipedia users. For a couple of years the Wikimedia Performance Team has had a task to look into whether we should replace it with something else. The problems we want to solve are the following:
- Do we collect relevant metrics?
- How can we minimise the time the performance team spends on adding a new metric?
- How can we enable developers to use the User Timing and Element Timing APIs to get more valuable metrics? The goal would be to make it as easy as possible: when a developer adds a new User Timing mark, our RUM solution automatically picks it up and it is drawn in one of the performance graphs.
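The developer side of that goal already exists in the standard User Timing API; a small sketch of what automatic pickup could build on (the mark and measure names here are made up for illustration):

```javascript
// A developer marks interesting moments with the standard User Timing API.
// Nothing here is Wikimedia-specific; performance.mark()/measure() are
// supported in all modern browsers.
performance.mark( 'mediaSearchStart' );
// ... the feature does its work ...
performance.mark( 'mediaSearchEnd' );
performance.measure( 'mediaSearch', 'mediaSearchStart', 'mediaSearchEnd' );

// A generic RUM collector could then pick up every measure automatically,
// without the performance team writing code per metric:
const measures = performance.getEntriesByType( 'measure' )
    .map( ( entry ) => ( { name: entry.name, duration: entry.duration } ) );
```

In a real collector a PerformanceObserver on the `measure` entry type would be used instead of polling `getEntriesByType`, so late measures are not missed.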
The Navigation Timing extension has grown over time. It started out collecting metrics from the Navigation Timing API to get insights into the performance experienced by Wikipedia users.
In August 2021 the extension collected the following metrics:
- Navigation timing metrics - metrics from the Navigation Timing API that all modern browsers support. We also include the gaps between the metrics, since a couple of years ago we suspected that not all browsers were reporting the metrics correctly. We also get the transfer size of the main document.
- Server Timing metrics - using the Server Timing API to get the host and cache type for the main document
- Feature policy violations - using the feature-policy-violation observer to get violations. I don't think we collect those in the backend, though.
- Layout shift - using the Layout Shift API in Chromium-based browsers. The current version collects all layout shifts, not the cumulative (session-windowed) value that Google recommends as one of their Web Vitals.
- Save timings - our own metric measuring an edit submission.
- Central notice timings - a User Timing mark telling us that we showed a central notice banner at that moment. Collecting it helps us know whether central notice is affecting our metrics.
- Top image resource timing - all Resource Timing API metrics for what we think (by doing some magic) is the article image.
- Our own CPU benchmark with battery level - to get a feeling for how our users' hardware changes over time. We also get the battery level if the user is on an Android phone, since a low battery slows down the phone.
- First Input Delay - we collect all information about the first input from the browser's first-input performance entries.
- Element timing custom metric - we collect all metrics from the Element Timing API.
- RUM SpeedIndex - we use the https://github.com/WPO-Foundation/RUM-SpeedIndex code to try to get the same Speed Index as we get in synthetic tools.
- Paint timings - we collect first paint and first contentful paint through the Paint Timing API. For browsers that only support older, non-standardised ways of getting first paint, we use those.
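As a hedged sketch (not the extension's actual code) of what that paint-timing fallback looks like, assuming the standard Paint Timing API plus IE's legacy msFirstPaint timestamp:

```javascript
// Return first paint in ms relative to navigation start, trying the
// standard Paint Timing API first and falling back to a legacy source.
// This is an illustrative sketch, not the extension's real code.
function getFirstPaint() {
    // Standard Paint Timing API (Chromium, Firefox, Safari, Edge).
    const entries = performance.getEntriesByType ?
        performance.getEntriesByType( 'paint' ) : [];
    const fp = entries.find( ( e ) => e.name === 'first-paint' );
    if ( fp ) {
        return fp.startTime;
    }
    // Legacy IE/old Edge: msFirstPaint is an epoch timestamp in ms,
    // so convert it to be relative to navigationStart.
    const timing = performance.timing;
    if ( timing && timing.msFirstPaint > timing.navigationStart ) {
        return timing.msFirstPaint - timing.navigationStart;
    }
    // No supported source for the metric in this browser.
    return undefined;
}
```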
The extension also shows the performance survey.
We collect many metrics, but we also miss out on some of the latest ones in the web performance community: largest contentful paint, cumulative layout shift and (CPU) long tasks.
There are three ways to move forward.
Moving to an open source alternative
In the past we've talked about replacing our own Navigation Timing extension with an open source alternative. Using an open source alternative (e.g. Boomerang) could help us in that we would not need to implement every new metric we want to collect ourselves. There are also many browser quirks that we have run into through the years, so using something that others also use could help us avoid those.
Adopting another tool needs a lot of work, though: we need to adapt the tool so we can receive the metrics in the backend, we need to review the tool (and every upgrade), we need to configure it so it collects what we need, and we potentially need to add missing metrics that we collect today.
Crafting a new more generic tool
The other alternative is to create a new extension where we try to be more generic in how we collect metrics, and to make a generic version that other web properties can use.
Cleanup the current version
There's also a third alternative: clean up the current version of the Navigation Timing extension (stop collecting metrics that we do not use) and add the metrics that we are missing. That brings the extension up to date, and we can then postpone the decision about moving to another tool or creating a new one.
Cleaning up the current version first makes the most sense. By removing the metrics we don't use and adding the metrics we are missing, the extension is up to date with what we need for now. As a next step we can fine-tune it and make it more generic so developers can add their own metrics. This can be done as one quarterly goal. We can then postpone evaluating Boomerang or creating a more generic tool, and focus on that when we think it's important for the team.
The idea is to collect only what we actually need to graph each metric, and avoid collecting the extra "nice to have" data that we do today.
Remove unused metrics
We should remove the metrics that we don't use. That will decrease the amount of code in the extension and the amount of data we collect.
- Remove RUM Speed Index
- Remove Top image resource timing
- Remove battery level (once we have published some kind of study)
Tune some of the metrics we collect to collect only the bare minimum
- Change how we collect layout shifts to follow how Google calculates the Web Vitals cumulative layout shift
- Look into whether we can move the central notice timings into the User Timing collection (with a "stop" list of names)
- Rewrite element timings to collect only what we need (name and time) and use a stop list
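For the layout shift change, Google's Web Vitals definition groups shifts into session windows (a new window starts when more than 1 second has passed since the previous shift, or the window would grow past 5 seconds) and reports the largest window. A sketch of that aggregation, assuming we already have the layout-shift entries from a PerformanceObserver:

```javascript
// Compute cumulative layout shift the Web Vitals way from an array of
// layout-shift entries: { startTime (ms), value, hadRecentInput }.
// Shifts within 1 s of the previous shift and within 5 s of the first
// shift in the window share a session window; CLS is the largest
// window sum. Shifts right after user input are ignored.
function cumulativeLayoutShift( entries ) {
    let maxWindow = 0;
    let windowSum = 0;
    let firstTime = 0;
    let prevTime = 0;
    for ( const entry of entries ) {
        if ( entry.hadRecentInput ) {
            continue;
        }
        if (
            windowSum > 0 &&
            entry.startTime - prevTime < 1000 &&
            entry.startTime - firstTime < 5000
        ) {
            windowSum += entry.value;
        } else {
            // Start a new session window.
            windowSum = entry.value;
            firstTime = entry.startTime;
        }
        prevTime = entry.startTime;
        maxWindow = Math.max( maxWindow, windowSum );
    }
    return maxWindow;
}
```

Collecting one number per page view like this, instead of beaconing every shift, is also much less data.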
Add missing metrics
- Collect Largest Contentful Paint
- Collect Long Tasks (total, total length and number before first paint)
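A sketch of the long task aggregation we'd want to beacon, assuming the entries come from a PerformanceObserver on the longtask entry type (the summary field names are illustrative):

```javascript
// Aggregate long-task entries into the numbers to beacon: total count,
// total blocked time, and how many fired before first paint.
// In the browser the entries would come from something like:
//   new PerformanceObserver( cb ).observe( { type: 'longtask', buffered: true } );
function summariseLongTasks( entries, firstPaintTime ) {
    const summary = { total: 0, totalDuration: 0, beforeFirstPaint: 0 };
    for ( const entry of entries ) {
        summary.total++;
        summary.totalDuration += entry.duration;
        if ( entry.startTime < firstPaintTime ) {
            summary.beforeFirstPaint++;
        }
    }
    return summary;
}
```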
New schema: perfbeacon
Add a new schema, "perfbeacon", where we collect all metrics that happen after loadEventEnd.
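A hedged sketch of what such a collector could look like; the function names, payload shape and endpoint are assumptions for illustration, not the real schema:

```javascript
// Queue metrics that arrive after loadEventEnd and flush them in one
// beacon when the page is hidden. Illustrative sketch only.
const perfbeacon = { metrics: {} };

function queueMetric( name, value ) {
    perfbeacon.metrics[ name ] = value;
}

function buildBeaconBody() {
    return JSON.stringify( perfbeacon );
}

function sendPerfBeacon( url ) {
    const body = buildBeaconBody();
    // navigator.sendBeacon survives page unload; fall back to fetch()
    // with keepalive for browsers without it.
    if ( typeof navigator !== 'undefined' && navigator.sendBeacon ) {
        return navigator.sendBeacon( url, body );
    }
    if ( typeof fetch !== 'undefined' ) {
        fetch( url, { method: 'POST', body: body, keepalive: true } );
    }
    return true;
}

// In the browser, flush once per page view when it becomes hidden:
// document.addEventListener( 'visibilitychange', function () {
//     if ( document.visibilityState === 'hidden' ) {
//         sendPerfBeacon( '/beacon/perf' ); // hypothetical endpoint
//     }
// } );
```

Flushing on visibilitychange rather than unload is important, since unload handlers often never fire on mobile.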
Open questions
- What should we do with the performance survey?
- Should we really collect feature policy violations?
- Should we remove the gaps between the Navigation Timing metrics? Yes.