Performance/Synthetic testing
Synthetic performance testing means using a browser on a server or phone somewhere in the world to load a web page and collect performance metrics. Together with real-user measurement (collecting performance metrics from real users), this is one of the two ways we collect web performance metrics for Wikipedia.
The reason we use synthetic performance testing is to find performance regressions in our code. We also use the collected data as a wayback machine: we can go back in time and see what kind of content we were serving.
To get stable metrics that let us find regressions, we need a stable environment for our tests:
- Run the browser and tool on a stable server or on a dedicated mobile phone. At the moment we run our tools on AWS and on Android phones.
- The tests need stable connectivity. The connection must be the same every time so that it does not affect our metrics. We use tc to achieve that.
- We need a stable browser. We keep track of browser updates so that the browser behaves the same all the time.
- The pages we test should have stable performance. Depending on how a page is built, its performance can be more or less stable.
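To illustrate why stability matters: if repeated runs of the same page vary widely, small regressions disappear into the noise. A simple sanity check is the coefficient of variation of repeated measurements. This is a hypothetical sketch, not part of our actual tooling; the 5% threshold is an assumption for illustration:

```python
import statistics

def stable_enough(samples_ms, max_cv=0.05):
    """Return True when the coefficient of variation (stddev / mean)
    of repeated runs of the same page stays below max_cv."""
    mean = statistics.mean(samples_ms)
    return statistics.stdev(samples_ms) / mean <= max_cv

# Tight spread: a 2-3% regression would stand out.
print(stable_enough([1500, 1520, 1490, 1510, 1505]))  # True
# Wide spread: small regressions drown in the noise.
print(stable_enough([1500, 2100, 1300, 1900, 1600]))  # False
```

A stable server, shaped connectivity and a pinned browser version all work toward keeping that spread small.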
Tools
We use two different ways of measuring the performance of Wikipedia:
- We use Browsertime to measure the full performance journey from the browser to the server and back. We use it to collect user journeys (a user visiting multiple pages).
- We use WebPageReplay together with Browsertime to focus on frontend performance. We use that to measure one page's performance with an empty browser cache. WebPageReplay is a traffic replay proxy that we use to get rid of server and internet flakiness.
Browsertime is the engine of sitespeed.io that controls the browser and runs the tests. Browsertime is also used by Mozilla to measure the performance of Firefox. WebPageReplay is used by the Chrome team to keep track of Chrome's performance.
The tests for both tools are configured in https://gerrit.wikimedia.org/g/performance/synthetic-monitoring-tests
We also use Browsertime/WebPageReplay to collect metrics from browsers on Android phones. Those tests are configured in https://gerrit.wikimedia.org/g/performance/mobile-synthetic-monitoring-tests
Browsertime/sitespeed.io/WebPageReplay
We've been using Browsertime/sitespeed.io/WebPageReplay since 2017 and you can read about our setup. We collect metrics and store them on our Graphite instance. You can see all pages/metrics on our page drill down dashboard.
You can choose different types of testing with the device dropdown:
- desktop - tests where we test desktop Wikipedia
- emulatedMobile - tests where we test mobile Wikipedia using a desktop browser, emulating a mobile phone
- android - tests mobile Wikipedia using real mobile phones
The next dropdown, Test type, chooses which tests to see. It can be first view tests (with a cold browser cache), warm cache view tests, WebPageReplay tests or different user journeys.
We also alert on those metrics: on the WebPageReplay tests for desktop, emulated mobile and Android, and on the first view (cold cache) tests on desktop.
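Alerting on these metrics boils down to comparing fresh measurements against a baseline. A minimal hypothetical sketch of that idea (the real thresholds and metric names live in our alerting configuration and may differ):

```python
def regressed(baseline_ms, current_ms, threshold=0.10):
    """Flag a regression when the current value is more than
    `threshold` (10% here, an assumed value) above the baseline."""
    return (current_ms - baseline_ms) / baseline_ms > threshold

print(regressed(1000, 1150))  # 15% slower -> True
print(regressed(1000, 1050))  # 5% slower -> False
```

Because WebPageReplay removes server and network noise, a smaller threshold can be used there than for tests that run directly against Wikipedia.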
CrUX
Google uses performance metrics collected within the Chrome browser from websites people visit (from users who "opt in" by syncing their Google account with Chrome). These are used by Google Search to know how a website performs in reality, and this data is available publicly as part of their Chrome User Experience Report. In order to keep track of how Wikipedia is doing from Google's point of view, we collect this data once a day from the Google API and store it in Graphite. You can explore these metrics on our Chrome User Experience dashboard in Grafana.
The daily crawl runs on the gpsi.webperf.eqiad1.wikimedia.cloud server, where we run a couple of tests and record whether we are slow/moderate/fast. The data is collected using the sitespeed.io CrUX plugin.
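Conceptually, the daily collection queries the public CrUX API and writes datapoints to Graphite using its plaintext protocol. This is a sketch only: the origin, form factor and metric path below are made-up examples, not our real configuration, which is handled by the sitespeed.io CrUX plugin:

```python
import json
import time

# Public CrUX API endpoint (a request also needs an API key as ?key=...).
CRUX_URL = "https://chromeuserexperiencereport.googleapis.com/v1/records:queryRecord"

def crux_request_body(origin, form_factor="PHONE"):
    """Build the JSON body for a CrUX queryRecord request."""
    return json.dumps({"origin": origin, "formFactor": form_factor})

def graphite_line(path, value, timestamp=None):
    """One datapoint in Graphite's plaintext protocol:
    '<metric path> <value> <unix timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}"

print(crux_request_body("https://en.wikipedia.org"))
# Hypothetical metric path and value, purely for illustration.
print(graphite_line("crux.enwiki.phone.lcp.p75", 2400, 1700000000))
```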
When to use what tool/test?
If you test the mobile version of Wikipedia, you should run both our Android tests and our emulated mobile tests. The benefit of running Android tests is that you know for sure the performance on that specific Android device, and we can say things like "the first visual change of the Barack Obama page on English Wikipedia regressed by ten percent on a Moto G 5 phone".
If you want to find small frontend regressions, testing with WebPageReplay should be your tool of choice. However, at the moment we only run single-page (first view, cold cache) tests with WebPageReplay.
If you want to test user journeys, test them directly against Wikipedia's servers using Browsertime. If you are not sure which tests to use, please reach out to the performance team and we will help you!