SLO/Test Kitchen
Status: approved
Organizational
The Experiment Platform team is building experimentation and instrumentation tools collectively known as Test Kitchen. This includes standardized and custom instruments and experiments. Test Kitchen enables controlled experimentation (A/B tests) across all Wikimedia properties. It provides standardized tools and processes to reduce experiment setup time from 10 weeks to 1 week, enable cross-wiki testing, support both logged-in and logged-out user testing, automate data collection and analysis, ensure compliance with privacy policies, and eventually make experiment results publicly accessible.
Service
This service is made up of parts running all over our infrastructure, which will be collectively known as Test Kitchen.
Teams
The Experiment Platform team is responsible for Test Kitchen.
Architectural
Environmental dependencies
Test Kitchen consists of:
- Test Kitchen UI (fka MPIC or xLab): A standalone configuration UI with a configuration API, running on Node.js
- Test Kitchen SDKs:
  - PHP: Running on the MediaWiki application servers
  - JS: Sent to visitors' browsers via ResourceLoader
  - (future) Swift and Kotlin: Running in the native iOS and Android apps
- Varnish vmod logic that manages Edge Unique cookies
- EventGate customizations to handle data sent by Test Kitchen client libraries
- Data pipelines coordinating the flow of instrumentation data from Kafka to HDFS and ultimately Superset dashboards, using Airflow
- A Beta Cluster deployment that simulates the above as much as possible
Service dependencies
Hard Dependencies
- Varnish - without it users won't be enrolled in some experiments so we would consider the system to be down
- EventGate - receives requests to log instrumentation data, without it the experiments are costing us in user experience without gaining us data
- Kafka "jumbo" (see link to specific EventGate cluster we use and its dependency on Kafka "jumbo")
- MediaWiki core and extensions (these are reacting to experiments and enabling MW developers to know what UI to display):
  - ResourceLoader
  - WikimediaEvents
  - MetricsPlatform
    - EventLogging
    - EventStreamConfig
Soft Dependencies
- Data Platform
- Wikimedia Wikis
  - Beta cluster
  - Production
Client-facing
User documentation: Test Kitchen
Clients
- Experiment Managers: configure instrumentation in Test Kitchen UI
- Instrument Developers: program against the API provided by Test Kitchen SDKs
- Product Analysts: work with data collected and transformed in the Data Platform
- Varnish Experiment Configuration poller: regularly pulls configuration from Test Kitchen API to allow Varnish to act as an Experiment Enrollment Authority
Request Classes
- 1. Experiment Configuration: Creating / Reading / Updating / Deleting experiment configurations
- 2. Data Collection:
  - Sending instrumentation events through EventGate
  - Processing user interaction data
Service Level Indicators (SLIs)
Request Class 1: Experiment Configuration
Combined latency-availability SLI: The percentage of all application requests that complete within 1 second (1000 milliseconds) and receive a non-error response, defined as an HTTP status code that is not 5XX. We would normally also consider HTTP 4XX responses problematic, but our service is exposed to the public internet and we expect to get some unknown amount of 4XX responses from traffic we don't control. Nevertheless, if a client has 4XX problems, we commit to finding a way to monitor that in the future and will address it if we find it.
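As an illustration of the definition above, the following minimal sketch classifies individual requests as good or bad and computes the combined SLI as a percentage. The `Request` record, the function names, and the sample data are hypothetical and exist only for this example; the actual SLI is derived from the monitoring described later in this document, not from code like this.

```python
from dataclasses import dataclass
from typing import Iterable

LATENCY_THRESHOLD_MS = 1000  # "within 1 second" from the SLI definition


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one request; not an actual Test Kitchen type."""
    status: int        # HTTP status code of the response
    latency_ms: float  # time taken to complete the request


def is_good(req: Request) -> bool:
    """Good = not a 5XX response and completed within the threshold.

    4XX responses count as good here, matching the SLI definition.
    """
    is_server_error = 500 <= req.status <= 599
    return not is_server_error and req.latency_ms <= LATENCY_THRESHOLD_MS


def combined_sli(requests: Iterable[Request]) -> float:
    """Percentage of requests that are good."""
    reqs = list(requests)
    if not reqs:
        raise ValueError("SLI is undefined for an empty set of requests")
    good = sum(1 for r in reqs if is_good(r))
    return 100.0 * good / len(reqs)


# Made-up sample data for illustration only.
sample = [
    Request(200, 120),   # good
    Request(404, 80),    # good: 4XX is not counted as an error
    Request(503, 50),    # bad: 5XX
    Request(200, 1500),  # bad: slower than 1 second
]
print(combined_sli(sample))  # prints 50.0 (2 of 4 requests are good)
```

A request that takes exactly 1000 milliseconds is treated as good in this sketch, which is one reading of "within 1 second".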
Request Class 2: Data Collection
Availability SLI: Let R be all requests to log experiment data via EventGate. R breaks down as:
R = S + Es + Ei + L
where:
- S: requests that successfully produce to the intended Kafka topic
- Es: requests that produce to the Error topic with a system-related error (invalid header, etc.)
- Ei: requests that produce to the Error topic because of invalid instrumentation data (schema validation problems). Note: noise generated accidentally or purposefully would also fall in this category. We are working to minimize noise, but do not commit to that as part of SLOs defined around this SLI.
- L: requests that are lost on the network from the Test Kitchen SDKs to EventGate
We then have two SLIs: S / (S + Es) and S / (S + Ei).
We hope to track L as well in the future, but this is currently not feasible.
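To make the arithmetic concrete, here is a minimal sketch that computes the two SLIs from counts of S, Es, and Ei. The function names and the counts are made up for illustration; in practice the counts would come from the EventGate metrics, and L is left out because it is not currently measured.

```python
from typing import Optional, Tuple


def ratio(good: int, bad: int) -> Optional[float]:
    """Return good / (good + bad), or None when there were no requests."""
    total = good + bad
    return good / total if total else None


def data_collection_slis(s: int, es: int, ei: int) -> Tuple[Optional[float], Optional[float]]:
    """Compute the two availability SLIs from request counts.

    s  -- requests successfully produced to the intended Kafka topic (S)
    es -- requests sent to the Error topic with a system-related error (Es)
    ei -- requests sent to the Error topic for invalid instrumentation data (Ei)
    """
    system_sli = ratio(s, es)           # S / (S + Es)
    instrumentation_sli = ratio(s, ei)  # S / (S + Ei)
    return system_sli, instrumentation_sli


# Hypothetical counts for illustration only.
system_sli, instrumentation_sli = data_collection_slis(s=9_990, es=10, ei=510)
print(f"S / (S + Es) = {system_sli:.3%}")
print(f"S / (S + Ei) = {instrumentation_sli:.3%}")
```

Each SLI uses its own denominator, so a burst of schema validation errors (Ei) lowers only the second ratio and leaves the system-error ratio unchanged.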
Operational
Monitoring
Request Class 1
- Monitored by the experiment config fetchers:
  - MediaWiki: mediawiki_wanobjectcache_regen_seconds_bucket{keygroup="MetricsPlatform"} Prometheus metric
  - Varnish: wmfuniq_experiment_fetcher_http_duration_seconds_last Prometheus metric
- Monitored by Test Kitchen UI: The "Latency: Total avg" panel on the MPIC Service Grafana dashboard
Request Class 2
- S via the "Event produce rate by stream" panel on the EventGate Grafana dashboard
- Es via the Prometheus metric created in https://phabricator.wikimedia.org/T398922
- Ei via the "Event schema validation error rate by stream" panel on the EventGate Grafana dashboard
- As mentioned above, we do not currently monitor L
Request Class 3
N/A
Troubleshooting
See Test Kitchen/Troubleshooting.
Deployment
The EventGate and Test Kitchen UI services are deployed via Kubernetes. Details of the Test Kitchen UI deployment and deployment instructions can be found at Test Kitchen/Test Kitchen UI/Administration.
The MetricsPlatform MediaWiki extension is deployed via the MediaWiki deployment pipeline, which is maintained by Release Engineering.
Service Level Objectives
Request Class 1: Experiment Configuration
Over a 90-day rolling window, 95% of application requests to Test Kitchen have an HTTP status code that is not 5XX and a latency of 1 second or less.
Request Class 2: Data Collection
Over a 90-day rolling window:
S / (S + Es) >= 99.9%
S / (S + Ei) >= 95%
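One way to read these targets is as error budgets: the number of bad requests each objective tolerates for a given request volume over the window. The sketch below works that out with exact fractions. The dictionary keys, function names, and request volumes are hypothetical and are not taken from the actual recording rules.

```python
from fractions import Fraction

# Targets from the SLOs above, as exact fractions to avoid float rounding.
# The keys are illustrative labels, not names used in the real configuration.
SLO_TARGETS = {
    "experiment_configuration": Fraction(95, 100),         # good requests / all requests
    "data_collection_system": Fraction(999, 1000),         # S / (S + Es)
    "data_collection_instrumentation": Fraction(95, 100),  # S / (S + Ei)
}


def error_budget(total_requests: int, target: Fraction) -> Fraction:
    """Number of bad requests the target tolerates for this many requests."""
    return total_requests * (1 - target)


def budget_consumed(total_requests: int, bad_requests: int, target: Fraction) -> Fraction:
    """Fraction of the error budget used; a value above 1 means the SLO is missed."""
    budget = error_budget(total_requests, target)
    if budget == 0:
        raise ValueError("no error budget: target is 100% or there were no requests")
    return Fraction(bad_requests) / budget


# Hypothetical volume: 1,000,000 requests counted in S + Es over the window.
target = SLO_TARGETS["data_collection_system"]
print(error_budget(1_000_000, target))          # tolerated Es count
print(budget_consumed(1_000_000, 250, target))  # share of the budget used by 250 Es
```

For the data collection objectives, `total_requests` is the relevant denominator (S + Es or S + Ei), not all requests R.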
Rules and tracking
For original implementation see data_platform.pp and recording_rules.yaml (implementation may be adjusted over time).
See https://slo.wikimedia.org/?search=xlab for SLO tracking.
Note that in configuration the rules for burn rate alerting (to be toggled on mid-November 2025) are expressed using a 4-week window, in keeping with current convention.