Test Kitchen/Automated analysis of experiments/Prepared metrics
In some cases you may not be able to write a query template that produces the necessary columns (
variation
,
subject_id
, and
outcome
), but you are able to produce the
quantities
required for analysis: a
sample_size
,
sample_mean
, and
sample_variance
for each
variation
(group in experiment: "control" and "treatment").
The
prepared_mean
and
prepared_proportion
metric types tell the analysis system to not perform the calculations it does for the
mean
and
proportion
metric types, because those calculations are handled by the
query_template
.
Example: Fundraising
Scenario : Suppose we have instrumented clicks on the Donate link/button and we want to know if our treatment causes more donations to be made. Not just clicks on the link/button (which indicate intent to donate), but actual donations (end of funnel). In that case, we could instrument donations in such a way where:
-
For privacy reasons, we can only pass experiment name and group membership to where the user starts the donation process.
- This is no different than passing a campaign parameter when running A/B tests of fundraising banners.
-
When a donation has successfully been made through the payment processor and the donor is taken to the Thank You page, produce an event to the external analytics endpoint.
- Preferably server-side, but there's an argument to be made for client-side for consistency with click instrumentation.
The event (adhering to version 1.4.2 of base web schema ) could have data like:
{
"$schema": "/analytics/product_metrics/web/base/1.4.2",
"meta": {
"stream": "product_metrics_web_base"
},
"action": "donate",
"action_subtype": "success",
"instrument_name": "Donation processor",
"experiment": {
"coordinator": "xLab",
"enrolled": "{experiment_machine_name}",
"assigned": "{group_machine_name}
}
}
Notice that this event does not have
experiment.subject_id
.
Example prepared_proportion query
Suppose that with our experiment we are interested in impact of treatment on the proportion of users who click on the donate link/button (demonstrate intent ) and then go on to actually donate (demonstrate committment ).
In this case, the samples are actualy subsets of the overall samples, because only a small proportion of the overall subjects in the experiment click on the donate link/button.
We can also assume that any given subject in the experiment is highly unlikely to donate more than once. They might click on the link/button more than once during the experiment, but they will not donate more than once.
Because
{table}
is specified at experiment level, we need to make sure that
click
events from client-side instrumentation (on the donate link/button) are flowing into the same stream as
donate
events from server-side instrumentation (on the donation process).
WITH
clicked_counts AS (
SELECT
IF(experiment.assigned = 'control', 'control', 'treatment') AS variation,
COUNT(DISTINCT subject_id) AS clicked_count
FROM {table}
WHERE action = 'click'
AND instrument_name = 'ClickThroughRateInstrument'
AND element_friendly_name = 'Donate link'
AND {where_boilerplate}
GROUP BY 1
),
donated_counts AS (
SELECT
IF(experiment.assigned = 'control', 'control', 'treatment') AS variation,
COUNT(1) AS donated_count
FROM {table}
WHERE action = 'donate'
AND action_subtype = 'success'
AND instrument_name = 'Donation processor'
AND {where_boilerplate}
GROUP BY 1
),
stats AS (
SELECT
cc.variation,
cc.clicked_count AS n,
1.0 * dc.donated_count / cc.clicked_count AS p
FROM clicked_counts cc
LEFT JOIN donated_counts dc ON cc.variation = dc.variation
)
SELECT
variation,
n AS sample_size,
p AS sample_mean,
p * (1-p) AS sample_variance
FROM stats
Example prepared_mean query
Suppose we had donation events like
{
"$schema": "/analytics/product_metrics/web/base/1.4.1",
"meta": {
"stream": "product_metrics_web_base"
},
"action": "donate",
"action_subtype": "amount",
"action_context": "123.45",
"instrument_name": "Donation processor",
"experiment": {
"coordinator": "xLab",
"enrolled": "{experiment_machine_name}",
"assigned": "{group_machine_name}
}
}
and included the donation amount (in USD) in
action_context
, and our experiment was to see if the treatment we are testing causes donors to donate more money on average. Again, for privacy reasons we would not have
subject_id
available in this data, so we would have to
prepare
the statistical quantities inside the query. Again, we also have to assume that each event is coming from a unique subject in the sample – which is fair to assume, since donors are unlikely to donate twice in a short period of time (e.g. duration of an experiment).
WITH donated_amounts AS (
SELECT
IF(experiment.assigned = 'control', 'control', 'treatment') AS variation,
CAST(action_context AS DOUBLE) AS donated_amount
FROM {table}
WHERE action = 'donate'
AND action_subtype = 'amount'
AND instrument_name = 'Donation processor'
AND {where_boilerplate}
)
SELECT
variation,
COUNT(1) AS sample_size,
AVG(donated_amount) AS sample_mean,
VAR_SAMP(donated_amount) AS sample_variance
FROM donated_amounts
GROUP BY 1