Test Kitchen/Stream configuration
This page describes the process for configuring an event stream for Test Kitchen events.
Quickstart
This section describes how to configure an event stream with minimal custom options.
1. Submit a configuration patch
To configure a stream, edit two files in the mediawiki-config repository (example patch):
- ext-EventLogging.php: register the stream by adding it to wgEventLoggingStreamNames
- ext-EventStreamConfig.php: configure the stream's schema, contextual attributes, and other options in wgEventStreams
For example, here are the changes to configure a stream using:
- mediawiki.interwiki_link_hover as the stream name
- the Test Kitchen base schema for web
- two contextual attributes: page_id and page_title
'wgEventLoggingStreamNames' => [
    'default' => [
        // ...
        'mediawiki.interwiki_link_hover',
    ],
],
<?php
// …
'wgEventStreams' => [
    'default' => [
        // …
        'mediawiki.interwiki_link_hover' => [
            'schema_title' => 'analytics/product_metrics/web/base',
            'destination_event_service' => 'eventgate-analytics-external',
            'producers' => [
                'eventgate' => [
                    'enrich_fields_from_http_headers' => [
                        // Don't collect the user agent
                        'http.request_headers.user-agent' => false,
                    ],
                    // 'use_edge_uniques' => true, // optional, only needed for everyone experiments
                ],
                'metrics_platform_client' => [
                    // Contextual attributes to add to the event before it is submitted to this stream.
                    'provide_values' => [
                        'page_id',
                        'page_title',
                    ],
                ],
            ],
        ],
    ],
],
// …
In a local development environment, these variables are normally declared in LocalSettings.php.
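For local development, the two declarations above might be written in LocalSettings.php roughly as follows. This is a minimal sketch under stated assumptions: that the EventLogging and EventStreamConfig extensions are already loaded, and that locally the variables are plain PHP globals keyed directly by stream name, without the per-wiki 'default' wrapper used in mediawiki-config. The destination_event_service value is copied from the production example and may need to differ in your environment.

```php
<?php
// LocalSettings.php (local development only; a sketch, not a verified setup)

// Register the stream so that EventLogging fetches its configuration.
$wgEventLoggingStreamNames = [
    'mediawiki.interwiki_link_hover',
];

// Assumption: locally, streams are keyed directly by stream name,
// with no per-wiki 'default' level.
$wgEventStreams = [
    'mediawiki.interwiki_link_hover' => [
        'schema_title' => 'analytics/product_metrics/web/base',
        // Copied from the production example; may differ locally.
        'destination_event_service' => 'eventgate-analytics-external',
        'producers' => [
            'metrics_platform_client' => [
                'provide_values' => [
                    'page_id',
                    'page_title',
                ],
            ],
        ],
    ],
];
```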
2. Deploy the patch
Once your patch has been reviewed, you'll need to schedule a Backport window deployment to sync out your config change to the production cluster.
Choosing a stream name
Name your stream following the pattern mediawiki.product_metrics.<product>_<component>_<interaction>. For example: mediawiki.product_metrics.homepage_module_interaction. Note that the stream name will become the name of the Hive table where the events will be stored, with dots and dashes replaced with underscores.
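The renaming rule above can be sketched in a few lines of PHP. The helper name streamNameToHiveTable is hypothetical and not part of Test Kitchen; it only applies the "dots and dashes become underscores" rule stated here, and any further normalization the real pipeline may perform is not modelled.

```php
<?php
// Hypothetical helper (not part of Test Kitchen): derive the Hive table name
// from a stream name by replacing dots and dashes with underscores.
function streamNameToHiveTable( string $streamName ): string {
    return str_replace( [ '.', '-' ], '_', $streamName );
}

echo streamNameToHiveTable( 'mediawiki.product_metrics.homepage_module_interaction' ), "\n";
// prints "mediawiki_product_metrics_homepage_module_interaction"
```

Checking the resulting table name before deploying can help avoid a name that is awkward to query later.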
Stream configuration
Metrics/Test Kitchen streams are declared in wgEventStreams in ext-EventStreamConfig.php. They include:
- a schema_title containing the name of the schema that will be used to validate events. For example, to use the Test Kitchen base schema for web, use analytics/product_metrics/web/base.
- a producers property containing:
  - a metrics_platform_client element. (Note that additional producers may be included, so long as their output conforms to the Test Kitchen schema.)
  - an optional eventgate element for opting out of header-based User-Agent collection and for enabling data collection from experiments that use edge uniques for enrollment.
- a sample property configuring analytics sampling for the stream.
The metrics_platform_client element may include the following optional properties:
- provide_values: the contextual attributes that should be added into the event before it is submitted to this stream
- curation: a list of curation rules
General documentation for stream configuration is at Event Platform/Stream Configuration and Wikimedia Product/Analytics Infrastructure/Stream configuration.
Example: Using custom sampling rates
This example shows the declaration of a default Test Kitchen stream, my.stream, as it might appear in ext-EventStreamConfig.php. It illustrates the schema_title and metrics_platform_client declaration elements discussed above.
In addition, this example shows how you can set the default analytics sampling rate to 0 and the default analytics sampling unit to pageview, and then give foowiki its own distinct sampling rate of 0.2. Sampling configuration for Test Kitchen streams is no different than for other Event Platform streams. Test Kitchen code takes care of sampling in accordance with the relevant stream configurations. Additional details about sampling units are available at Test Kitchen/Analytics sampling. To learn more about sampling configuration, see Wikimedia Product/Analytics Infrastructure/Stream configuration.
Additional information about configuration formats is available in Configuration files.
<?php
// …
'wgEventStreams' => [
    // Define the stream for all wikis in production and on the Beta Cluster
    // including the Meta-Wiki, which means that you can observe events flowing
    // on it using the eventstreams-ui tool.
    'default' => [
        'my.stream' => [
            // The Test Kitchen web base schema.
            'schema_title' => 'analytics/product_metrics/web/base',
            'destination_event_service' => 'eventgate-analytics-external',
            'producers' => [
                'metrics_platform_client' => [
                    // The contextual values that should be mixed into the event
                    // before it is submitted to this stream.
                    'provide_values' => [
                        'mediawiki_database',
                        'mediawiki_is_production',
                    ],
                ],
            ],
            // Do not submit events to this stream by default. Sampling
            // rates are set below, as needed, for each wiki.
            'sample' => [
                'unit' => 'pageview',
                'rate' => 0,
            ],
        ],
    ],
    // …
    // Use a sampling rate of 0.2 for my.stream on foowiki. (Instead of a wiki,
    // this could also be a dblist, e.g. group0, group1, etc.)
    '+foowiki' => [
        'my.stream' => [
            'sample' => [
                'rate' => 0.2,
            ],
        ],
    ],
],
// …
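To see what foowiki ends up with, the following sketch merges the '+foowiki' override into the 'default' settings from the example above. It uses PHP's array_replace_recursive() as a stand-in for the real mediawiki-config merge logic; that substitution is an assumption made for illustration, and the variable names are hypothetical.

```php
<?php
// Sketch only: array_replace_recursive() stands in for the real
// mediawiki-config merge logic, which is an assumption here.
$default = [
    'my.stream' => [
        'sample' => [ 'unit' => 'pageview', 'rate' => 0 ],
    ],
];
$fooWikiOverride = [
    'my.stream' => [
        'sample' => [ 'rate' => 0.2 ],
    ],
];

// Keys present in the override win; everything else is inherited.
$effective = array_replace_recursive( $default, $fooWikiOverride );

// On foowiki the unit is inherited and only the rate changes.
var_export( $effective['my.stream']['sample'] );
```

Under this assumption, foowiki keeps the pageview sampling unit from 'default' and only its rate changes to 0.2, while every other wiki keeps a rate of 0.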
Example: Using a sampling rate of 1 on the beta cluster
This example shows how you could set a sampling rate of 1 for foowiki on the Beta Cluster, using a +foowiki element in InitialiseSettings-labs.php (which would override the +foowiki element declared above):
<?php
// …
'wgEventStreams' => [
    // …
    // Submit all events to this stream on foowiki on the Beta Cluster.
    '+foowiki' => [
        'my.stream' => [
            'sample' => [
                'rate' => 1,
            ],
        ],
    ],
    // …
],
// …
Curation rules
Test Kitchen supports the specification of curation rules, which provide conditional filtering of events. Curation rules are specified using the (optional) curation property of the metrics_platform_client producer, for a particular stream. Each curation rule specifies a simple condition that must be met by an event for that event to be submitted to the stream. An event will only be submitted to a stream if all curation rules evaluate to true for that event.
Each curation rule is associated with a contextual attribute, and has two parts: an operator and an operand. When the value of the contextual attribute (for a particular event) is combined with the operator and operand of a curation rule, it forms a simple Boolean expression to be evaluated by Test Kitchen code. For example, the curation element shown below associates one rule with the contextual attribute page_namespace_name. The operator of the rule is equals, and its operand is 'Talk'. When this rule is evaluated for a particular event, Test Kitchen code first obtains the value of the contextual attribute. If that value is in fact 'Talk', the rule evaluates to true; otherwise it evaluates to false.
As another example, the curation element below also associates two rules with the contextual attribute page_id. The first rule employs operator less_than and operand 500. The second rule employs operator not_equals and operand 42. Considering these two rules, an event will only be submitted if its page_id is less than 500, but also not 42.
Example: Using curation rules
For this example, we have copied the custom sampling rates example above, omitted some comments and details, and added a curation element.
<?php
// …
'wgEventStreams' => [
    'default' => [
        'my.stream' => [
            'schema_title' => 'analytics/product_metrics/web/base',
            'destination_event_service' => 'eventgate-analytics-external',
            'producers' => [
                'metrics_platform_client' => [
                    'provide_values' => [ … ],
                    'curation' => [
                        'page_namespace_name' => [
                            'equals' => 'Talk'
                        ],
                        'performer_is_logged_in' => [
                            'equals' => true
                        ],
                        'page_id' => [
                            'less_than' => 500,
                            'not_equals' => 42
                        ],
                        'performer_edit_count_bucket' => [
                            'in' => [ '100-999 edits', '1000+ edits' ]
                        ],
                        'performer_groups' => [
                            'contains_all' => [ 'user', 'autoconfirmed' ],
                            'does_not_contain' => 'sysop'
                        ],
                    ],
                ],
            ],
        ],
    ],
    // …
],
// …
The operator of a rule can be any one of these: equals, not_equals, less_than, greater_than, greater_than_or_equals, less_than_or_equals, in, not_in, contains, does_not_contain, contains_all, contains_any.
Operands can be primitive values (strings, numbers, Boolean values, or null), or, in some cases, an array of primitive values. For each rule, the appropriate operand type(s) depends primarily on the operator, but sometimes also on the contextual attribute the rule is associated with. For example, if operator equals is used with contextual attribute page_id, it only makes sense for the operand to be a number, but if equals is used with page_namespace_name, it only makes sense for the operand to be a string. Arrays of primitive values are appropriate for use with in, not_in, contains_all, and contains_any.
See metrics_platform_client.schema.json#61 for the formal declaration of the available operators, and the (syntactically) allowed operand types for each operator. (Note that the word operator is not used in the schema file; property is used instead.)
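To make the evaluation model concrete, here is a minimal sketch of how curation rules could be evaluated against an event's contextual attributes. It is not the Test Kitchen implementation: the function names ruleMatches and eventPassesCuration are hypothetical, the semantics of each operator are inferred from its name, and the handling of missing attributes and unknown operators is an assumption.

```php
<?php
// Hypothetical evaluator, written for illustration; it is not the Test Kitchen
// implementation. Operator semantics are assumed from the operator names.
function ruleMatches( string $operator, $operand, $value ): bool {
    switch ( $operator ) {
        case 'equals':
            return $value === $operand;
        case 'not_equals':
            return $value !== $operand;
        case 'less_than':
            return $value < $operand;
        case 'greater_than':
            return $value > $operand;
        case 'less_than_or_equals':
            return $value <= $operand;
        case 'greater_than_or_equals':
            return $value >= $operand;
        case 'in':
            return in_array( $value, $operand, true );
        case 'not_in':
            return !in_array( $value, $operand, true );
        case 'contains':
            return in_array( $operand, $value, true );
        case 'does_not_contain':
            return !in_array( $operand, $value, true );
        case 'contains_all':
            return array_diff( $operand, $value ) === [];
        case 'contains_any':
            return array_intersect( $operand, $value ) !== [];
        default:
            // Assumption: unknown operators reject the event in this sketch.
            return false;
    }
}

// An event passes only if every rule on every attribute evaluates to true.
function eventPassesCuration( array $curation, array $attributes ): bool {
    foreach ( $curation as $attribute => $rules ) {
        if ( !array_key_exists( $attribute, $attributes ) ) {
            return false; // Assumption: a missing attribute fails its rules.
        }
        foreach ( $rules as $operator => $operand ) {
            if ( !ruleMatches( $operator, $operand, $attributes[$attribute] ) ) {
                return false;
            }
        }
    }
    return true;
}

$curation = [
    'page_namespace_name' => [ 'equals' => 'Talk' ],
    'page_id' => [ 'less_than' => 500, 'not_equals' => 42 ],
    'performer_groups' => [
        'contains_all' => [ 'user', 'autoconfirmed' ],
        'does_not_contain' => 'sysop',
    ],
];

$event = [
    'page_namespace_name' => 'Talk',
    'page_id' => 7,
    'performer_groups' => [ 'user', 'autoconfirmed' ],
];

var_dump( eventPassesCuration( $curation, $event ) ); // bool(true)
var_dump( eventPassesCuration( $curation, [ 'page_id' => 42 ] + $event ) ); // bool(false)
```

The second call fails because page_id 42 satisfies less_than 500 but not not_equals 42, which matches the "all rules must evaluate to true" behaviour described above.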
Stream registration
Next, list your stream in wgEventLoggingStreamNames in ext-EventLogging.php so that Test Kitchen (by way of the EventLogging extension) will get the config for your stream and be able to produce these events.
'wgEventLoggingStreamNames' => [
    'default' => [
        // ...
        'mediawiki.interwiki_link_hover',
    ],
],
If you've made these changes in InitialiseSettings-labs.php, you can find a reviewer to just merge your change and the config will be automatically synced to the beta cluster. If your instrumentation code changes are also merged, you'll then be sending these events in the beta environment.
If you've made these changes in ext-EventLogging.php, you'll need to schedule a Backport window deployment to sync out your config change to the production cluster. See Deployments and Backport windows for instructions.
Overriding event stream config settings
To override config settings for event streams, see Event Platform/Instrumentation How To#Overriding event stream config settings.
Configuring a stream in the beta cluster
If you want to observe a Beta cluster stream using the Beta cluster eventstreams-ui tool, you must ensure that the stream is defined for the Beta Cluster Meta-Wiki, as determined by the rules explained in Configuration files. That tool fetches stream configurations from that Meta-Wiki.
If you want to create a stream on the Beta Cluster which is not configured on Production (or is configured differently on Production), you need to declare that stream in InitialiseSettings-labs.php. In some cases, you may need to prepend your InitialiseSettings-labs.php configuration key with '-', as explained in Configuration files.
Decommissioning
As noted in Event Platform/Instrumentation How To#Decommissioning, stream-related code and configuration can be removed at any time to stop producing an event stream.