Difference between revisions of "Event Platform/Analytics/Fragments"

imported>Nettrom
(→‎Schema Fragments: fields are required, not events)
imported>Bearloga
Revision as of 14:30, 13 August 2020

Schema Fragments

Index of schema fragments for referencing in schemas
Fragment | $ref
Common | /fragment/analytics/common/1.0.0#
Core identifiers | /fragment/analytics/identifiers/1.0.0#
Activity sequencing | /fragment/analytics/activity_seq/1.0.0#
User | /fragment/analytics/user/1.0.0#
Page | /fragment/analytics/page/1.0.0#
User Interface (UI) | /fragment/analytics/ui/1.0.0#
A/B Testing | /fragment/analytics/ab_testing/1.0.0#
Campaign attribution (UTM parameters) | /fragment/analytics/utm_parameters/1.0.0#

In the schema, reference the fragment(s) you wish to use and list which fields are required in every event.

Note: If your schema does not reference any of the other fragments, it must at least reference the /fragment/analytics/common schema fragment, which provides the client_dt field. The other fragments (such as the core identifiers fragment) already reference the common fragment, so it is not necessary to reference both.

Example 1

Suppose we're running an A/B test on a new default skin for anonymous users and we are interested in measuring session length and average number of visited articles per session.

The schema would use the following fragments: core identifiers, page, UI, and A/B testing via:

allOf:
    - $ref: /fragment/analytics/identifiers/1.0.0#
    - $ref: /fragment/analytics/page/1.0.0#
    - $ref: /fragment/analytics/ui/1.0.0#
    - $ref: /fragment/analytics/ab_testing/1.0.0#

And the following fields would need to be included in the one (1) event logged by the instrument on every page load:

required:
    - session_id
    - pageview_id
    - page_ns
    - ui_screen
    - test_name
    - test_group

The remainder of this section describes these fields and others in those fragments.
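
As a rough illustration of the required list in Example 1, the Python sketch below checks a hypothetical event for the listed fields. The missing_required helper and all sample values are assumptions made for illustration; they are not part of Event Platform.

```python
# Field names copied from the `required` list in Example 1.
REQUIRED_FIELDS = [
    "session_id",
    "pageview_id",
    "page_ns",
    "ui_screen",
    "test_name",
    "test_group",
]


def missing_required(event, required=REQUIRED_FIELDS):
    """Return the required field names absent from `event`, in schema order."""
    return [name for name in required if name not in event]


# Hypothetical event; every value here is made up for illustration.
example_event = {
    "session_id": "a1b2c3d4e5f6a7b8c9d0",
    "pageview_id": "0f1e2d3c4b5a69788796",
    "page_ns": 0,
    "ui_screen": {"width_px": 1280, "height_px": 720},
    "test_name": "default-skin",
    "test_group": "control",
}
```

Calling missing_required(example_event) on the complete event returns an empty list, while an event lacking some of the fields returns their names.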

Identifiers

Use the following to include the core identifiers fragment in your schema:

allOf:
    - $ref: /fragment/analytics/identifiers/1.0.0#

Schemas and instruments for tracking product usage where it's important to link a set of events made by a user on the same page (on the web), in the same session, or with the same install of a mobile app (Wikipedia for Android, iOS, KaiOS) should include the core identifiers fragment. There are other identifiers in other fragments that may be useful for the specific analytics requirement, but in most cases one or more of the following fields will be absolutely essential:

Core identifiers
device_id (string)
Identifies a client across multiple sessions, applicable only to apps. This was previously referred to as the "app install ID" in legacy EventLogging schemas for the mobile apps. It enables calculation of retention metrics for anonymous users, since we do not have a user ID for them. MediaWiki-based instrumentation does not include this identifier in the events it sends.
session_id (string)
Identifies a session: a set of actions taken by the user within a period of time. A session starts the first time the user visits the website or launches the app, and ends after a significant period of inactivity or when the user explicitly exits the browser/app. On MediaWiki, a session lasts for the lifetime of the browser process (refer to T223931 for additional information) and its ID can be retrieved with mw.user.sessionId(). On iOS and Android apps, where the app is allowed to enter a background state, sessions expire after 15 minutes of inactivity. If the app returns to the foreground after 15 minutes, a new session ID is generated.
pageview_id (string)
Identifies a page view, applicable only on the web. Interactions with multiple features (instrumented separately) on the same page may be linked together via this identifier. On MediaWiki this is retrievable with mw.user.getPageviewToken().

Refer to identifiers in schemas and identifiers in instruments for how to use these identifiers in analytics schemas and MediaWiki-based instruments.
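
The 15-minute inactivity rule described for app sessions can be sketched as follows. This is a minimal Python illustration, not the apps' actual implementation: the SessionTracker class, the ID format, and the choice to treat exactly 15 minutes as still active are all assumptions.

```python
import secrets

# 15 minutes of inactivity, as described for the iOS and Android apps.
SESSION_TIMEOUT_SECONDS = 15 * 60


class SessionTracker:
    """Hypothetical helper that rotates the session ID after inactivity."""

    def __init__(self):
        self.session_id = None
        self.last_active = None

    def touch(self, now):
        """Record activity at `now` (seconds) and return the current session ID."""
        expired = (
            self.last_active is None
            # Assumption: a gap of exactly 15 minutes does not expire the session.
            or now - self.last_active > SESSION_TIMEOUT_SECONDS
        )
        if expired:
            # Assumption: any random string works as an ID; the real format may differ.
            self.session_id = secrets.token_hex(10)
        self.last_active = now
        return self.session_id
```

Passing the clock value in as an argument, rather than reading it inside the class, keeps the sketch easy to exercise with made-up timestamps.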

Sequences

Use the following to include the activity sequencing fragment in your schema:

allOf:
    - $ref: /fragment/analytics/activity_seq/1.0.0#

Activity sequencing (for reconstructing sequences of events)
activity_id (string)
Identifies a sequence of actions in the same context or funnel. In the past, teams have used terms like "session ID" and "sub-session ID" to refer to a set of connected events, such as interacting with a widget. This identifier is useful for grouping together impressions with corresponding clicks, and for grouping together steps in a process such as making an edit. The activity identifier can be randomly generated or a counter.
sequence_id (integer)
Starting at 1, this is a counter for reconstructing the order of events in the same activity. For a variety of reasons we cannot trust the timestamp of receipt or the client-side timestamp of when the event was generated for putting events in order. In cases where the exact sequence of events needs to be established, this identifier can be used to record which event happened 1st, which happened 2nd, and so on.

For example, suppose the user is making an edit. We group the actions performed in this activity with activity_id. Previously, this would have been a feature-specific identifier such as "editing_session_id". As the user interacts with various (instrumented) features/elements in the editor, previews the edit, continues editing, and finally publishes the edit, specific data about all of those interactions can be tracked in schema-specific fields, but the order in which those interactions happen is recorded in sequence_id.
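
On the analysis side, the ordering described above can be recovered by grouping events on activity_id and sorting each group by sequence_id. The Python sketch below assumes events are already loaded as dictionaries; the action field and the sample values are hypothetical.

```python
from collections import defaultdict


def order_by_activity(events):
    """Group events by activity_id and sort each group by sequence_id."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["activity_id"]].append(event)
    return {
        activity_id: sorted(group, key=lambda e: e["sequence_id"])
        for activity_id, group in grouped.items()
    }


# Hypothetical events received out of order; `action` is a made-up
# schema-specific field.
events = [
    {"activity_id": "edit-1", "sequence_id": 3, "action": "publish"},
    {"activity_id": "edit-1", "sequence_id": 1, "action": "open_editor"},
    {"activity_id": "edit-2", "sequence_id": 1, "action": "open_editor"},
    {"activity_id": "edit-1", "sequence_id": 2, "action": "preview"},
]
ordered = order_by_activity(events)
```

Sorting on the counter rather than on a timestamp follows the point made above that neither the receipt time nor the client-side time can be trusted for ordering.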

Data

User

Use the following to include this fragment in your schema:

allOf:
    - $ref: /fragment/analytics/user/1.0.0#

Information about the user associated with the event

Information about the user generating the event
is_anon (boolean)
Whether the user is logged in (false) or anonymous (true)
user_id (integer)
User's MW user ID; 0 if user is anonymous. User ID is specific to the wiki that the event came from.
user_name (string)
Cross-wiki username
user_edit_count (integer)
The total number of edits by the user at the time of the event. Growth team retrieves this with mw.config.get( 'wgUserEditCount' ) to record it for their experiments. May be useful as a proxy for experience at the time of the event.
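
Because user_id is 0 for anonymous users, is_anon and user_id can be cross-checked when both are present. The Python sketch below is one possible sanity check; the helper name is hypothetical, and it assumes that logged-in users always have a non-zero user_id.

```python
def user_fields_consistent(event):
    """Return True when is_anon and user_id agree, or when either is absent."""
    is_anon = event.get("is_anon")
    user_id = event.get("user_id")
    if is_anon is None or user_id is None:
        # Nothing to cross-check when a field was not sent.
        return True
    # Assumption: anonymous users have user_id 0 and logged-in users do not.
    return is_anon == (user_id == 0)
```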

Page

Use the following to include this fragment in your schema:

allOf:
    - $ref: /fragment/analytics/page/1.0.0#

Information about the page associated with the event

Information about the page the event was generated on
wiki_db (string)
Database name of the wiki (e.g. "enwiki", "commonswiki")
page_id (integer)
Page's numeric ID in MediaWiki
page_ns (integer)
Page's namespace code in MediaWiki (e.g. 0 for Main/Article, -1 for Special)
page_title (string)
Title of the page
page_is_redirect (boolean)
Whether the page is a redirect or not at the time of the event

User Interface

Use the following to include this fragment in your schema:

allOf:
    - $ref: /fragment/analytics/ui/1.0.0#

Information about the UI associated with the event

Information about the interface the user saw when the event was generated
ui_mw_skin (string)
MediaWiki skin name (e.g. "Vector", "MinervaNeue", "Modern") at the time of the event; only applicable on MediaWiki, not on mobile apps
ui_color_mode (string, enum)
Color mode at the time of the event. Currently only applicable on mobile apps, although Web is experimenting with it for MediaWiki.[1] One of: "light", "sepia", "dark", "black", "night"
ui_text_scale (integer)
Only applicable for mobile apps where the user chooses from predefined text scales. 0 is for the middle (application default), -1 is for the smaller size while 1 is for the larger size. The actual size in points or pixels will vary by app and device, so we record a relative scale.
ui_screen (object)
Information about the screen, such as dimensions, detailed below:
ui_screen.width_px (integer)
Width of the screen in pixels
ui_screen.height_px (integer)
Height of the screen in pixels
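
The constraints described for the UI fields can be checked before an event is sent. The following Python sketch is illustrative only: the ui_problems helper is hypothetical, and the requirement that screen dimensions be positive is an assumption rather than something the fragment states.

```python
# Allowed values as listed for ui_color_mode.
COLOR_MODES = {"light", "sepia", "dark", "black", "night"}


def ui_problems(event):
    """Return a list of human-readable problems found in the UI fields."""
    problems = []
    mode = event.get("ui_color_mode")
    if mode is not None and mode not in COLOR_MODES:
        problems.append(f"unknown ui_color_mode: {mode!r}")
    screen = event.get("ui_screen")
    if screen is not None:
        for key in ("width_px", "height_px"):
            value = screen.get(key)
            # Assumption: both dimensions must be present and positive.
            # bool is excluded because it is a subclass of int in Python.
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                problems.append(f"ui_screen.{key} should be a positive integer")
    return problems
```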

A/B Testing

Use the following to include this fragment in your schema:

allOf:
    - $ref: /fragment/analytics/ab_testing/1.0.0#

Information about the A/B test (experiment) associated with the event

Information about the experiment the user was enrolled in when the event was generated
tests (array)
Any and all A/B tests the user is enrolled in at the time of the event. If the array is empty, the user was not in any A/B tests. If there is only one item, the user was in exactly one A/B test. If there are two or more items, the user was in several A/B tests.

Each item in the tests array is an object identifying enrollment in a single A/B test with the following fields:

name (string)
Name of the A/B test the user is enrolled in (e.g. "Desktop Redesign (Phase 3)" or "desktop-redesign-3")
group (string)
Name of the group (sometimes called "bucket") the user was randomly assigned to – e.g. "control", "variant-a", "variant-b", "variant-c"

Examples:

"tests": []

"tests": [ { "name": "growth-homepage", "group": "control" } ]

"tests": [ { "name": "growth-homepage", "group": "variant-1" }, { "name": "growth-help-panel", "group": "variant-2" } ]

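When analyzing events, a common need is to find which group a user was in for one particular test. The Python sketch below reads the tests array shown in the examples above; the group_for_test helper is hypothetical.

```python
def group_for_test(event, test_name):
    """Return the group the user was assigned to in `test_name`, or None."""
    for test in event.get("tests", []):
        if test.get("name") == test_name:
            return test.get("group")
    # The user was not enrolled in this test (or in any test).
    return None


# Event mirroring the last example above.
event = {
    "tests": [
        {"name": "growth-homepage", "group": "variant-1"},
        {"name": "growth-help-panel", "group": "variant-2"},
    ]
}
```

Returning None for both an empty array and an unknown test name keeps the caller's handling simple, at the cost of not distinguishing the two cases.
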
Campaign Attribution

Use the following to include this fragment in your schema:

allOf:
    - $ref: /fragment/analytics/utm_parameters/1.0.0#

Information about the UTM parameters associated with the event

Information about where the user came from
utm.source
Identifies which site sent the traffic, and is a required parameter. For example: "Wikipedia", "Twitter", "Facebook"
utm.medium
Identifies what type of link was used such as "socialmedia" or "email"
utm.campaign
Identifies a specific product promotion or strategic campaign. For example: "app_marketing_20200704" or "india_awareness_2017"
utm.term
Identifies search terms (e.g. "mobile+app")
utm.content
Identifies what specifically was clicked to bring the user to the site, such as a banner ad, a text link, or a sidebar button. It is often used for A/B testing and content-targeted ads.
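
UTM parameters conventionally arrive as utm_source, utm_medium, and similar query string parameters on the landing URL. The Python sketch below extracts them into a dictionary keyed like the fields above. The utm_from_url helper and the example URL are hypothetical, and how an instrument actually populates the utm fields is not specified here.

```python
from urllib.parse import parse_qs, urlsplit

# Suffixes of the utm.* fields described above.
UTM_KEYS = ("source", "medium", "campaign", "term", "content")


def utm_from_url(url):
    """Return a dict of the UTM parameters present in the URL's query string."""
    query = parse_qs(urlsplit(url).query)
    utm = {}
    for key in UTM_KEYS:
        values = query.get(f"utm_{key}")
        if values:
            # Keep the first value if a parameter is repeated.
            utm[key] = values[0]
    return utm


# Hypothetical landing URL for illustration.
landing_url = (
    "https://example.org/wiki/Main_Page"
    "?utm_source=Twitter&utm_medium=socialmedia"
    "&utm_campaign=app_marketing_20200704&utm_term=mobile+app"
)
utm = utm_from_url(landing_url)
```

Note that parse_qs decodes the query string, so a term sent as "mobile+app" is returned as "mobile app".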

References