Analytics/Fundraising: Difference between revisions

imported>Ejegg
(Draft of page)
 
imported>AndyRussG
(Expanded section on Banner Impressions, changed heading levels)
Revision as of 18:27, 23 October 2017

The fundraising department uses the following statistics from the main wiki cluster to inform campaigns and update banners in real time:

Banner Impressions

Banners are displayed on wikis using the CentralNotice extension, which targets pageviews based on country, project, language, device and logged-in status.

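On any pageview that meets the criteria to be included in an active CentralNotice campaign, the system may send a request to beacon/impression. Such requests include many data points about whether or not a banner was actually displayed, and why. Data points are sent as URL parameters. For details, see ext.centralNotice.display.state.js (https://github.com/wikimedia/mediawiki-extensions-CentralNotice/blob/48ad5b97f3e293d429a03caa59d62886941e1142/resources/subscribing/ext.centralNotice.display.state.js).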

By default, 1% of pageviews in a campaign are randomly selected to send a beacon/impression request. However, for pageviews in fundraising campaigns, 100% of pageviews cause a request; this is necessary to provide accurate data for A/B testing of fundraising banners.
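
The sampling rule above can be sketched in a few lines. This is only an illustration and not the CentralNotice extension's actual code: the function name, the constants and the injectable random source are all hypothetical, and the real decision is made in the browser.

```python
import random

# Rates taken from the paragraph above; the constant names are hypothetical.
DEFAULT_SAMPLE_RATE = 0.01      # 1% of pageviews in an ordinary campaign
FUNDRAISING_SAMPLE_RATE = 1.0   # 100% of pageviews in a fundraising campaign


def should_send_beacon(is_fundraising, rng=random.random):
    """Return True if this pageview should call beacon/impression.

    rng is any callable returning a float in [0.0, 1.0); it is a parameter
    only so that the decision can be tested deterministically.
    """
    rate = FUNDRAISING_SAMPLE_RATE if is_fundraising else DEFAULT_SAMPLE_RATE
    # Because rng() is always below 1.0, a rate of 1.0 selects every pageview.
    return rng() < rate
```

With a rate of 1.0 no pageview is dropped, which is what makes impression counts for the two arms of an A/B test directly comparable.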

Calls to beacon/impression are processed on WMF servers in several ways, explained below.

Kafka/Kafkatee/pgehres database

Requests to beacon/impression are sampled at a rate of 1:10. For each sampled request we store the timestamp, rounded to the minute, plus the querystring variables 'banner', 'campaign', 'project', 'country', 'language', 'result' and 'reason'. Data is available in near real time.
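
As a rough sketch of the record described above, the following reduces one beacon/impression URL to the stored fields. The function name and record layout are hypothetical, and "rounded to the minute" is assumed here to mean truncation to the start of the minute; the real pipeline may round differently.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# The seven querystring variables named in the text above.
STORED_FIELDS = ("banner", "campaign", "project", "country",
                 "language", "result", "reason")


def impression_record(url, received_at):
    """Build a dict holding the minute-level timestamp and the stored fields."""
    query = parse_qs(urlsplit(url).query)
    # Assumption: rounding is done by dropping seconds and microseconds.
    record = {"timestamp": received_at.replace(second=0, microsecond=0)}
    for field in STORED_FIELDS:
        values = query.get(field)
        # Fields absent from the request are stored as None; other
        # querystring variables are ignored.
        record[field] = values[0] if values else None
    return record


example = impression_record(
    "https://en.wikipedia.org/beacon/impression"
    "?banner=B1&campaign=C1&project=wikipedia&country=US&language=en&result=show",
    datetime(2017, 10, 23, 18, 27, 45),
)
```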

Hive

Like other HTTP requests, calls to beacon/impression are logged in the wmf.webrequest table in Hive. All data points provided by CentralNotice may be selected or filtered in HiveQL via the uri_query column.
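
To illustrate what filtering on uri_query amounts to, here is a small Python sketch that applies the same idea to querystrings already pulled out of the table. It is not a Hive query; the helper names are hypothetical, and the leading "?" on the stored value is an assumption that the code tolerates either way.

```python
from urllib.parse import parse_qs


def query_params(uri_query):
    """Turn a uri_query value into a flat dict of parameter -> first value."""
    # Assumption: the value may start with "?", which is dropped if present.
    parsed = parse_qs(uri_query.lstrip("?"), keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


def matching_requests(uri_queries, **criteria):
    """Return the uri_query strings whose parameters equal all the criteria."""
    matches = []
    for uri_query in uri_queries:
        params = query_params(uri_query)
        if all(params.get(key) == value for key, value in criteria.items()):
            matches.append(uri_query)
    return matches
```

For example, matching_requests(rows, campaign="C1", result="show") keeps only the requests from the hypothetical campaign C1 in which the 'result' parameter equals "show".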

Druid/Pivot

Aggregated logs of beacon/impression requests are stored in Druid, in the Banner activity dataset. This allows fast querying on many common criteria. Pivot (https://pivot.wikimedia.org/#banner_activity_minutely) provides quick visualizations of this data.

Landing page impressions

We record hits to selected URLs on donate.wikimedia.org and wikimediafoundation.org. The querystring variables stored include utm_*, project, language, and country.
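
A minimal sketch of the field selection described above, under two assumptions: "utm_*" is read as any parameter whose name starts with "utm_", and the check for which URLs count as "selected" is left out because the text does not say how they are chosen. The function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

# Non-utm querystring variables named in the text above.
NAMED_FIELDS = {"project", "language", "country"}


def landing_page_fields(url):
    """Return the querystring variables that would be stored for a hit."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    # Assumption: "utm_*" means every parameter with the "utm_" prefix.
    return {
        key: value
        for key, value in pairs
        if key.startswith("utm_") or key in NAMED_FIELDS
    }
```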

Unique email clicks

For landing page impressions with a contact_id, we insert utm_source, utm_campaign, contact_id, and link_id into a table with a unique constraint on those columns, using 'ON DUPLICATE KEY' to discard any clicks after the first one for a given donor and email.
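
The deduplication can be tried out locally with Python's built-in sqlite3 module. This sketch swaps in SQLite's INSERT OR IGNORE for the 'ON DUPLICATE KEY' clause named above, since SQLite has no such clause; the table name, column types and function names are hypothetical.

```python
import sqlite3


def open_click_store():
    """Create an in-memory table with a unique constraint on all four columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE unique_clicks (
            utm_source   TEXT NOT NULL,
            utm_campaign TEXT NOT NULL,
            contact_id   INTEGER NOT NULL,
            link_id      TEXT NOT NULL,
            UNIQUE (utm_source, utm_campaign, contact_id, link_id)
        )
        """
    )
    return conn


def record_click(conn, utm_source, utm_campaign, contact_id, link_id):
    """Insert a click; return True if it was the first for this combination."""
    # INSERT OR IGNORE silently skips rows that would violate the unique
    # constraint, so repeat clicks leave the table unchanged.
    cursor = conn.execute(
        "INSERT OR IGNORE INTO unique_clicks VALUES (?, ?, ?, ?)",
        (utm_source, utm_campaign, contact_id, link_id),
    )
    conn.commit()
    return cursor.rowcount == 1
```

Letting the database enforce uniqueness avoids a separate "have we seen this click?" lookup before each insert.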