Revision as of 18:27, 23 October 2017 by imported>AndyRussG (Expanded section on Banner Impressions, changed heading levels)

The fundraising department uses the following statistics from the main wiki cluster to inform campaigns and update banners in real time:

Banners are displayed on wikis using the CentralNotice extension, which targets pageviews based on country, project, language, device and logged-in status.

On any pageview that meets the criteria to be included in an active CentralNotice campaign, the system may send a request to beacon/impression. Such requests include many data points about whether or not a banner was actually displayed, and why. Data points are sent as URL parameters. For details, see ext.centralNotice.display.state.js.
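Because the data points arrive as URL parameters, a consumer can recover them with a standard query-string parser. The following is a minimal sketch, not the code WMF runs: the function name `parse_impression`, the host `example.org`, and all parameter values are made up for illustration. The parameter names are those listed elsewhere on this page.

```python
from urllib.parse import urlsplit, parse_qs


def parse_impression(url: str) -> dict:
    """Return the query-string parameters of a beacon/impression URL as a flat dict."""
    query = urlsplit(url).query
    # parse_qs maps each key to a list of values; keep the first value per key.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical request URL; host and values are illustrative only.
example_url = (
    "https://example.org/beacon/impression"
    "?banner=B1&campaign=C1&project=wikipedia&country=US"
    "&language=en&result=hide&reason=close"
)
params = parse_impression(example_url)
```

Here `params["result"]` and `params["reason"]` carry whether a banner was shown and why, in the sense described above.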

By default, 1% of pageviews in a campaign are randomly selected to send a beacon/impression request. However, for pageviews in fundraising campaigns, 100% of pageviews cause a request; this is necessary to provide accurate data for A/B testing of fundraising banners.
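The sampling rule can be pictured as a per-pageview random draw. This is a sketch under stated assumptions: the real decision is made client-side by CentralNotice, and the names `should_send_impression`, `DEFAULT_SAMPLE_RATE`, and `FUNDRAISING_SAMPLE_RATE` are hypothetical. Only the 1% and 100% rates come from the text.

```python
import random

DEFAULT_SAMPLE_RATE = 0.01      # 1% of pageviews in an ordinary campaign
FUNDRAISING_SAMPLE_RATE = 1.0   # 100% of pageviews in a fundraising campaign


def should_send_impression(is_fundraising: bool, rng: random.Random) -> bool:
    """Decide whether this pageview sends a beacon/impression request."""
    rate = FUNDRAISING_SAMPLE_RATE if is_fundraising else DEFAULT_SAMPLE_RATE
    # rng.random() is in [0.0, 1.0), so a rate of 1.0 always sends.
    return rng.random() < rate


rng = random.Random(0)  # seeded only so the sketch is reproducible
fundraising_sent = sum(should_send_impression(True, rng) for _ in range(1000))
default_sent = sum(should_send_impression(False, rng) for _ in range(100_000))
```

With full sampling, every fundraising pageview is counted, so A/B comparisons between banners do not carry sampling noise.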

Calls to beacon/impression are processed on WMF servers in several ways, explained below.

Kafka/Kafkatee/pgehres database

Requests to beacon/impression are sampled at a rate of 1:10. For each sampled request, we store the timestamp rounded to the minute, plus the querystring variables 'banner', 'campaign', 'project', 'country', 'language', 'result', and 'reason'. Data is available in near real time.
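As an illustration of the stored record, the sketch below reduces a parsed request to a minute-resolution timestamp and the seven listed variables, and scales a sampled count back up by the 1:10 rate. It is not the actual Kafkatee pipeline. The names `to_stored_row` and `estimate_total` are hypothetical, and "rounded to the minute" is assumed here to mean dropping seconds, which the page does not confirm.

```python
from datetime import datetime

# Querystring variables the text says are stored.
STORED_FIELDS = ("banner", "campaign", "project", "country",
                 "language", "result", "reason")
SAMPLE_RATIO = 10  # 1:10 sampling, from the text


def to_stored_row(timestamp: datetime, params: dict) -> dict:
    """Keep only the stored fields and a minute-resolution timestamp."""
    # Assumption: rounding is done by truncating seconds and microseconds.
    row = {"timestamp": timestamp.replace(second=0, microsecond=0)}
    for field in STORED_FIELDS:
        row[field] = params.get(field)  # None if the request omitted it
    return row


def estimate_total(sampled_count: int) -> int:
    """Estimate the unsampled request count from a 1:10 sampled count."""
    return sampled_count * SAMPLE_RATIO


row = to_stored_row(
    datetime(2017, 10, 23, 18, 27, 45),
    {"banner": "B1", "campaign": "C1", "debug": "1"},  # made-up values
)
```

Any count read from this store is an estimate: multiplying by 10 recovers the expected total but not the exact one.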


Like other HTTP requests, calls to beacon/impression are logged in the wmf.webrequest table in Hive. All data points provided by CentralNotice may be selected or filtered in Hive QL via the uri_query column.


Aggregated logs of beacon/impression requests are stored in Druid, in the Banner activity dataset. This allows fast querying on many common criteria. Pivot provides quick visualizations of this data.

Landing page impressions

Hits to selected URLs on and Querystring variables stored include utm_*, project, language, and country

Unique email clicks

For landing page impressions with a contact_id, we insert utm_source, utm_campaign, contact_id, and link_id into a table with a unique constraint on those columns, using 'ON DUPLICATE KEY' to discard any clicks after the first one for a given donor and email.
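The deduplication pattern can be sketched with Python's built-in sqlite3 module. This is a stand-in, not the production schema: the table name `email_clicks`, the column types, and the function `record_click` are assumptions, and SQLite's `INSERT OR IGNORE` is substituted for the 'ON DUPLICATE KEY' clause mentioned above, which SQLite does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the unique constraint spans all four columns.
conn.execute(
    """
    CREATE TABLE email_clicks (
        utm_source   TEXT    NOT NULL,
        utm_campaign TEXT    NOT NULL,
        contact_id   INTEGER NOT NULL,
        link_id      TEXT    NOT NULL,
        UNIQUE (utm_source, utm_campaign, contact_id, link_id)
    )
    """
)


def record_click(conn, utm_source, utm_campaign, contact_id, link_id) -> bool:
    """Insert a click; return True only if it was the first for this key."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO email_clicks "
        "(utm_source, utm_campaign, contact_id, link_id) VALUES (?, ?, ?, ?)",
        (utm_source, utm_campaign, contact_id, link_id),
    )
    # rowcount is 1 when a row was inserted, 0 when the insert was ignored.
    return cur.rowcount == 1


# Made-up values: the same donor clicks the same link twice.
first = record_click(conn, "email1", "C1", 12345, "link-a")
repeat = record_click(conn, "email1", "C1", 12345, "link-a")
total = conn.execute("SELECT COUNT(*) FROM email_clicks").fetchone()[0]
```

Letting the database enforce uniqueness avoids a separate "has this donor clicked already?" lookup, which could race under concurrent inserts.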