
Difference between revisions of "Analytics/Systems/Wikistats"

imported>Nuria
imported>Joal
(→‎New Version, Wikistats 2: Update for wikistats 2 with links)
Line 1: Line 1:
== New Version, Wikistats 2.0 ==
== Legacy Version, Wikistats 1 ==


The Wikistats interface, available at [https://stats.wikimedia.org/ stats.wikimedia.org] is being re-worked to be a little more browse-able and friendly.  You can see the prototype here: [https://analytics-prototype.wmflabs.org analytics-prototype.wmflabs.org].  The code for this work is in [https://phabricator.wikimedia.org/source/wikistats/ this repository] and we're aiming to deploy a very minimal version by the end of July, 2017.  We will iterate on that adding more metrics and functionality.
Wikistats 1, available at [https://stats.wikimedia.org/ stats.wikimedia.org], consists of two almost independent clusters of data and scripts.


== Data ==
*Data about edits, editors and content (harvested from XML dumps)
*Data about pageviews, per wiki, per month, globally or per country (nowadays harvested with Hadoop/Hive) (see also [[Analytics/Systems/Wikistats/Traffic|this page]])


The data behind Wikistats is [[Analytics/Data_Lake/Edits]].  It is processed in two steps:
== New Version, Wikistats 2 ==
# Extracted as raw data from labs replicas on a monthly basis, by a cron job.
# Processed by an Oozie job running Scala and labeled with the month's snapshot (e.g. 2017-06).


<font color=green>'''Dec 2017: WMF Analytics Team is happy to announce the first release of [https://stats.wikimedia.org/v2/#/all-projects Wikistats 2]. Wikistats has been redesigned for architectural simplicity, faster data processing, and a more dynamic and interactive user experience. The data used in the reports will also be made available for external processing.''' 


More details: [[Analytics/Data_Lake/Edits]]
The first goal is to match the numbers of the current system and to provide the most important reports, as decided by the Wikistats community (see [https://www.mediawiki.org/wiki/Analytics/Wikistats/DumpReports/Future_per_report survey]). Over time, we will continue to migrate reports and add new ones that you find useful. We can also analyze the data in new and interesting ways, and we look forward to hearing your [https://wikitech.wikimedia.org/wiki/Talk:Analytics/Systems/Wikistats feedback and suggestions].</font> 


== Old Version (to archive soon) Monthly Pageview Reports ==
=== Data for Wikistats 2 ===


These are now fed with Hive data
The data behind Wikistats 2 (at least the data on edits, editors and content) is based on [[Analytics/Data_Lake/Edits]].  It is processed in multiple steps:
[[File:Monthly_Pageview_Reports.png]]
 
# Extracted as raw data from labs replicas on a monthly basis, by a cron job.
== Traffic Breakdown Reports ==
# Processed by an Oozie job running Scala and labeled with the month's snapshot (e.g. 2017-06).
[[File:Traffic_Breakdown_Reports_I.png]]
# The data is then prepared and loaded into [[Analytics/Systems/Druid|Druid]], to allow for fast slicing and dicing
# Data is accessible over the internet through [[Analytics/AQS/Wikistats 2|AQS]], which queries Druid.

Revision as of 21:10, 13 December 2017

Legacy Version, Wikistats 1

Wikistats 1, available at stats.wikimedia.org, consists of two almost independent clusters of data and scripts.

  • Data about edits, editors and content (harvested from XML dumps)
  • Data about pageviews, per wiki, per month, globally or per country (nowadays harvested with Hadoop/Hive) (see also this page)

New Version, Wikistats 2

Dec 2017: WMF Analytics Team is happy to announce the first release of Wikistats 2. Wikistats has been redesigned for architectural simplicity, faster data processing, and a more dynamic and interactive user experience. The data used in the reports will also be made available for external processing.

The first goal is to match the numbers of the current system and to provide the most important reports, as decided by the Wikistats community (see survey). Over time, we will continue to migrate reports and add new ones that you find useful. We can also analyze the data in new and interesting ways, and we look forward to hearing your feedback and suggestions.

Data for Wikistats 2

The data behind Wikistats 2 (at least the data on edits, editors and content) is based on Analytics/Data_Lake/Edits. It is processed in multiple steps:

  1. Extracted as raw data from labs replicas on a monthly basis, by a cron job.
  2. Processed by an Oozie job running Scala and labeled with the month's snapshot (e.g. 2017-06).
  3. The data is then prepared and loaded into Druid, to allow for fast slicing and dicing.
  4. Data is accessible over the internet through AQS, which queries Druid.
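
The monthly snapshot label mentioned in step 2 can be illustrated with a minimal Python sketch. This is not the production code (the actual job is described as running Scala under Oozie); the function names `snapshot_label` and `previous_snapshot` are hypothetical, and the sketch assumes a label is simply the year and month in YYYY-MM form, as in the example 2017-06. Whether the label denotes the month the data covers or the month the job ran is not stated here.

```python
from datetime import date


def snapshot_label(day: date) -> str:
    """Return a YYYY-MM label for the month containing `day`.

    Hypothetical helper: the format is inferred from the example
    label "2017-06" and is an assumption about the real pipeline.
    """
    return f"{day.year:04d}-{day.month:02d}"


def previous_snapshot(label: str) -> str:
    """Return the label of the month before `label`, rolling over the year."""
    year, month = (int(part) for part in label.split("-"))
    if month == 1:
        return f"{year - 1:04d}-12"
    return f"{year:04d}-{month - 1:02d}"


if __name__ == "__main__":
    label = snapshot_label(date(2017, 6, 15))
    print(label, previous_snapshot(label))
```

Keeping the label as a zero-padded string means that plain lexicographic sorting of labels also sorts them chronologically, which is convenient when snapshots are stored as partition names.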