Analytics/Data Lake

Revision as of 09:39, 10 April 2017 by imported>Joal (small intro update)

This page is the entry point to the Analytics Data Lake (ADL) documentation. The ADL is a large, analytics-oriented repository of data about Wikimedia projects, both raw and aggregated (in industry terms, a data lake). All data in the lake can be accessed through systems that allow the datasets to be joined with one another.

  • Traffic data -- webrequest, pageviews, unique devices ...
  • Edits data -- historical data about revisions, pages, and users [in beta as of 2017-04-07]

As the Data Lake matures, we will keep adding datasets, and we will try to make as much of the data public as we safely can.

For technical aspects of the Data Lake pipelines, see Analytics/Systems/Data Lake.