Analytics/Data Lake

Revision as of 14:49, 7 April 2017 by imported>Joal (Add general Data Lake information.)

The Analytics Data Lake (ADL) is a large, analytics-oriented repository of data, both raw and aggregated, about Wikimedia projects (in industry terms, a data lake).

It contains:

  • Traffic data -- webrequest, pageviews, unique devices ...
  • Edits data -- historical data about revisions, pages, and users [in beta as of 2017-04-07].

As the Data Lake matures, we intend to add any and all data to it, and to make as much of that data public as can safely be done.

For technical aspects of the Data Lake pipelines, see Analytics/Systems/Data Lake.