Analytics/Data Lake: Difference between revisions
imported>Neil P. Quinn-WMF (Update to clarify that the Data Lake is essentially everything in the Analytics Cluster.)
imported>Joal (Add general Data Lake information.)
Revision as of 14:49, 7 April 2017
The Analytics Data Lake (ADL) is a large, analytics-oriented repository of data, both raw and aggregated, about Wikimedia projects (in industry terms, a data lake).
It contains:
- Traffic data -- webrequest, pageviews, unique devices, ...
- Edits data -- historical data about revisions, pages, and users (in beta as of 2017-04-07).
As the Data Lake matures, we will add any and all data, and try to make as much of it public as is safely possible.
For technical aspects of the Data Lake pipelines, see Analytics/Systems/Data Lake.