Analytics/Data Lake/Edits
This page links to detailed information about Edits datasets in the Data Lake.
To access this data, log into stat1002.eqiad.wmnet and run hive. Here you can run the statement use wmf; and query the tables described below.
Unlike the traffic datasets, these datasets are not continuously updated. Instead, they are updated regularly by fully re-importing and rebuilding them, which creates a new snapshot each time. This notion of a snapshot is key when querying the Edits datasets, since including multiple snapshots doesn't make sense for most queries. As of 2017-04, snapshots are provided monthly.
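As a rough illustration of restricting a query to one snapshot, the Hive session below is a minimal sketch rather than a definitive recipe. It assumes that the Mediawiki history dataset is exposed as a table named mediawiki_history, that snapshots are stored in a partition column named snapshot, and that snapshot values look like '2017-04'. None of these names are confirmed on this page, so check the dataset's own documentation before relying on them.

```sql
-- Assumed names (not confirmed on this page): table mediawiki_history,
-- partition column snapshot, snapshot values formatted as 'YYYY-MM'.
USE wmf;

-- List the partitions of the table to see which snapshots exist.
SHOW PARTITIONS mediawiki_history;

-- Count rows in a single snapshot. Omitting the WHERE clause would count
-- the same history once per snapshot, since each snapshot is a full rebuild.
SELECT COUNT(*) AS events
FROM mediawiki_history
WHERE snapshot = '2017-04';
```

Filtering on the snapshot partition also keeps Hive from scanning the data of every other snapshot.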
Datasets
Mediawiki raw data
These are copies of MediaWiki MySQL tables:
- archive
- ipblocks
- logging
- page
- revision
- user
- user_groups
Processed Data
- Mediawiki user history -- Dataset providing reconstructed history events of mediawiki users
- Mediawiki page history -- Dataset providing reconstructed history events of mediawiki pages
- Mediawiki history -- Fully denormalized dataset containing user, page and revision processed data
- Metrics -- Dataset providing precomputed metrics over edits data (e.g. monthly new registered users or daily edits by anonymous users)
Access
Some of the data above is made public through different systems (see Analytics main page), but any data on the Data Lake is private by default. For details on obtaining access, see Analytics/Data access.