Analytics/Data Lake/Mediawiki user history

Revision as of 18:23, 2 December 2016 by imported>Milimetric (Milimetric moved page Analytics/Data Lake/Mediawiki User history to Analytics/Data Lake/Mediawiki user history)

This page describes the data set that stores the user history of WMF's wikis. It lives in the Analytics Hadoop cluster and is accessible via a Hive/Beeline external table. For more detail on the purpose of this data set, see Analytics/Data Lake/Page and user history reconstruction. If you don't know how to access this data set, see Analytics/Data access.


`start_timestamp`           string          // Timestamp from which this state applies (inclusive).
`end_timestamp`             string          // Timestamp until which this state applies (exclusive).
`wiki_db`                   string          // enwiki, dewiki, eswiktionary, etc.
`user_id`                   bigint          // ID of the user, as in the user table.
`user_name`                 string          // Historical user name.
`user_name_latest`          string          // User name as of today.
`user_groups`               array<string>   // Historical user groups.
`user_groups_latest`        array<string>   // User groups as of today.
`user_blocks`               array<string>   // Historical user blocks.
`user_blocks_latest`        array<string>   // User blocks as of today.
`user_registration`         string          // User creation timestamp.
`auto_create`               int             // 1 if the user was created automatically by SUL,
                                            // 0 otherwise.
`caused_by_event_type`      string          // Event that caused this state (create, move, delete or restore).
`caused_by_user_id`         bigint          // ID of the user that caused this state.
`caused_block_expiration`   string          // Block expiration timestamp, if any.
`inferred_from`             string          // If non-NULL, indicates that some of this state's fields
                                            // have been inferred after an inconsistency in the source data.
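
Because each row describes one state of a user over the half-open interval from `start_timestamp` (inclusive) to `end_timestamp` (exclusive), a user's state at a given moment can be found with a simple range check. The following Python snippet is only a minimal sketch of that lookup on in-memory rows, not the way the data set is queried in Hive. It rests on two assumptions that this page does not confirm: that the timestamp strings sort chronologically (for example `YYYYMMDDHHMMSS`), and that the still-current state has an empty (`None`) `end_timestamp`. The sample rows and the `state_at` helper are hypothetical.

```python
from typing import Optional

# Hypothetical sample rows using only a few of the fields listed above.
# Assumption: timestamps are strings that sort chronologically,
# and end_timestamp is None for the state that is still current.
rows = [
    {"wiki_db": "simplewiki", "user_id": 1, "user_name": "OldName",
     "start_timestamp": "20100101000000", "end_timestamp": "20120101000000"},
    {"wiki_db": "simplewiki", "user_id": 1, "user_name": "NewName",
     "start_timestamp": "20120101000000", "end_timestamp": None},
]


def state_at(rows, wiki_db: str, user_id: int, timestamp: str) -> Optional[dict]:
    """Return the row whose [start, end) interval contains the timestamp."""
    for row in rows:
        if row["wiki_db"] != wiki_db or row["user_id"] != user_id:
            continue
        # start_timestamp is inclusive.
        starts_in_time = row["start_timestamp"] <= timestamp
        # end_timestamp is exclusive; None is treated as "still open".
        not_yet_ended = (row["end_timestamp"] is None
                         or timestamp < row["end_timestamp"])
        if starts_in_time and not_yet_ended:
            return row
    return None
```

With these sample rows, a lookup at exactly `20120101000000` falls into the second state, since the first state's `end_timestamp` is exclusive.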

Changes and known problems

Date | Schema version | Details | Phab
2016/10/06 | n/a | The dataset contains data for simplewiki and enwiki until September 2016. We still need to productionize the automatic updates to this table and import all the wikis. |