
Analytics/Data Lake/Mediawiki user history: Difference between revisions

(Joal moved page Analytics/Data Lake/Mediawiki user history to Analytics/Data Lake/Schemas/Mediawiki user history: Organizing doc before first internal production release.)
This page describes the data set that stores the '''user history''' of WMF's wikis. It lives in the Analytics Hadoop cluster and is accessible via a Hive/Beeline external table. For more detail on the purpose of this data set, please read [[Analytics/Data Lake/Page and user history reconstruction]]. Also visit [[Analytics/Data access]] if you don't know how to access this data set.
#REDIRECT [[Analytics/Data Lake/Schemas/Mediawiki user history]]
=== Schema ===
`start_timestamp`          string          // Timestamp from which this state applies (inclusive).
`end_timestamp`            string          // Timestamp up to which this state applies (exclusive).
`wiki_db`                  string          // enwiki, dewiki, eswiktionary, etc.
`user_id`                  bigint          // ID of the user, as in the user table.
`user_name`                string          // Historical user name.
`user_name_latest`          string          // User name as of today.
`user_groups`              array<string>  // Historical user groups.
`user_groups_latest`        array<string>  // User groups as of today.
`user_blocks`              array<string>  // Historical user blocks.
`user_blocks_latest`        array<string>  // User blocks as of today.
`user_registration`        string          // User creation timestamp.
`auto_create`              int            // 1 if the user was created automatically by SUL,
                                            // 0 otherwise.
`caused_by_event_type`      string          // Event that caused this state (create, move, delete or restore).
`caused_by_user_id`        bigint          // ID of the user that caused this state.
`caused_block_expiration`  string          // Block expiration timestamp, if any.
`inferred_from`            string          // If non-NULL, indicates that some of this state's fields
                                            // have been inferred after an inconsistency in the source data.
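
Because each row is a user state valid over the half-open interval from `start_timestamp` (inclusive) to `end_timestamp` (exclusive), finding a user's state at a given moment is an interval lookup. The following is a minimal Python sketch of that logic over in-memory rows, not a query against the actual table. It rests on several assumptions not stated on this page: timestamps are fixed-width, lexicographically sortable strings (for example `YYYYMMDDHHMMSS`), a NULL `end_timestamp` marks the state that is still current, and a NULL `start_timestamp` marks a state with no known beginning. The sample rows and the `state_at` helper are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical sample rows using a subset of the schema's fields.
# Assumption: timestamps are fixed-width sortable strings and
# a None end_timestamp means the state is still current.
ROWS: List[Dict[str, object]] = [
    {"wiki_db": "enwiki", "user_id": 42, "user_name": "OldName",
     "start_timestamp": "20100101000000", "end_timestamp": "20150601000000"},
    {"wiki_db": "enwiki", "user_id": 42, "user_name": "NewName",
     "start_timestamp": "20150601000000", "end_timestamp": None},
]


def state_at(rows: List[Dict[str, object]], wiki_db: str,
             user_id: int, ts: str) -> Optional[Dict[str, object]]:
    """Return the state row covering ts for the given user, or None."""
    for row in rows:
        if row["wiki_db"] != wiki_db or row["user_id"] != user_id:
            continue
        start = row["start_timestamp"]
        end = row["end_timestamp"]
        # start is inclusive, end is exclusive; None is treated as open-ended.
        starts_in_time = start is None or start <= ts
        ends_after = end is None or ts < end
        if starts_in_time and ends_after:
            return row
    return None
```

For example, `state_at(ROWS, "enwiki", 42, "20150601000000")` falls exactly on the boundary between the two sample states and returns the second one, since the first state's `end_timestamp` is exclusive.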
=== Changes and known problems ===
{| class="wikitable"
!Schema version
|The dataset contains data for simplewiki and enwiki until September 2016. We still need to productionize the automatic updates to that table and import all the wikis.

Revision as of 12:46, 24 March 2017