You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Analytics/Data Lake/Mediawiki user history: Difference between revisions

From Wikitech-static
imported>Milimetric
 
imported>MarcoAurelio
m (Bot: Fixing double redirect to Analytics/Data Lake/Edits/Mediawiki user history)
 
This page describes the data set that stores the '''user history''' of WMF's wikis. It lives in the Analytics Hadoop cluster and is accessible via a Hive/Beeline external table. For more detail on the purpose of this data set, please read [[Analytics/Data Lake/Page and user history reconstruction]]. If you don't know how to access this data set, also visit [[Analytics/Data access]].
#REDIRECT [[Analytics/Data Lake/Edits/Mediawiki user history]]
 
=== Schema ===
<syntaxhighlight>
`start_timestamp`          string          // Timestamp from which this state applies (inclusive).
`end_timestamp`            string          // Timestamp until which this state applies (exclusive).
`wiki_db`                  string          // enwiki, dewiki, eswiktionary, etc.
`user_id`                  bigint          // ID of the user, as in the user table.
`user_name`                string          // Historical user name.
`user_name_latest`         string          // User name as of today.
`user_groups`              array<string>   // Historical user groups.
`user_groups_latest`       array<string>   // User groups as of today.
`user_blocks`              array<string>   // Historical user blocks.
`user_blocks_latest`       array<string>   // User blocks as of today.
`user_registration`        string          // User creation timestamp.
`auto_create`              int             // 1 if the user was created automatically by SUL,
                                           // 0 otherwise.
`caused_by_event_type`     string          // Event that caused this state (create, rename, altergroups or alterblocks).
`caused_by_user_id`        bigint          // ID of the user that caused this state.
`caused_block_expiration`  string          // Block expiration timestamp, if any.
`inferred_from`            string          // If non-NULL, indicates that some of this state's fields
                                           // have been inferred after an inconsistency in the source data.
</syntaxhighlight>
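Each row describes one state of a user, valid from <code>start_timestamp</code> (inclusive) up to <code>end_timestamp</code> (exclusive). The following is a minimal Python sketch of how to pick the state in effect at a given moment. It assumes that timestamps are strings in one fixed-width format (so they compare correctly as strings) and that the current, open-ended state has a NULL <code>end_timestamp</code>; both are assumptions, not documented properties of the table. The <code>UserState</code> class and <code>state_at</code> function are hypothetical helpers mirroring only a few of the columns above.

<syntaxhighlight lang="python">
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserState:
    # Hypothetical in-memory mirror of a few columns of the user history table.
    wiki_db: str
    user_id: int
    user_name: str
    start_timestamp: str
    # Assumption: the current (open-ended) state has end_timestamp = NULL (None).
    end_timestamp: Optional[str]


def state_at(states: List[UserState], wiki_db: str, user_id: int, ts: str) -> Optional[UserState]:
    """Return the state of (wiki_db, user_id) in effect at ts, or None if there is none."""
    for s in states:
        if s.wiki_db != wiki_db or s.user_id != user_id:
            continue
        # start_timestamp is inclusive, end_timestamp is exclusive.
        # Assumption: all timestamps share one fixed-width format, so string
        # comparison matches chronological order.
        if s.start_timestamp <= ts and (s.end_timestamp is None or ts < s.end_timestamp):
            return s
    return None


# Illustrative (made-up) history: user 42 on enwiki was renamed once.
history = [
    UserState("enwiki", 42, "OldName", "2010-01-01 00:00:00", "2014-05-01 12:00:00"),
    UserState("enwiki", 42, "NewName", "2014-05-01 12:00:00", None),
]

print(state_at(history, "enwiki", 42, "2012-06-01 00:00:00").user_name)  # OldName
print(state_at(history, "enwiki", 42, "2014-05-01 12:00:00").user_name)  # NewName
</syntaxhighlight>

Because one state's end equals the next state's start, the inclusive/exclusive convention guarantees that exactly one state matches at the boundary instant.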
 
=== Changes and known problems ===
{| class="wikitable"
!Date
!Schema version
!Details
!Phab task
|-
|2016/10/06
|n/a
|The data set contains data for simplewiki and enwiki up to September 2016. The automatic updates to the table still need to be productionized, and the remaining wikis still need to be imported.
|
|}

Latest revision as of 19:00, 13 July 2017