
Analytics/Data Lake/Page history: Difference between revisions

From Wikitech-static
imported>Mforns
(initial version)
 
imported>Milimetric
This page describes the data set that stores the '''page history''' of WMF's wikis. It lives in the Analytics Hadoop cluster and is accessible via a Hive/Beeline external table. For more detail on the purpose of this data set, see [[Analytics/Data Lake/Page and user history reconstruction]]. If you don't know how to access this data set, see [[Analytics/Data access]].
#REDIRECT [[Analytics/Data Lake/Mediawiki Page history]]
 
=== Schema ===
<syntaxhighlight>
 
`start_timestamp`          string    // Timestamp from which this state applies (inclusive).
`end_timestamp`            string    // Timestamp until which this state applies (exclusive).
`wiki_db`                  string    // enwiki, dewiki, eswiktionary, etc.
`page_id`                  bigint    // ID of the page, as in the page table.
`page_id_artificial`       string    // Generated ID for deleted pages without a real ID.
`page_creation_timestamp`  string    // Timestamp of the page's first revision.
`page_title`               string    // Historical page title.
`page_title_latest`        string    // Page title as of today.
`page_namespace`           int       // Historical namespace.
`page_namespace_latest`    int       // Namespace as of today.
`caused_by_event_type`     string    // Event that caused this state (create, move, delete or restore).
`caused_by_user_id`        bigint    // ID of the user that caused this state.
`inferred_from`            string    // If non-NULL, indicates that some of this state's fields have been inferred
                                     // after an inconsistency in the source data.
</syntaxhighlight>
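
Because each row covers the half-open interval from <code>start_timestamp</code> (inclusive) to <code>end_timestamp</code> (exclusive), at most one state of a page should match any given moment. The following Python sketch illustrates that lookup on in-memory rows. It is not taken from the actual pipeline. The sample rows, the helper name <code>state_at</code>, the timestamp string format, and the treatment of a NULL start or end as an open bound are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical sample rows using a subset of the schema's columns.
# Assumption: timestamps are strings that sort chronologically
# (here "YYYY-MM-DD HH:MM:SS"); the real format may differ.
# Assumption: a NULL (None) start or end means the interval is open on that side.
STATES = [
    {"wiki_db": "simplewiki", "page_id": 1, "page_title": "Foo",
     "start_timestamp": "2010-01-01 00:00:00",
     "end_timestamp": "2012-06-01 00:00:00",
     "caused_by_event_type": "create"},
    {"wiki_db": "simplewiki", "page_id": 1, "page_title": "Bar",
     "start_timestamp": "2012-06-01 00:00:00",
     "end_timestamp": None,
     "caused_by_event_type": "move"},
]


def state_at(states, wiki_db: str, page_id: int, timestamp: str) -> Optional[dict]:
    """Return the state of a page at `timestamp`, or None if no state matches."""
    for s in states:
        if s["wiki_db"] != wiki_db or s["page_id"] != page_id:
            continue
        start, end = s["start_timestamp"], s["end_timestamp"]
        started = start is None or start <= timestamp    # inclusive lower bound
        not_ended = end is None or timestamp < end       # exclusive upper bound
        if started and not_ended:
            return s
    return None
```

With these sample rows, a lookup at exactly <code>2012-06-01 00:00:00</code> returns the "Bar" state rather than "Foo", since the end of one state and the start of the next share the same timestamp and only the start is inclusive.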
 
=== Changes and known problems ===
{| class="wikitable"
!Date
!Schema version
!Details
!Phab task
|-
|2016/10/06
|n/a
|The dataset contains data for simplewiki and enwiki up to September 2016. Automatic updates to the table still need to be productionized, and all the other wikis still need to be imported.
|
|}

Revision as of 16:16, 2 December 2016