Analytics/Data Lake/Page history: Difference between revisions

This page describes the data set that stores the '''page history''' of WMF's wikis. It lives in the Analytics Hadoop cluster and is accessible via a Hive/Beeline external table. For more detail on the purpose of this data set, see [[Analytics/Data Lake/Page and user history reconstruction]]. If you don't know how to access this data set, see [[Analytics/Data access]].
#REDIRECT [[Analytics/Data Lake/Mediawiki Page history]]
=== Schema ===
`start_timestamp`          string    // Timestamp from which this state applies (inclusive).
`end_timestamp`            string    // Timestamp until which this state applies (exclusive).
`wiki_db`                  string    // enwiki, dewiki, eswiktionary, etc.
`page_id`                  bigint    // ID of the page, as in the page table.
`page_id_artificial`       string    // Generated ID for deleted pages without a real ID.
`page_creation_timestamp`  string    // Timestamp of the page's first revision.
`page_title`               string    // Historical page title.
`page_title_latest`        string    // Page title as of today.
`page_namespace`           int       // Historical namespace.
`page_namespace_latest`    int       // Namespace as of today.
`caused_by_event_type`     string    // Event that caused this state (create, move, delete or restore).
`caused_by_user_id`        bigint    // ID of the user that caused this state.
`inferred_from`            string    // If non-NULL, indicates that some of this state's fields have been inferred
                                     // after an inconsistency in the source data.
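Each row of the schema above describes one state of a page, valid from <code>start_timestamp</code> (inclusive) to <code>end_timestamp</code> (exclusive). The following Python sketch illustrates how that half-open interval can be used to find the state of a page at a given moment. It is only an illustration, not the code used to build or query the data set. The sample rows, the <code>state_at</code> helper, the <code>YYYYMMDDHHMMSS</code> timestamp format and the use of <code>None</code> for an open-ended boundary are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical sample rows; the values are invented for illustration.
# Assumption: timestamps are fixed-width strings (YYYYMMDDHHMMSS), so that
# plain string comparison matches chronological order.
# Assumption: None stands for an open-ended boundary.
states = [
    {"wiki_db": "simplewiki", "page_id": 1, "page_title": "Old_title",
     "start_timestamp": "20100101000000", "end_timestamp": "20120101000000",
     "caused_by_event_type": "create"},
    {"wiki_db": "simplewiki", "page_id": 1, "page_title": "New_title",
     "start_timestamp": "20120101000000", "end_timestamp": None,
     "caused_by_event_type": "move"},
]


def state_at(rows, wiki_db: str, page_id: int, ts: str) -> Optional[dict]:
    """Return the state row that applies to a page at timestamp ts, if any."""
    for row in rows:
        if row["wiki_db"] != wiki_db or row["page_id"] != page_id:
            continue
        start, end = row["start_timestamp"], row["end_timestamp"]
        # start_timestamp is inclusive, end_timestamp is exclusive.
        started = start is None or start <= ts
        not_ended = end is None or ts < end
        if started and not_ended:
            return row
    return None


row = state_at(states, "simplewiki", 1, "20110615000000")
print(row["page_title"] if row else None)
```

Because <code>end_timestamp</code> is exclusive, a timestamp that falls exactly on a boundary (here <code>20120101000000</code>) belongs to the later state, so consecutive states of a page never overlap.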
=== Changes and known problems ===
{| class="wikitable"
!Schema version
|The dataset contains data for simplewiki and enwiki up to September 2016. The automatic updates to the table still need to be productionized, and the remaining wikis still need to be imported.

Revision as of 16:16, 2 December 2016