Analytics/Data Lake/Schemas/Metric results: Difference between revisions

imported>Joal
m (Joal moved page Analytics/Data Lake/Metric results to Analytics/Data Lake/Schemas/Metric results: Organizing doc before first internal production release.)
 
imported>Joal
(Update for first internal productionisation.)
Line 1:

Old revision:

=Overview=

This table is dynamically partitioned on wiki_db and metric and holds metric results per wiki per time period. An example row would be: (enwiki, daily_edits, 2012-09-10, 12345). As a result of the dynamic partitioning, inserting data into this table creates separate directories with single files for each wiki and metric. This allows the files to be easily copied to datasets.wikimedia.org for display in Dashiki dashboards.

=Schema=

<pre>
CREATE EXTERNAL TABLE `wmf.mediawiki_metric`(
  `dt`      string COMMENT 'The date of this measurement, as YYYY-MM-DD',
  `value`   bigint COMMENT 'The measurement'
)
COMMENT
  'See most up to date documentation at https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Metric_results'
PARTITIONED BY
(
    `wiki_db`   string COMMENT 'The wiki this measurement pertains to',
    `metric`    string COMMENT 'The metric being computed to measure'
)
</pre>

New revision:

=Overview=

This table stores metrics computed over the [[Analytics/Data Lake/Schemas/Mediawiki history|denormalized mediawiki history]] dataset. It is [https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-DataUnits partitioned] by wiki_db and metric name to facilitate using its data outside of Hive, namely for display in Dashiki.

=Schema=
<syntaxhighlight>
col_name            data_type           comment
dt                  string              The date of this measurement, as YYYY-MM-DD
value               bigint              The measurement
snapshot            string              Versioning information to keep multiple datasets (YYYY-MM for regular labs imports)
metric              string              The metric being computed to measure
wiki_db             string              The wiki this measurement pertains to

# Partition Information
# col_name          data_type           comment

snapshot            string              Versioning information to keep multiple datasets (YYYY-MM for regular labs imports)
metric              string              The metric being computed to measure
wiki_db             string              The wiki this measurement pertains to
</syntaxhighlight>

Revision as of 14:39, 28 March 2017

Overview

This table stores metrics computed over the denormalized mediawiki history dataset. It is partitioned by wiki_db and metric name to facilitate using its data outside of Hive, namely for display in Dashiki.

Schema

col_name            data_type           comment
dt                  string              The date of this measurement, as YYYY-MM-DD
value               bigint              The measurement
snapshot            string              Versioning information to keep multiple datasets (YYYY-MM for regular labs imports)
metric              string              The metric being computed to measure
wiki_db             string              The wiki this measurement pertains to

# Partition Information
# col_name          data_type           comment

snapshot            string              Versioning information to keep multiple datasets (YYYY-MM for regular labs imports)
metric              string              The metric being computed to measure
wiki_db             string              The wiki this measurement pertains to
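
Because the table is partitioned, each combination of partition values ends up in its own directory, which is what makes the data easy to pick up outside of Hive. The sketch below illustrates the standard Hive directory naming (one col=value level per partition column). It assumes the directory levels follow the order listed under "# Partition Information". The base location, the helper name partition_path, and the example values are hypothetical and not taken from this page.

```python
from pathlib import PurePosixPath

# Partition columns, assumed to be in the order shown under
# "# Partition Information" in the schema above.
PARTITION_COLUMNS = ("snapshot", "metric", "wiki_db")


def partition_path(base, **values):
    """Build the Hive-style directory for one partition (col=value per level)."""
    missing = [c for c in PARTITION_COLUMNS if c not in values]
    if missing:
        raise ValueError(f"missing partition values: {missing}")
    path = PurePosixPath(base)
    for column in PARTITION_COLUMNS:
        path /= f"{column}={values[column]}"
    return str(path)


# Hypothetical base location and partition values, for illustration only.
example = partition_path(
    "/hypothetical/warehouse/mediawiki_metric",
    snapshot="2017-03",
    metric="daily_edits",
    wiki_db="enwiki",
)
print(example)
```

A consumer that wants a single metric for a single wiki only needs to read the files under that one directory, rather than scanning the whole table.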