From Wikitech-static
Revision as of 11:19, 14 December 2018 by imported>Addshore (→‎WDCM: another link)

Documentation for the WMDE analytics activities.


All of WMDE's puppetized analytics code can be found in statistics::wmde and its subclasses in the WMF puppet repo.

Currently all scripts run under an 'analytics-wmde' user on the stat boxes. Being in the 'analytics-wmde-users' group gives you access to the relevant stat box and to the analytics-wmde user, which allows manual triggering of scripts and reading of logs for debugging.

Time series data

Grafana is a frontend for creating queries and storing dashboards using data from Graphite. Docs for the WMF Grafana instance can be found @

Our dashboards can be found by looking at our 2 main dashboards:


We periodically back up our dashboards in case someone breaks something unintentionally. The backups can be found @ and can be updated by following the instructions in the README file in that repository.


Graphite is a real-time time series data store and graph renderer. It receives data from a variety of data sources, which are not limited to those listed.


We have a scripts repo on Gerrit. This repository contains all of the regularly run cron jobs that generate data sent to Graphite. Most of the code here is currently written in PHP. Efficiency isn't really needed in the code itself, as all of the scripts make web requests or db queries. PHP was chosen because it is the main language for WMDE developers.

This repository has 2 branches (generally kept very up to date with each other):

  • master - Development code
  • production - Deployed code (merges here will trigger a deploy by puppet)
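
As a rough illustration of what a cron job feeding Graphite has to do, the sketch below formats and sends one data point using Graphite's plaintext protocol (one `path value timestamp` line per metric over TCP). It is written in Python for brevity and is not taken from the scripts repo, which is mostly PHP. The function names, the metric path and the host and port arguments are all hypothetical, and the real scripts may deliver their data through a different route.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line in Graphite's plaintext format: '<path> <value> <timestamp>\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, path, value, timestamp=None):
    """Open a TCP connection to a Graphite (carbon) listener and send a single metric line."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric path and value, for illustration only.
print(format_metric("daily.example.some_count", 42, 1544745600), end="")
```

Keeping the formatting separate from the network call means the output of a script can be checked without a Graphite server being reachable.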

Dump Analyzer


This repository contains Java code used to scan the weekly Wikidata JSON dumps and extract information to be fed into Graphite for dashboards. A few one-off dump processors and other useful things are also kept here.
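
To give a feel for what scanning a dump involves, here is a minimal sketch that streams a dump line by line and counts entities by type. It assumes the dump is a JSON array with one entity per line, each line ending in a comma. It is written in Python purely for illustration and is not taken from the Java code in this repository; the function names and the sample data are hypothetical.

```python
import json
from collections import Counter


def iter_entities(lines):
    """Yield one decoded entity per dump line, skipping the array brackets."""
    for raw in lines:
        # Each entity line ends with a comma that is not part of the JSON object.
        line = raw.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def count_by_type(lines):
    """Count entities per value of their 'type' field."""
    return Counter(entity.get("type", "unknown") for entity in iter_entities(lines))


# Tiny in-memory stand-in for a dump file (hypothetical data).
sample = [
    "[",
    '{"id": "Q1", "type": "item"},',
    '{"id": "P31", "type": "property"},',
    '{"id": "Q2", "type": "item"}',
    "]",
]
print(count_by_type(sample))
```

For a real compressed dump, the same functions could be given a file object opened with `gzip.open(path, "rt", encoding="utf-8")`, so the whole file never has to be loaded into memory.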


This repository simply contains a build of the toolkit-analyzer to be deployed in production. This repository has 2 branches:

  • master - Development code
  • production - Deployed code (merges here will trigger a deploy by puppet)

Ad-hoc Hive queries

Ad-hoc MediaWiki Logging

The WMDE log channel from MediaWiki will be rsynced across to stat boxes.


The Wikidata Concepts Monitor (WDCM), a system to track and analyze the Wikidata usage across the Wikimedia projects, is documented on this Wikitech page.

WDCM System Operation Workflow.