
WMDE/Analytics

Revision as of 14:21, 31 October 2018 by Addshore (Addshore moved page Analytics/WMDE to WMDE/Analytics)

Documentation for the WMDE analytics activities.

Puppetization

All of WMDE's puppetized analytics setup can be found in the statistics::wmde class and its subclasses in the WMF Puppet repository.

Currently all scripts run as the 'analytics-wmde' user on the stat boxes. Members of the 'analytics-wmde-users' group have access to the relevant stat box and to the analytics-wmde user, which allows them to trigger scripts manually and to read logs for debugging.

Time series data

Grafana

grafana.wikimedia.org is a frontend for building queries and storing dashboards based on data from Graphite. Documentation for the WMF Grafana instance can be found at Grafana.wikimedia.org.
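Grafana's Graphite data source fetches series through Graphite's render API. The following Python sketch shows the equivalent raw request for a single target and how the JSON response (a list of objects, each with a "target" name and a "datapoints" list of [value, timestamp] pairs) can be reduced to the newest value per series. The base URL and the metric name are hypothetical placeholders, not the actual WMF configuration.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

# Hypothetical base URL: replace with the Graphite host that Grafana queries.
GRAPHITE_BASE = "https://graphite.example.org"


def render_url(target: str, since: str = "-7d", base: str = GRAPHITE_BASE) -> str:
    """Build a Graphite render API URL that returns one target as JSON."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base}/render?{query}"


def latest_values(response: List[dict]) -> Dict[str, Optional[float]]:
    """Map each series in a render API JSON response to its newest non-null value.

    Each series looks like {"target": name, "datapoints": [[value, timestamp], ...]},
    with datapoints in chronological order and None for missing values.
    """
    latest = {}
    for series in response:
        values = [value for value, _ts in series["datapoints"] if value is not None]
        latest[series["target"]] = values[-1] if values else None
    return latest
```

For example, render_url("daily.wikidata.edits") yields https://graphite.example.org/render?target=daily.wikidata.edits&from=-7d&format=json; the same target expression is what would be entered in a Grafana panel's query editor.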

All of our dashboards can be reached from our two main dashboards:

Backups

We periodically back up our dashboards in case someone unintentionally breaks something. The backups can be found at https://github.com/wmde/grafana-dashboards and can be updated by following the instructions in the README file of that repository.
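The authoritative backup procedure is the one in the repository README. As a hedged illustration of the general idea only, the sketch below assumes a dashboard has already been fetched through Grafana's HTTP API (GET /api/dashboards/uid/<uid>, which returns the dashboard model under a "dashboard" key next to a "meta" key) and serializes it deterministically, so that repeated backups produce clean git diffs. The function names are hypothetical, and dropping the id and version fields is an assumption of this sketch, not necessarily what the wmde/grafana-dashboards tooling does.

```python
import json
from pathlib import Path
from typing import Tuple


def dashboard_to_backup(api_response: dict) -> Tuple[str, str]:
    """Turn a GET /api/dashboards/uid/<uid> response into (filename, JSON text)."""
    # Copy the dashboard model so the caller's response is left untouched.
    dashboard = dict(api_response["dashboard"])
    # Assumption: the numeric id and the version counter are instance-specific,
    # so they are dropped to keep diffs between backups small.
    dashboard.pop("id", None)
    dashboard.pop("version", None)
    filename = f"{dashboard['uid']}.json"
    # Sorted keys and fixed indentation make the output deterministic.
    text = json.dumps(dashboard, indent=2, sort_keys=True) + "\n"
    return filename, text


def write_backup(api_response: dict, directory: Path) -> Path:
    """Write one dashboard backup file into a local checkout directory."""
    filename, text = dashboard_to_backup(api_response)
    path = directory / filename
    path.write_text(text, encoding="utf-8")
    return path
```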

Graphite

Graphite is a real-time time series data store and graph renderer. It receives data from a variety of data sources, including, but not limited to, those listed below.
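Carbon, Graphite's ingestion daemon, accepts a simple plaintext protocol: one datapoint per line in the form "metric.path value unix_timestamp", by default on TCP port 2003. A minimal Python sketch of formatting and sending such lines follows. The host name and metric paths are hypothetical, and the WMDE data sources may use a different client or transport.

```python
import socket
import time
from typing import Iterable, Optional


def plaintext_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one datapoint as 'path value timestamp' plus a trailing newline."""
    if timestamp is None:
        timestamp = int(time.time())
    # Fields are space-separated, so whitespace inside a path would corrupt the line.
    if not path or any(ch.isspace() for ch in path):
        raise ValueError(f"invalid metric path: {path!r}")
    return f"{path} {value} {timestamp}\n"


def send_datapoints(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Send pre-formatted lines to a Carbon plaintext listener over TCP."""
    # Encoding as ASCII fails loudly on non-ASCII paths instead of sending them.
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload)
```

For example, plaintext_line("wikidata.example.count", 42, 1700000000) returns "wikidata.example.count 42 1700000000\n", and send_datapoints([...], "carbon.example.org") would deliver a batch of such lines to a (hypothetical) Carbon host.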

Scripts

We have a scripts repository on Gerrit. It contains all of the regularly run cron jobs that generate the data sent to Graphite. Most of the code is currently written in PHP. Efficiency is not a concern for the code itself, since the scripts spend their time waiting on web requests or database queries. PHP was chosen because it is the main language of WMDE developers.

This repository has two branches (generally kept closely in sync with each other):

  • master - Development code
  • production - Deployed code (merges here will trigger a deploy by Puppet)
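A cron job of the kind described above typically makes one web request, extracts a few numbers and hands them to Graphite. The real scripts are written in PHP; the Python sketch below only illustrates that pattern, using the MediaWiki Action API's siteinfo statistics on Wikidata (action=query&meta=siteinfo&siprop=statistics) as an example data source. The metric prefix and User-Agent string are hypothetical, and the parsing step is kept separate from the network call so it can be tested offline.

```python
import json
import time
from typing import List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"
# Hypothetical metric prefix; the real scripts use their own naming scheme.
PREFIX = "wikidata.site_stats"
# Hypothetical User-Agent; Wikimedia's User-Agent policy asks clients to identify themselves.
USER_AGENT = "wmde-analytics-sketch/0.1 (example@example.org)"


def siteinfo_url(api: str = API) -> str:
    """URL for the MediaWiki Action API siteinfo statistics query."""
    params = {"action": "query", "meta": "siteinfo",
              "siprop": "statistics", "format": "json"}
    return f"{api}?{urlencode(params)}"


def stats_to_lines(response: dict, timestamp: int, prefix: str = PREFIX) -> List[str]:
    """Convert the numeric statistics into Graphite plaintext lines."""
    stats = response["query"]["statistics"]
    return [f"{prefix}.{name} {value} {timestamp}\n"
            for name, value in sorted(stats.items())
            if isinstance(value, (int, float))]


def collect(api: str = API) -> List[str]:
    """Make the single web request and format the result (needs network access)."""
    request = Request(siteinfo_url(api), headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as resp:
        data = json.load(resp)
    return stats_to_lines(data, int(time.time()))
```

Because almost all of the runtime is spent waiting on the request, the choice of language makes little difference here, which matches the reasoning given for using PHP.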

Dump Analyzer

toolkit-analyzer

This repository contains Java code used to scan the weekly Wikidata JSON dumps and extract information that is fed into Graphite for dashboards. A few one-off dump processors and other useful tools are also kept here.
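The Wikidata JSON dumps are a single JSON array with one entity per line, each entity line except the last ending in a comma, which makes them easy to stream without loading the whole file into memory. toolkit-analyzer itself is written in Java; the following Python sketch only illustrates the streaming idea by counting how many statements use each property. The function names are hypothetical, and the statistic is an arbitrary example of something that could be fed into Graphite.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entities from a Wikidata JSON dump read line by line.

    The array brackets sit on their own lines and every entity line except
    the last ends with a comma, so each line can be parsed on its own.
    """
    for line in lines:
        line = line.strip()
        if line in ("[", "]", ""):
            continue
        yield json.loads(line.rstrip(","))


def property_usage(lines: Iterable[str]) -> Counter:
    """Count how many statements use each property across all entities."""
    counts = Counter()
    for entity in iter_entities(lines):
        for prop, statements in entity.get("claims", {}).items():
            counts[prop] += len(statements)
    return counts
```

With a local copy of a dump, this could be run as property_usage(gzip.open("latest-all.json.gz", "rt", encoding="utf-8")); the file name is an assumption about where the dump was downloaded.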

toolkit-analyzer-build

This repository simply contains a build of toolkit-analyzer for deployment in production. It has two branches:

  • master - Development code
  • production - Deployed code (merges here will trigger a deploy by Puppet)

Ad-hoc Hive queries

Ad-hoc MediaWiki Logging

The WMDE log channel from MediaWiki will be rsynced to the stat boxes.

WDCM

The Wikidata Concepts Monitor (WDCM), a system to track and analyze the Wikidata usage across the Wikimedia projects, is documented on this Wikitech page.

WDCM System Operation Workflow.