You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

WMDE/Analytics: Difference between revisions

From Wikitech-static
imported>Addshore
m (Addshore moved page Analytics/WMDE to WMDE/Analytics)
 
imported>Lucas Werkmeister (WMDE)
(→‎analytics/wmde/scripts repo: specify full code path, it’s not really obvious imho)
 
(5 intermediate revisions by 3 users not shown)
Line 8: Line 8:
Being in the ‘analytics-wmde-users’ group enables your to have access to the relevant stat box and analytics-wmde user allowing manual triggering of scripts and reading of logs / debugging.
Being in the ‘analytics-wmde-users’ group enables your to have access to the relevant stat box and analytics-wmde user allowing manual triggering of scripts and reading of logs / debugging.


== Time series data ==
== Grafana ==
 
=== Grafana ===


grafana.wikimedia.org is a frontend for creating queries and storing dashboards using data from [[Graphite]].
grafana.wikimedia.org is a frontend for creating queries and storing dashboards using data from [[Graphite]].
Line 16: Line 14:


Our dashboards can be found by looking at our 2 main dashboards:
Our dashboards can be found by looking at our 2 main dashboards:
* Wikidata: https://grafana.wikimedia.org/dashboard/db/wikidata
* Wikidata: https://grafana.wikimedia.org/d/000000154/wikidata
* Technical Wishes: https://grafana.wikimedia.org/dashboard/db/team-tcb
* Technical Wishes: https://grafana.wikimedia.org/d/000000288/team-tcb


'''Backups'''
There are also some dashboards not connected to these 2 main dashboards.


We periodically backup our dashboards incase someone breaks something unintentionally.
== analytics/wmde/scripts repo ==
The backups can be found @ https://github.com/wmde/grafana-dashboards and can be updated by following the instructions of the README file in that repository.
 
=== Graphite ===
 
[[Graphite]] is a real-time time series data store and graph renderer. It gets sent data from a variety of [[Graphite#Data_sources|Data Sources]]. The data sources are not limited to those listed.
 
==== Scripts ====


We have a [https://gerrit.wikimedia.org/r/#/admin/projects/analytics/wmde/scripts scripts repo on gerrit].
We have a [https://gerrit.wikimedia.org/r/#/admin/projects/analytics/wmde/scripts scripts repo on gerrit].
Line 34: Line 25:
Most of the code here is currently written in PHP, efficiency isn’t really needed in the code itself as all of the scripts make web requests or db queries. PHP was chosen as it is the main language for WMDE developers.
Most of the code here is currently written in PHP, efficiency isn’t really needed in the code itself as all of the scripts make web requests or db queries. PHP was chosen as it is the main language for WMDE developers.


This repository has 2 branches (generally kept very up to dat with each other):
This repository has 2 branches (generally kept very up to date with each other):
* master - Development code
* master - Development code
* production - Deployed code (merges here will trigger a deploy by puppet)
* production - Deployed code (merges here will trigger a deploy by puppet, only a few people have access to +2 on the branch for that reason. You may need to request access.)
 
These scripts currently run on stat1007 (dictated by puppet) using systemd timers and run as the user 'analytics-wmde'.
Code can be found in <code>/srv/analytics-wmde/graphite/src/scripts</code>.
In order to SUDO as this user you will need to be in the analytics-wmde-users LDAP group.
 
=== Logs ===


==== Dump Analyzer ====
logs by default are only on journald, that keeps them on tmpfs (so basically on ram, they are wiped if we reboot) unless instructed otherwise in puppet (namely the systemd::timer config setting logging etc..)
 
for the moment journald cannot be accessed by regular users, it needs sudo
 
If you need access to the logs you can ping folks in #wikimedia-analytics
 
== analytics/wmde/toolkit-analyzer repo and build ==


'''toolkit-analyzer'''
'''toolkit-analyzer'''
Line 50: Line 53:
This repository has 2 branches:
This repository has 2 branches:
* master - Development code
* master - Development code
* production - Deployed code (merges here will trigger a deploy by puppet)
* production - Deployed code (merges here will trigger a deploy by puppet, only a few people have access to +2 on the branch for that reason. You may need to request access.)
 
The build analyzer runs on stat1007 (dictated by puppet) and runs as the user 'analytics-wmde'.
Code can be found in /srv/analytics-wmde.
In order to SUDO as this user you will need to be in the analytics-wmde-users LDAP group.


== Ad-hov Hive queries ==
== Ad-hoc Hive queries ==


* [[Analytics/Systems/Cluster/Hive/Queries/Wikidata]]
* [[Analytics/Systems/Cluster/Hive/Queries/Wikidata]]
Line 61: Line 68:


== WDCM ==
== WDCM ==
The Wikidata Concepts Monitor (WDCM), a system to track and analyze the Wikidata usage across the Wikimedia projects, is documented on [[Wikidata Concepts Monitor|this Wikitech page]].
The [[Wikidata Concepts Monitor]] (WDCM), a system to track and analyze the Wikidata usage across the Wikimedia projects, is documented on [[Wikidata Concepts Monitor|this Wikitech page]].
* [https://gerrit.wikimedia.org/r/#/admin/projects/analytics/wmde/WDCM Gerrit]
* [https://gerrit.wikimedia.org/r/#/admin/projects/analytics/wmde/WDCM Gerrit]
* [[phab:diffusion/AWCM/|Diffusion]]
* [[phab:diffusion/AWCM/|Diffusion]]
* [http://wdcm.wmflabs.org/ WDCM Dashboards]
* [http://wdcm.wmflabs.org/ WDCM Dashboards]
[[File:WDCM System Operation Workflow.png|none|thumb|800x800px|WDCM System Operation Workflow.]]
[[File:WDCM System Operation Workflow.png|none|thumb|800x800px|WDCM System Operation Workflow.]]

Latest revision as of 08:54, 8 June 2022

Documentation for the WMDE analytics activities.

Puppetization

All of WMDE's puppetized analytics code can be found in the statistics::wmde class and its subclasses in the WMF puppet repo.

Currently all scripts run under an 'analytics-wmde' user on the stat boxes. Being in the ‘analytics-wmde-users’ group gives you access to the relevant stat box and to the analytics-wmde user, which allows manual triggering of scripts and reading of logs for debugging.

Grafana

grafana.wikimedia.org is a frontend for creating queries and storing dashboards using data from Graphite. Docs for the WMF grafana instance can be found @ Grafana.wikimedia.org.

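Our dashboards can be found by looking at our 2 main dashboards:

  • Wikidata: https://grafana.wikimedia.org/d/000000154/wikidata
  • Technical Wishes: https://grafana.wikimedia.org/d/000000288/team-tcb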

There are also some dashboards not connected to these 2 main dashboards.

analytics/wmde/scripts repo

We have a scripts repo on gerrit. This repository contains all of the regularly scheduled jobs that generate data sent to graphite. Most of the code here is currently written in PHP. Efficiency isn’t really needed in the code itself, as all of the scripts spend their time on web requests or db queries. PHP was chosen as it is the main language for WMDE developers.
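
To illustrate what "data sent to graphite" can look like on the wire, here is a minimal Python sketch of Graphite's plaintext protocol, in which each metric is a single line of path, value and Unix timestamp. It is not taken from the scripts repo (which is written in PHP and may deliver metrics by a different route); the metric name, host and port used here are hypothetical.

```python
import socket
import time


def format_graphite_line(path: str, value: float, timestamp: int) -> str:
    """Return one metric as '<path> <value> <timestamp>' followed by a newline."""
    return f"{path} {value} {timestamp}\n"


def send_metric(host: str, port: int, path: str, value: float) -> None:
    """Open a TCP connection and send a single metric stamped with the current time."""
    line = format_graphite_line(path, value, int(time.time()))
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name; printed rather than sent so the sketch has no side effects.
    print(format_graphite_line("daily.example.wmde.metric", 42, 1654646400), end="")
```

Keeping the formatting separate from the network call makes the line format easy to check without a Graphite host.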

This repository has 2 branches (generally kept very up to date with each other):

  • master - Development code
  • production - Deployed code (merges here will trigger a deploy by puppet, only a few people have access to +2 on the branch for that reason. You may need to request access.)

These scripts currently run on stat1007 (dictated by puppet) using systemd timers, as the user 'analytics-wmde'. Code can be found in /srv/analytics-wmde/graphite/src/scripts. In order to sudo as this user you will need to be in the analytics-wmde-users LDAP group.
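
As a rough sketch of manual triggering, the Python helper below assembles a sudo command that would run one script as the analytics-wmde user. The user name and code path come from this page. The script file name and the assumption that scripts are started directly with the php binary are hypothetical, so check the systemd timer definitions in puppet for the real invocation.

```python
import shlex
from typing import List

SERVICE_USER = "analytics-wmde"
SCRIPTS_DIR = "/srv/analytics-wmde/graphite/src/scripts"


def build_manual_run(script_relpath: str) -> List[str]:
    """Build (but do not execute) a command that runs one script as the service user."""
    # Assumption: scripts are invoked directly with the php binary.
    return ["sudo", "-u", SERVICE_USER, "php", f"{SCRIPTS_DIR}/{script_relpath}"]


if __name__ == "__main__":
    # "example/some_script.php" is a hypothetical file name, not a real script in the repo.
    print(shlex.join(build_manual_run("example/some_script.php")))
```

The helper only prints the command, so it can be reviewed before anything is run on the stat box.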

Logs

By default, logs are only kept in journald, which stores them on tmpfs (so basically in RAM, and they are wiped if we reboot) unless instructed otherwise in puppet (namely via the systemd::timer config that sets up logging).

For the moment journald cannot be accessed by regular users; it needs sudo.

If you need access to the logs you can ping folks in #wikimedia-analytics.
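
For someone who does have sudo, reading a timer's logs comes down to a journalctl call filtered by unit. The Python sketch below builds such a command. The unit name is hypothetical, because the real unit names are defined in puppet and are not listed on this page.

```python
import shlex
from typing import List, Optional


def build_journal_command(unit: str, since: Optional[str] = None) -> List[str]:
    """Build (but do not execute) a journalctl command for one systemd unit."""
    cmd = ["sudo", "journalctl", "--unit", unit, "--no-pager"]
    if since is not None:
        # journalctl accepts values such as "today" or "2022-06-08 00:00:00" here.
        cmd += ["--since", since]
    return cmd


if __name__ == "__main__":
    # "example-wmde-job.service" is a hypothetical unit name.
    print(shlex.join(build_journal_command("example-wmde-job.service", since="today")))
```

Because journald keeps these logs in RAM by default, anything needed for later debugging should be copied out before the host is rebooted.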

analytics/wmde/toolkit-analyzer repo and build

toolkit-analyzer

This repository contains Java code used to scan the weekly Wikidata JSON dumps and extract information to be fed into graphite for dashboards. A few one off dump processors & other useful things are also kept here.
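
The general shape of such a dump scan can be sketched in a few lines. The example below is Python rather than the Java used in toolkit-analyzer, and it is not the repository's implementation. It relies on the layout of the Wikidata JSON dumps, which are one large JSON array with one entity per line, so entities can be parsed line by line without loading the whole file. The counting by entity type is only an illustrative metric.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed entity per dump line, skipping the enclosing array brackets."""
    for line in lines:
        line = line.strip().rstrip(",")  # every entity line except the last ends with a comma
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def count_entity_types(lines: Iterable[str]) -> Counter:
    """Count entities by their 'type' field (for example 'item' or 'property')."""
    return Counter(entity.get("type", "unknown") for entity in iter_entities(lines))


if __name__ == "__main__":
    # Tiny in-memory stand-in for a dump file, with made-up entities.
    sample = [
        "[",
        '{"id": "Q1", "type": "item"},',
        '{"id": "P31", "type": "property"},',
        '{"id": "Q2", "type": "item"}',
        "]",
    ]
    print(count_entity_types(sample))
```

For a real compressed dump, the same functions can be fed from gzip.open(path, "rt", encoding="utf-8"), which yields lines lazily.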

toolkit-analyzer-build

This repository simply contains a build of the toolkit-analyzer to be deployed in production. This repository has 2 branches:

  • master - Development code
  • production - Deployed code (merges here will trigger a deploy by puppet, only a few people have access to +2 on the branch for that reason. You may need to request access.)

The build analyzer runs on stat1007 (dictated by puppet) and runs as the user 'analytics-wmde'. Code can be found in /srv/analytics-wmde. In order to sudo as this user you will need to be in the analytics-wmde-users LDAP group.

Ad-hoc Hive queries

  • Analytics/Systems/Cluster/Hive/Queries/Wikidata

Ad-hoc MediaWiki Logging

The WMDE log channel from MediaWiki will be rsynced across to stat boxes.

WDCM

The Wikidata Concepts Monitor (WDCM), a system to track and analyze the Wikidata usage across the Wikimedia projects, is documented on this Wikitech page.

WDCM System Operation Workflow.