You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Analytics/Systems/Matomo: Difference between revisions

From Wikitech-static
imported>Neil P. Quinn-WMF
(Copyedit. Consolidate info under the name Matomo. Explain when to use in the intro.)
 
imported>Btullis
 
(7 intermediate revisions by 4 users not shown)
'''[https://matomo.org/ Matomo]''' (formerly known as '''Piwik''') is a web analytics platform that we use for microsites (sites with roughly 10,000 requests per day or fewer). Our production instance is at [https://piwik.wikimedia.org https://piwik.wikimedia.org]. It has two layers of authentication: first your LDAP credentials, then a Matomo-specific username and password.
#REDIRECT [[Data Engineering/Systems/Matomo]]
==Access==
You need a Wikitech login that is in the <code>wmf</code> or <code>nda</code> [[LDAP/Groups|LDAP group]]. If you don't have that, create a Phabricator task like https://phabricator.wikimedia.org/T160662 to request it.
 
Before requesting access, please make sure you:
 
* have a functioning Wikitech login (register at https://toolsadmin.wikimedia.org/register/ if you don't have one), and
* are a WMF employee or contractor, or have signed an NDA.

Depending on which of these applies, request to be added to the <code>wmf</code> group or the <code>nda</code> group. Explain on the task why you need access, and ping the Analytics team if the SRE on duty doesn't respond soon.
 
After the LDAP login there is a second, Matomo-specific login form. It isn't actually needed but cannot easily be removed, so log in with the username <code>design</code> and the password <code>design</code>.
 
== How to instrument ==
Out of the box, Matomo tracks pageviews and unique devices. You can add further instrumentation with Matomo's JavaScript tracking API:
 
https://developer.matomo.org/api-reference/tracking-javascript
 
= Administration =
 
== When a team requests a Matomo beacon ==
 
* Log in to Matomo with the admin user
* Click Settings
* Websites -> Manage -> Add Site
 
Adding a site generates tracking code like the following (the site id, here 19, is assigned per site):
 
<syntaxhighlight lang="html">
<!-- Matomo -->
<script type="text/javascript">
var _paq = _paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
  var u="//piwik.wikimedia.org/";
  _paq.push(['setTrackerUrl', u+'piwik.php']);
  _paq.push(['setSiteId', '19']);
  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
  g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'piwik.js'; s.parentNode.insertBefore(g,s);
})();
</script>
</syntaxhighlight>
 
* Give the regular (non-admin) piwik user permission to view the site. Its password is in <code>/home/nuria</code> on stat1007.
 
* Done
 
Once the snippet is in place, visits will start coming in. Reports are generated once a day, so new visits show up in the reports only after the next archiving run.
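To check that a page actually embeds the snippet with the expected site id, you can fetch the page and pull out the id passed to <code>setSiteId</code>. The following is a minimal sketch, assuming the snippet is served inline in the form Matomo generates (as above); the function name <code>matomo_site_ids</code> is hypothetical, and <code>grep -o</code> is a GNU/BSD extension rather than POSIX.

<syntaxhighlight lang="bash">
# Sketch: print the Matomo site id(s) that a page's tracking snippet reports to.
# Reads HTML on stdin. Assumes the setSiteId call keeps the form Matomo
# generates, e.g. _paq.push(['setSiteId', '19']); either quote style is accepted.
# Meant to be sourced; returns non-zero when no site id is found.
matomo_site_ids() {
  # First grep isolates text like "setSiteId', '19", second keeps the trailing digits.
  grep -oE "setSiteId['\"][[:space:]]*,[[:space:]]*['\"]?[0-9]+" | grep -oE '[0-9]+$'
}
</syntaxhighlight>

For example, <code>curl -s https://example.org/ | matomo_site_ids</code> (with the microsite's URL in place of the placeholder) should print the same id that Settings → Websites → Manage shows for that site.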
 
== Rerun reports for all websites ==
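To rerun archiving for all websites that had visits in the last day (86400 seconds), run the console as <code>www-data</code>, like the cron job and the invalidation example below:
<syntaxhighlight lang="bash">
sudo -u www-data php /usr/share/matomo/console core:archive --force-all-websites --force-all-periods=86400
</syntaxhighlight>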
 
== Invalidate old reports ==
In the past, the daily archiver cron (responsible for generating the daily, weekly, monthly and yearly stats for every tracked site) has skipped days of data, producing reports like https://phabricator.wikimedia.org/T188559: the data was collected but not archived, so the graphs were flat. A quick way to force Matomo to rerun its archiving over past data is to invalidate the affected reports:
<syntaxhighlight lang="bash">
elukey@bohrium:/var/log/matomo$ for el in {20..28}; do sudo -u www-data /usr/share/matomo/console core:invalidate-report-data --dates=2018-02-$el --sites=3; done
Invalidating day periods in 2018-02-20 [segment = ]...
Invalidating week periods in 2018-02-20 [segment = ]...
Invalidating month periods in 2018-02-20 [segment = ]...
Invalidating year periods in 2018-02-20 [segment = ]...
Invalidating day periods in 2018-02-21 [segment = ]...
Invalidating week periods in 2018-02-21 [segment = ]...
Invalidating month periods in 2018-02-21 [segment = ]...
Invalidating year periods in 2018-02-21 [segment = ]...
Invalidating day periods in 2018-02-22 [segment = ]...
[..cut..]
</syntaxhighlight>
In this example, the data from 2018-02-20 to 2018-02-28 was invalidated with the Matomo console for site id 3 (currently iOS); the next archiver run then rebuilds those reports.
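The <code>{20..28}</code> loop only covers days within one month. For arbitrary ranges (for example across a month boundary), a small helper can step through the dates with GNU <code>date</code>. This is a minimal sketch, not an existing tool: <code>invalidate_range</code>, <code>run_or_print</code> and the <code>DRY_RUN</code> switch are hypothetical names, it assumes GNU coreutils <code>date</code>, and it issues the same one-call-per-day console command as the example. By default it only prints the commands; set <code>DRY_RUN=0</code> to run them.

<syntaxhighlight lang="bash">
# Sketch (meant to be sourced): invalidate Matomo reports for every day in an
# inclusive YYYY-MM-DD range for one site id, one console call per day.
# Assumes GNU date. DRY_RUN=1 (the default) only prints; DRY_RUN=0 executes.
MATOMO_CONSOLE=/usr/share/matomo/console

# Print the command (dry run) or execute it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

invalidate_range() {
  local start="$1" end="$2" site="$3" day d
  for d in "$start" "$end"; do
    # Only accept YYYY-MM-DD, so the integer comparison below is meaningful.
    case "$d" in
      [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
      *) echo "invalid date: $d (expected YYYY-MM-DD)" >&2; return 2 ;;
    esac
    # GNU date also rejects impossible days such as 2018-02-30.
    TZ=UTC date -d "$d" >/dev/null || return 2
  done
  day="$start"
  # YYYY-MM-DD with the dashes removed compares correctly as an integer.
  while [ "$(printf '%s\n' "$day" | sed 's/-//g')" -le "$(printf '%s\n' "$end" | sed 's/-//g')" ]; do
    run_or_print sudo -u www-data "$MATOMO_CONSOLE" \
      core:invalidate-report-data --dates="$day" --sites="$site" || return
    # Step one calendar day; TZ=UTC sidesteps daylight-saving edge cases.
    day=$(TZ=UTC date -I -d "$day + 1 day") || return
  done
}
</syntaxhighlight>

After sourcing it, <code>DRY_RUN=0 invalidate_range 2018-02-20 2018-02-28 3</code> would repeat the invalidation shown above.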
 
== Tuning ==
 
We ran into a performance problem (an expected one) while tracking a larger website, and fixed it by following Matomo's auto-archiving advice: http://piwik.org/docs/setup-auto-archiving/
 
The cron we set up with this technique is:
 
<pre>
root@matomo1001:/var/log/apache2# crontab -u www-data -l
# HEADER: This file was autogenerated at 2017-05-05 12:01:17 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: piwik_archiver
MAILTO=analytics-alerts@wikimedia.org
0 8 * * * [ -e /usr/share/matomo/console ] && [ -x /usr/bin/php ] && nice /usr/bin/php /usr/share/matomo/console core:archive --url="piwik.wikimedia.org" >> /var/log/matomo/matomo-archive.log
</pre>
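If you don't want to wait for the daily 08:00 run (server time), for example right after invalidating reports, the cron job's command can be run by hand. A minimal sketch as a shell function meant to be sourced: the command line, the guards and the log path are copied from the crontab above, while the name <code>run_archive</code>, the <code>DRY_RUN</code> switch (on by default, which only prints the command) and running it through <code>sudo -u www-data</code> are assumptions of this example.

<syntaxhighlight lang="bash">
# Sketch (meant to be sourced): run the Matomo archiver on demand with the same
# arguments as the www-data cron entry. DRY_RUN=1 (the default) only prints.
MATOMO_CONSOLE=/usr/share/matomo/console
MATOMO_PHP=/usr/bin/php
MATOMO_ARCHIVE_LOG=/var/log/matomo/matomo-archive.log

run_archive() {
  # Same command line as the cron entry (the quotes around the URL are dropped).
  set -- nice "$MATOMO_PHP" "$MATOMO_CONSOLE" core:archive --url=piwik.wikimedia.org
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "sudo -u www-data $* >> $MATOMO_ARCHIVE_LOG"
    return 0
  fi
  # Same guards as the cron entry: the console exists and php is executable.
  if [ ! -e "$MATOMO_CONSOLE" ] || [ ! -x "$MATOMO_PHP" ]; then
    echo "run_archive: $MATOMO_CONSOLE or $MATOMO_PHP not found" >&2
    return 1
  fi
  # The append happens inside a www-data shell (its $0 is the log path), so the
  # log is written by the same user as the cron job and the exit status is kept.
  sudo -u www-data sh -c '"$@" >> "$0"' "$MATOMO_ARCHIVE_LOG" "$@"
}
</syntaxhighlight>

Running it as <code>www-data</code>, rather than as root, matches how the cron job runs.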
 
== Known outages ==
* Nov 23rd 2017: a Ganeti failure (more details in https://phabricator.wikimedia.org/T181121) stopped the bohrium virtual machine (running Piwik and its MySQL database) non-gracefully, causing InnoDB table corruption. We had to restore the last MySQL backup, taken on Nov 22, so almost all of the data for Nov 23 was not recorded.
* June 27th 2018: 8 hours of downtime to upgrade the Piwik database to the new version (T192298)
* June 28th 2018: 1 hour of downtime to upgrade Piwik to Matomo
* October 4th 2018: 1 hour of downtime to move Matomo/Piwik from bohrium to the new host matomo1001
* October 5th 2018: 1 hour of downtime to fix a database issue
* December 5th 2018: 30 minutes of downtime to upgrade to Matomo 3.7.0

Latest revision as of 14:26, 12 January 2023