You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org
Turnilo provides a friendly user interface to Druid and is used internally at Wikimedia Foundation. As of 2017, most of the data available in Turnilo comes from Hadoop. (See also a snapshot of available data cubes as of April 2017, with update schedules etc.).
Before requesting access, please make sure you:
- Have a functioning Wikitech login. Just register an account (user and password) in this wiki: http://wikitech.wikimedia.org
- Are an employee or contractor with wmf OR have signed an NDA, info on how to do that: Volunteer_NDA#Create_a_request.
Depending on the above, you can request to be added to the wmf group or the NDA group. Please indicate the motivation on the task about why you need access and ping the analytics team if you don't hear any feedback soon from the Opsen on duty.
Once you have a wikitech login please create a task like the following so SRE can give you permits to access: https://phabricator.wikimedia.org/T160662
sudo systemctl restart turnilo
Everybody can read
The Analytics team can also use journalctl:
sudo journalctl -u turnilo -f
The -f is needed to keep tailing the logs, otherwise feel free to remove it.
Deployment steps for deployment.eqiad.wmnet:
git submodule update --init
The code that renders https://turnilo.wikimedia.org is running entirely on analytics-tool1002.eqiad.wmnet and it is split in two parts:
- an Apache httpd Virtual Host that takes care of Basic Authentication via LDAP Wikitech credentials check.
- a nodejs application deployed via scap and stored in the https://gerrit.wikimedia.org/r/#/admin/projects/analytics/pivot/deploy repo (https://gerrit.wikimedia.org/r/#/admin/projects/analytics/pivot is a submodule).
Test config changes
- Make sure you can ssh to turnilo's box.
- ps -auxfww on box will tell you the command you need to run, something like:
/usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --config config.yaml
- copy yaml file with config locally to your home directory and change port in which turnilo runs (say you changed it to 9091)
- start a process on box using your local config
- connect via localhost: ssh -N analytics-some.eqiad.wmnet -L 9091:localhost:9091
Druid is a very useful tool that allows us to very easily load OLAP-shaped big data and query it efficiently. It's much faster than querying through Hive, for example. The initial down side was that users would have needed to learn a new JSON query language to access the data. To solve this problem, at the time, we had three options:
- Pay the folks who develop Saiku to integrate it with Druid (this never got approved in the budget)
- use Caravel (we tried it out but it was buggy and much more complicated than Pivot, more for analysts than PMs)
- use Pivot, at the time a new open-source tool from Imply
We chose Pivot, some feedback was gathered here. The early impressions were very positive, and over time we added more datasets to Druid and Pivot bringing a lot of value to product managers and execs. As we were doing that, Pivot's source was being closed for legal reasons. The dispute was resolved but Pivot was no longer available under Apache 2.0 license after November 2016. See: announcement for details.
In May 2018, we deployed a new fork of Pivot: Turnilo. While it does not add any new features, it seems well maintained and it is certainly faster.