Analytics/Systems/Turnilo
Revision as of 11:10, 25 September 2019
Turnilo provides a friendly user interface to Druid and is used internally at the Wikimedia Foundation. As of 2017, most of the data available in Turnilo comes from Hadoop. (See also a snapshot of available data cubes as of April 2017, with update schedules, etc.)
Turnilo is a clone of an earlier project called Pivot.
Predecessor to Turnilo: Pivot
When it was initially deployed, Pivot was fully open source. A legal dispute caused it to stop being available under an open source license, and we continued running the last version published on github.com. What follows is a description of the reasons behind choosing Pivot, the alternatives, and our choices going forward. In May 2018 we deployed a new fork of Pivot: Turnilo (https://github.com/allegro/turnilo). While it does not add any new features, it is well maintained and certainly faster.
Choosing a User Interface for Druid
Some of the criteria for why we chose Druid as our datastore are outlined here: Analytics/Systems/Druid#Why_did_we_Choose_Druid._Value_Proposition, but the gist is that Druid allows us to very easily load OLAP-shaped big data and query it efficiently. It's much faster than querying through Hive, for example. The initial downside was that users would need to learn a new JSON query language to access the data. To solve this problem, at the time, we had three options:
- Pay the folks who develop Saiku to integrate it with Druid (this never got approved in the budget)
- Use Caravel (we tried it out, but it was buggy and much more complicated than Pivot; it is aimed more at analysts than PMs)
- Use Pivot, at the time a new open-source tool from Imply
We chose Pivot; some feedback was gathered here. The early impressions were very positive, and over time we added more datasets to Druid/Pivot, bringing a lot of value to PMs and execs. Meanwhile, the Pivot source was being closed for legal reasons. The dispute was resolved, but Pivot was no longer available under the Apache 2.0 license as of November 2016. See the announcement for details.
We deployed the last freely available open source version, and that's what we have been running from summer 2016 to the present (May 2017). We could have abandoned Pivot when it was being closed, but we felt it brought too much value to do that, and that we could fix bugs in the last open source version if we needed to. An active fork seemed beyond the resourcing abilities of our team.
Currently, more and more people are using Pivot, and loading data into Druid has reduced our time to deliver data products. But with more use cases, the bugs in our current version of Pivot have become more obvious. There are two major user interface bugs blocking work and several dozen minor ones. We found that the newer versions of Pivot that Imply has been working on fix these bugs and add useful features such as world maps. We are currently working out whether upgrading Pivot is permissible within our fairly strict open-source-only policy.
The makers of Pivot cannot offer us an open source license due to legal reasons, but they are firm believers in open source (Druid, their main product, is open source) and offered us a license, including use of the source, for a nominal fee that they later donated to the WMF. We feel the team is strongly committed to open source, and they have worked with us to give us the most advantageous license they could within their legal constraints.
User Docs and Administration
Before requesting access, please make sure you:
- Have a functioning Wikitech login. Just register an account (username and password) on this wiki: http://wikitech.wikimedia.org
- Are an employee or contractor with the WMF, OR have signed an NDA; info on how to do that: Volunteer_NDA#Create_a_request.
Depending on the above, you can request to be added to the wmf group or the NDA group. Please indicate on the task why you need access, and ping the Analytics team if you don't hear back soon from the Opsen on duty.
Once you have a Wikitech login, please create a task like the following so SRE can grant you access permissions: https://phabricator.wikimedia.org/T160662
To restart the Turnilo service:
sudo systemctl restart turnilo
Everybody can read the logs. The Analytics team can also use journalctl:
sudo journalctl -u turnilo -f
The -f flag keeps tailing the logs; remove it if you just want a snapshot.
Deployment steps for deployment.eqiad.wmnet:
git submodule update --init
The code that renders https://turnilo.wikimedia.org runs entirely on analytics-tool1002.eqiad.wmnet and is split into two parts:
- an Apache httpd Virtual Host that takes care of Basic Authentication by checking LDAP Wikitech credentials.
- a nodejs application deployed via scap and stored in the https://gerrit.wikimedia.org/r/#/admin/projects/analytics/pivot/deploy repo (https://gerrit.wikimedia.org/r/#/admin/projects/analytics/pivot is a submodule).
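As a rough illustration, such a vhost might look like the sketch below. All hostnames, ports, LDAP URLs, and paths here are made-up placeholders to show the shape of the setup, not the production configuration:

```apache
<VirtualHost *:80>
    ServerName turnilo.wikimedia.org

    # Ask for Wikitech (LDAP) credentials via Basic Auth.
    # The LDAP URL below is a placeholder, not the real directory.
    <Location "/">
        AuthType Basic
        AuthName "Wikitech login"
        AuthBasicProvider ldap
        AuthLDAPURL "ldaps://ldap.example.org/ou=people,dc=example,dc=org?uid"
        Require valid-user
    </Location>

    # Hand authenticated requests to the local nodejs (Turnilo) app;
    # the backend port is an assumption.
    ProxyPass "/" "http://127.0.0.1:9090/"
    ProxyPassReverse "/" "http://127.0.0.1:9090/"
</VirtualHost>
```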
Test config changes
- Make sure you can ssh to turnilo's box.
- ps auxfww on the box will tell you the command you need to run, something like:
/usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --config config.yaml
- Copy the yaml config file to your home directory and change the port on which Turnilo runs (say you change it to 9091).
- Start a process on the box using your local config.
- Connect via an SSH tunnel from your machine: ssh -N analytics-some.eqiad.wmnet -L 9091:localhost:9091
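The port-change step above can be sketched as follows. The config filename, key name, and original port here are assumptions for illustration; in practice you would copy the file passed via --config on the box:

```shell
# Stand-in for the copied config file; values are illustrative only.
cat > config-test.yaml <<'EOF'
port: 9090
EOF

# Switch the test instance to an unused port so it does not clash with
# the production Turnilo process:
sed -i 's/^port: 9090$/port: 9091/' config-test.yaml

grep '^port:' config-test.yaml   # -> port: 9091
```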