Analytics/Systems/Turnilo
Revision as of 21:37, 6 April 2022
Turnilo provides a friendly user interface to Druid and is used internally at the Wikimedia Foundation. As of 2017, most of the data available in Turnilo comes from Hadoop. (See also a snapshot of available data cubes as of April 2017, with update schedules etc.)
To access Turnilo, you need nda LDAP access. For more details, see Analytics/Data access#LDAP access.
If you have that access, you can log in at turnilo.wikimedia.org with your Wikitech username and password.
Turnilo is currently (2020-02-26) hosted on an-tool1007.eqiad.wmnet. It is deployed to /srv/deployment/analytics/turnilo/deploy by scap. Puppet generates its configuration file at /etc/turnilo/config.yaml using this puppet template: /modules/turnilo/templates/config.yaml.erb. If any of this is wrong when you're reading it, you can update it fairly quickly by searching the puppet repository for "turnilo".
sudo systemctl restart turnilo
Everybody can read
The Analytics team can also use journalctl:
sudo journalctl -u turnilo -f
The -f flag keeps tailing the logs as new entries arrive; omit it if you only want to read what has already been logged.
Deployment steps for both test and production:
scap deploy --limit an-tool1005.eqiad.wmnet
The code that renders https://turnilo.wikimedia.org is split into two parts:
- an Apache httpd Virtual Host that handles Basic Authentication by checking Wikitech LDAP credentials.
- a nodejs application deployed via scap and stored in the https://gerrit.wikimedia.org/r/#/admin/projects/analytics/turnilo/deploy repo.
Test config changes
NOTE: if you make config changes, you need to test and restart Turnilo once the puppet change is merged (see above).
- Make sure you can ssh to Turnilo's host.
- Run ps -auxfww on the host to find the command that starts Turnilo; it will look something like:
/usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --config /etc/turnilo/config.yaml
- Copy the YAML config file to your home directory and change the port that Turnilo runs on (say you changed it to 9091).
- Start a process on the host using your copy of the config.
- Connect via localhost through an SSH tunnel: ssh -N an-tool1007.eqiad.wmnet -L 9091:localhost:9091
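The copy-and-change-port step above can be sketched as a small shell helper. This is only a sketch: it assumes the config contains a top-level "port: <number>" line, and the function name make_test_config and the file name turnilo-test.yaml are made up for illustration. The commands for starting Turnilo and opening the tunnel are the ones listed above and are left as comments.

```shell
#!/bin/sh
# Sketch: write a private copy of the Turnilo config that listens on another port.
# Assumption: the config has a top-level "port: <number>" line.
# make_test_config is a hypothetical helper, not part of Turnilo or puppet.
make_test_config() {
    src="$1"    # config to copy, e.g. /etc/turnilo/config.yaml
    dest="$2"   # where to write your copy, e.g. "$HOME/turnilo-test.yaml"
    port="$3"   # port for the test instance, e.g. 9091
    # Rewrite only the top-level port line; every other line is copied unchanged.
    sed "s/^port:.*/port: ${port}/" "$src" > "$dest"
}

# On the Turnilo host:
#   make_test_config /etc/turnilo/config.yaml "$HOME/turnilo-test.yaml" 9091
#   /usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --config "$HOME/turnilo-test.yaml"
# From your own machine:
#   ssh -N an-tool1007.eqiad.wmnet -L 9091:localhost:9091
```

With the tunnel open, the test instance should be reachable at http://localhost:9091 in your browser. If the config has no "port:" line, the copy is written unchanged, so check the result before starting the process.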
Druid is a very useful tool that lets us easily load OLAP-shaped big data and query it efficiently. It's much faster than querying through Hive, for example. The initial downside was that users would have had to learn a new JSON query language to access the data. To solve this problem, at the time, we had three options:
- Pay the folks who develop Saiku to integrate it with Druid (this never got approved in the budget).
- Use Caravel (we tried it out, but it was buggy and much more complicated than Pivot, and aimed more at analysts than PMs). Since then, Caravel has been renamed Superset and has received considerable development. We are starting to standardize on it for access to our heterogeneous data stores.
- Use Pivot, at the time a new open-source tool from Imply.
We chose Pivot; some feedback was gathered here. The early impressions were very positive, and over time we added more datasets to Druid and Pivot, bringing a lot of value to product managers and execs. While we were doing that, Pivot's source was closed for legal reasons. The dispute was resolved, but Pivot was no longer available under the Apache 2.0 license after November 2016. See the announcement for details.
In May 2018, we deployed a new fork of Pivot: Turnilo. While it does not add any new features, it seems well maintained and it is certainly faster.