Analytics/Systems/Turnilo


Turnilo provides a friendly user interface to Druid and is used internally at the Wikimedia Foundation. As of 2017, most of the data available in Turnilo comes from Hadoop. (See also a snapshot of available data cubes as of April 2017, with update schedules etc.).

Access

To access Turnilo, you need to be in the wmf or nda LDAP group. For more details, see Analytics/Data access#LDAP access.

If you have that access, you can log in at turnilo.wikimedia.org with your Wikitech username and password.

Administration

Turnilo is currently (2020-02-26) hosted on an-tool1007.eqiad.wmnet. It is deployed to /srv/deployment/analytics/turnilo/deploy by scap. Puppet generates its configuration file at /etc/turnilo/config.yaml from the template modules/turnilo/templates/config.yaml.erb (https://github.com/wikimedia/puppet/blob/ef99835a63e71d5a1ebf5fa8c8a191b1c75fc7d4/modules/turnilo/templates/config.yaml.erb). If any of this is wrong when you're reading it, you can find the current values fairly quickly by searching the puppet repository for "turnilo".
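If those details have drifted, the puppet repository is the source of truth. A minimal sketch of how to search it, assuming a local checkout of the public puppet mirror on GitHub:

 git clone https://github.com/wikimedia/puppet.git
 cd puppet
 # The turnilo module, site manifests, and hiera data carry the host and config details
 grep -ri "turnilo" modules/turnilo/ manifests/ hieradata/ | less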

Restart

sudo systemctl restart turnilo
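To confirm the service came back up cleanly:

 sudo systemctl status turnilo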

Logs

Everyone can read /var/log/turnilo/syslog.log.
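For example, to follow it live:

 tail -f /var/log/turnilo/syslog.log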

The Analytics team can also use journalctl:

sudo journalctl -u turnilo -f

The -f flag keeps following the log output; remove it if you just want a one-off look.

Deploy

Deployment steps, to be run from deployment.eqiad.wmnet:

cd /srv/deployment/analytics/turnilo/deploy

git pull

git submodule update --init

scap deploy
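After scap finishes, it is worth confirming that the service is healthy on the target host (an-tool1007.eqiad.wmnet as of this writing):

 ssh an-tool1007.eqiad.wmnet 'systemctl status turnilo'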

The code that renders https://turnilo.wikimedia.org is split in two parts:

  • an Apache httpd Virtual Host that handles Basic Authentication by checking LDAP (Wikitech) credentials.
  • a nodejs application deployed via scap and stored in the https://gerrit.wikimedia.org/r/#/admin/projects/analytics/turnilo/deploy repo.
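A quick way to see the two layers in action, assuming the backend listens on Turnilo's default port 9090 (check /etc/turnilo/config.yaml for the real value):

 # From anywhere: the Apache layer rejects unauthenticated requests
 curl -I https://turnilo.wikimedia.org    # expect HTTP 401 without LDAP credentials

 # From the Turnilo host itself: the nodejs app answers directly
 curl -I http://localhost:9090            # expect a successful response from Turnilo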

Test config changes

  • Make sure you can ssh to Turnilo's host (an-tool1007.eqiad.wmnet).
  • ps -auxfww on the host will show you the command you need to run, something like:
 /usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --config /etc/turnilo/config.yaml
  • Copy the YAML config file to your home directory and change the port Turnilo runs on (say you changed it to 9091; see the sketch after this list).
  • Start a process on the host using your local config.
  • Connect via an SSH tunnel: ssh -N an-tool1007.eqiad.wmnet -L 9091:localhost:9091
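Putting those steps together, a minimal sketch; it assumes the host and paths above and that the config file has a top-level port: key, so adjust as needed:

 # On an-tool1007.eqiad.wmnet:
 cp /etc/turnilo/config.yaml ~/turnilo-test.yaml
 sed -i 's/^port: .*/port: 9091/' ~/turnilo-test.yaml   # change the port to 9091
 /usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --config ~/turnilo-test.yaml

 # On your own machine, in a separate terminal:
 ssh -N an-tool1007.eqiad.wmnet -L 9091:localhost:9091
 # Then browse to http://localhost:9091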

History

Druid is a very useful tool that lets us easily load OLAP-shaped big data and query it efficiently. It's much faster than querying through Hive, for example. The initial downside was that users would have had to learn a new JSON query language to access the data. To solve this problem, we had three options at the time:

  • Pay the folks who develop Saiku (https://www.meteorite.bi/products/saiku) to integrate it with Druid (this never got approved in the budget).
  • Use Caravel (https://github.com/apache/incubator-superset); we tried it out, but it was buggy and much more complicated than Pivot, more for analysts than PMs. Since then, Caravel has been renamed Superset and has received considerable development; we are starting to standardize on it for access to our heterogeneous data stores.
  • Use Pivot, at the time a new open-source tool from Imply (https://imply.io/post/hello-pivot).

We chose Pivot; some feedback was gathered in phab:T136836. The early impressions were very positive, and over time we added more datasets to Druid and Pivot, bringing a lot of value to product managers and execs. As we were doing that, Pivot's source code was being closed for legal reasons. The dispute was resolved, but Pivot was no longer available under the Apache 2.0 license after November 2016. See the announcement (https://groups.google.com/forum/#!topic/imply-user-group/LaKKgXqWePQ) for details.

In May 2018, we deployed a new fork of Pivot: Turnilo (https://github.com/allegro/turnilo). While it does not add any new features, it seems well maintained and is certainly faster.