Analytics/Systems/Superset

Revision as of 15:13, 8 January 2020

Superset is an Apache incubator project, originally started at Airbnb, for building visualizations and dashboards from various analytics data sources. WMF's Superset instance can be found at https://superset.wikimedia.org. Like Turnilo, it provides access to various Druid tables.

Access

To access Superset, you need wmf or nda LDAP access. For more details, see Analytics/Data access#LDAP access.

If you have that access, you can log in at superset.wikimedia.org with your developer shell username and password.

Usage notes

  • The "Druid Datasources" list shows ingested tables that are available for querying. As of October 2018, this includes daily and hourly pageviews data (the daily version is only updated once a month, but goes further back), a sampled excerpt of webrequest data, unique devices, and a few select EventLogging schemas. If a recently created Druid datasource is not yet visible in the list, try clicking "Scan New Datasources".
  • NULL values don't show up properly in the values selection dropdown for filters, so the dropdown can't be used to exclude NULL values from a chart or to limit it to them. As a workaround, use the regex option instead: type in ".+" (without the quotes) and accept the offer to create it as an option.
  • Always use predefined SUM metrics when they are available. If you instead pick a column and choose the SUM aggregation function, Superset manages the aggregation itself using Druid's floatSum operator, which accumulates in 32-bit floats rather than 64-bit longs or doubles and can therefore produce inaccurate totals. The predefined SUM(...) metrics are manually defined using the 64-bit doubleSum or longSum operators and avoid this problem.
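The float32 rounding behind the floatSum caveat can be demonstrated outside Superset. This is a minimal sketch using python3 and the standard struct module; the numbers are illustrative, not from any WMF dataset:

```shell
# Show why 32-bit float sums undercount: 2^24 + 1 is not representable
# as a float32, so the +1 is silently lost, while 64-bit doubles keep it.
python3 - <<'EOF'
import struct

def f32(x):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack('f', struct.pack('f', x))[0]

print(f32(16777216.0 + 1.0) == 16777216.0)  # True: the +1 vanishes in float32
print(16777216.0 + 1.0 == 16777217.0)       # True: 64-bit doubles are exact here
EOF
```

A floatSum over enough rows hits this ceiling: once the accumulator reaches 2^24, further increments of 1 are lost entirely.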

...

Administration

Upgrading

To upgrade, first follow the instructions in the analytics/superset/deploy README to update the deploy repository. Once deployed, activate the superset virtualenv, add /etc/superset to PYTHONPATH (to allow superset to pick up configuration) and follow the Superset upgrade instructions (minus the pip install superset --upgrade part). This should be something like:

# activate the deployed Superset virtualenv
. /srv/deployment/analytics/superset/venv/bin/activate
# let Superset pick up its configuration from /etc/superset
export PYTHONPATH=/etc/superset
# apply database migrations, then re-initialize default roles and permissions
superset db upgrade
superset init

Deploy

This assumes that the change to the superset deploy repository has already been filed. First, test the change on the staging instance, an-tool1005.eqiad.wmnet:

# ssh to deploy1001 and set the working directory
ssh deploy1001.eqiad.wmnet
cd /srv/deployment/analytics/superset/deploy

# create a new branch from master; name it as you prefer
git checkout -B testing_something_important

# cherry-pick the change onto the new branch
git cherry-pick $change-from-gerrit

# deploy only to an-tool1005, without logging to the operations SAL (Server Admin Log)
scap deploy --no-log-message -f -l an-tool1005.eqiad.wmnet "Test deployment for something important"

Then check if Superset works as expected:

# Create an SSH tunnel, then test via http://localhost:8080 in the browser
ssh -L 8080:an-tool1005.eqiad.wmnet:80 an-tool1005.eqiad.wmnet

If you are happy with the Superset version, then merge and deploy to the production host:

# ssh to deploy1001 and set the working directory
ssh deploy1001.eqiad.wmnet
cd /srv/deployment/analytics/superset/deploy

scap deploy -l analytics-tool1004.eqiad.wmnet "Deployment for something important"
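If the new version misbehaves in production, an earlier commit of the deploy repository can be redeployed. This is a sketch, assuming scap's standard --rev option; the sha is a placeholder to be read off the git log:

```shell
# from deploy1001, inside /srv/deployment/analytics/superset/deploy:
git log --oneline -5        # find the last known-good commit
# deploy that revision back to the production host
scap deploy --rev <known-good-sha> -l analytics-tool1004.eqiad.wmnet "Roll back Superset"
```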

How to

See status

systemctl status superset.service

Bounce

systemctl restart superset
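When a restart doesn't fix things, the service logs are the next place to look. A sketch, assuming the superset systemd unit shown above and journalctl on the host:

```shell
# follow Superset's log output live
sudo journalctl -u superset -f
# or inspect the last 100 lines
sudo journalctl -u superset -n 100
```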

See also