Analytics/Systems/Dashiki

Revision as of 13:56, 7 April 2017 by imported>Milimetric (Milimetric moved page Analytics/Dashiki to Analytics/Systems/Dashiki: Reorganizing documentation)

Dashiki is a dashboarding tool that tries to help you think about information architecture first and plots second. It is lightweight and performant, and it requires no setup other than a web server to host files. Dashiki is not a tool for data analysis; its goal is to provide an overall view of a system. See, for example, how you can browse pageviews for all MediaWiki projects on this Dashiki installation: Pageviews for all mediawiki projects


Quickstart

1. Download dashiki

 git clone https://gerrit.wikimedia.org/r/analytics/dashiki 

2. Follow README instructions to get a local webserver through which to serve the files.

3. You are set. You should now see some data at: http://localhost:<your port>/dist/


Background

Dashiki's history and technical stack: File:The Dashboarding Problem.compressed.pdf

Intro

Dashiki has multiple "layouts", which are configured into dashboards via wiki pages.

Here are two examples of Dashiki dashboard layouts: https://vital-signs.wmflabs.org/ and https://edit-analysis.wmflabs.org/compare/. By "layouts" we mean a certain way to combine metrics and "parameters" such as wikis and dates to present some data.

The "layouts" we support right now are:

  1. metrics-by-projects (example is vital signs, good for wiki centric projects)
  2. compare (example is the comparison of Visual Editor to Wikitext data, good for things like A/B testing)


FAQ

Where is the code?

You can browse the code on GitHub here. We use Gerrit to manage changes. Setting up Dashiki is simple, and you should do that before you read further.

Where is code deployed?

Dashiki instances are listed in deployment configuration: [1]

How do I put dashiki out of service?

Modifying this configuration to set outofservice to "true" will make a banner appear in dashiki instances noting that an outage is going on: https://meta.wikimedia.org/wiki/Dashiki:OutOfService

Deployment

It uses fabric from this change onwards: https://gerrit.wikimedia.org/r/#/c/259437/

How to get your data available via http

As long as you're sure they're OK to be shared publicly, from stat1002 you can put files into /a/aggregate-datasets and those will end up on http://datasets.wikimedia.org/aggregate-datasets after an rsync.

It's up to you to set up the cron job that gets them there and to organize a sensible folder hierarchy. The rsync runs hourly.

If you want to store your data in a database, you can use the staging db available on analytics store. If you have access to analytics store, you should be able to create tables in that database.

Can I implement a new layout?

You can implement new layouts, but the main idea is that we do not want plot after plot. Rather, we want an information architecture around a set of plots that lets users infer what data is available.

What about limn?

The analytics team is not actively developing limn anymore.

Technical Documentation: Understanding Dashiki - First Steps

(Author contact: Helen, at hjiang(at)wikimedia.org)

This section is a work in progress; further edits and additions will follow.

Preface

You should already have npm installed if you have used JavaScript. Some key dependencies you need to run Dashiki are gulp, karma, and bower. If you use other JS visualization libraries such as d3 or dygraphs, make sure those are installed as well. Dashiki has lightweight access to major JS visualization libraries, but this doesn't mean that we can slack off on due diligence :) Much of it will be covered in the first MWE.

Overview

Dashiki is a client-side dashboard builder by the Analytics team, which means that you can use whichever server you fancy. It also has a component system and clear patterns, which let you use pretty much any visualization library and data source. The tests are run with Karma. The default port for viewing a dashboard is 5000.

Lingo aside, let’s take a look at the organization and basic use of Dashiki from /src, and then dig into its more detailed usage and testing from /test.

Within /src, there are six major parts of Dashiki:

  • /src/app
  • /src/components
  • /src/css
  • /src/fonts
  • /src/layouts
  • /src/lib

Because /src/css and /src/fonts are only tangentially related to our purpose and are mostly boilerplate, we focus on the remaining four parts. This introductory guide walks through each of them, first at a high level and then in more detail.

/src/layouts

Dashiki has several built-in layouts, and we can build our own custom layouts. The layouts within /src/layouts are:

  • Compare
    • Used in A/B test scenarios. Data can either be combined or compared side by side. Tabs, time ranges, and different visualization styles can be used.
  • Metrics-by-project
    • This layout is a navigator to easily find and visualize configured metrics for any WMF-hosted wiki project.
  • Tabs
    • Displays graphs organized into tabs.

/src/components

View components are where people pass in parameters and interact with the dashboard, and there are many moving pieces in this module. However, because knockout is used to separate domain data, data, and view components, it is in fact fairly easy to plug in your favorite visualization libraries and data sources.

To see the full picture of what /src/components provides, a detailed breakdown follows, and we will explore these components further. Their names are usually self-explanatory, so I will only annotate when needed.

  • A-b-compare
    • Note: used in comparisons for A/B tests. Often used in conjunction with /src/compare and /src/components/visualizers.
    • A-b-compare
    • Compare-stacked-bars
    • Compare-sunbursts
    • Compare-timeseries
  • Annotation-list
    • Note: as the name suggests, this is used to add annotations to graphs.
    • Annotation-list
    • Binding
  • Breakdown-toggle
  • Button-group
  • Compare-layout
  • Dropdown
  • Metric-selector
  • Metrics-by-project-layout
  • Out-of-service
  • Project-selector
  • Table-layout
  • Visualizers
    • Dygraphs-timeseries
    • Filter-timeseries
    • Hierarchy
    • Nvd3-timeseries
    • Rickshaw-timeseries
    • Stacked-bars
    • Sunburst
    • Table-timeseries
    • Vega-timeseries
    • Visualizer
    • Wikimetrics
    • Note: dygraphs, nvd3, vega, and rickshaw are names of JavaScript (JS) visualization libraries.

/src/app

We have already mentioned that knockout is used in Dashiki to separate domain data, data, and view components. View components are discussed in the preceding section, /src/components. Now we switch gears to look at the data and data sources, and their interactions and conversions.

  • Apis
    • Annotation-api
    • Api-finder
    • Aqs-api
    • Config-api
      • Note: default setting and plain-state URL handling are both based on the config api.
    • Dataset-api
    • Wikimetrics
  • Data converters
    • Note: this is mainly used to transform data sources
    • Annotations-data
    • Aqs-api-response
    • Factory
    • Hierarchy-data
    • Separated-values
    • Simple-separated-values
    • Timeseries-data
      • This defines the key class at the heart of Dashiki.  Data is parsed into this format and visualizers all must understand how to read and represent it.  This format carries label, color, and pattern information for each column.  Instances of TimeseriesData can be merged together so that you may combine separate datasets (for example, to compare them).  This is where control over colors and patterns is useful, so combined datasets can still be distinguished visually.
    • wikimetrics-timeseries
  • ko-extensions
    • Common-viewmodes
      • Copy-params
      • Single-select
    • Async-observations
      • Note: used for asynchronous data sync and observations
    • Datepicker-binding
    • Global-bindings
  • Utils
    • Note: each module groups related helper functions. For example, “arrays” has two different sorting functions and one filter function, “datetime” has formatting and timespan functions, etc.
    • Arrays
      • Functions: sortByName, sortByNameIgnoreCase, filler
    • Colors
      • Functions: category10 (a color scale function originally from d3.js. This is a more lightweight version: you don’t have to import the whole d3 library to use it)
    • Datetime
      • Functions: formatDate (formats to YYYY-MM-DD), timespan
    • Elements
      • Functions: getBounds
    • Numbers
      • Functions: numberFormatter
    • Strings
      • Functions: parserFromSample
  • Config.js
    • Note: static configuration object. Looks for config files written by the build.
  • Require.config.js
    • Note: It looks for different JS libraries, semantic elements, configurations, bindings, utils, apis, view models, and converters in global scope.
  • Sitematrix
    • Note: gets the sitematrix and parses it. It holds an application-scoped cache once it is initialized.
  • Startup.js
    • Note: everything in here is on global scope.
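
To make the Timeseries-data idea above more concrete, here is a minimal sketch of a mergeable timeseries container. It is not Dashiki's actual TimeseriesData class: the class name SimpleTimeseries, its constructor arguments, and the merge behavior (union of dates, missing values filled with null) are all assumptions made for illustration, and the color and pattern information is left out.

```javascript
// Hypothetical sketch; NOT the real TimeseriesData from /src/app.
class SimpleTimeseries {
  // header: array of column labels
  // rowsByDate: object mapping 'YYYY-MM-DD' to an array of values
  constructor(header, rowsByDate) {
    this.header = header;
    this.rowsByDate = rowsByDate;
  }

  // Merge another series into a new one: columns are concatenated,
  // dates are unioned, and missing values are filled with null.
  merge(other) {
    const dates = new Set([
      ...Object.keys(this.rowsByDate),
      ...Object.keys(other.rowsByDate),
    ]);
    const blank = (n) => new Array(n).fill(null);
    const merged = {};
    for (const date of [...dates].sort()) {
      const left = this.rowsByDate[date] || blank(this.header.length);
      const right = other.rowsByDate[date] || blank(other.header.length);
      merged[date] = left.concat(right);
    }
    return new SimpleTimeseries(this.header.concat(other.header), merged);
  }
}

// Made-up sample data, only to show the merge.
const desktop = new SimpleTimeseries(['Desktop site'], {
  '2017-01-01': [10],
  '2017-02-01': [12],
});
const mobile = new SimpleTimeseries(['Mobile site'], {
  '2017-02-01': [7],
  '2017-03-01': [9],
});
const combined = desktop.merge(mobile);
console.log(combined.header, combined.rowsByDate);
```

Because every visualizer reads the same structure, combining two datasets for comparison only requires a merge like this one, rather than changes in each visualizer.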

/src/lib

This part is knockout-related and handles errors, state management, logging, and so on. It is not as important for pure visualization purposes, but it should be kept in mind because the visualization is done in JS.

  • Knockout-extensions
    • Knockout-table.js
    • Note: this is the table binding plugin for knockout. Works with require.js. Dan wrote a custom wrapper to give it a proper define.
  • Ajax-wrapper.js
    • Note: Can handle custom headers, and handle errors across all requests.
  • Logger.js
    • Note: because Dashiki is client-side only, the logger only logs client-side errors. This is a static function that is available site-wide.
  • Polyfills.js
  • State-manager.js
    • Note: Mutates and translates the URL to application state. If the URL has no state, it falls back to the config api for the default settings.
  • Window.js
    • Note: window as a standalone module, usually useful in testing.
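
The "URL to state, else defaults" behavior described for State-manager.js can be sketched roughly as follows. This is not the code in State-manager.js: the function name stateFromHash, the hash format (key=value pairs separated by &, with commas for lists), and the shape of the defaults object are assumptions for illustration only.

```javascript
// Hypothetical sketch of "read state from the URL, else use defaults".
// The hash format below is an assumption, not Dashiki's actual format.
function stateFromHash(hash, defaults) {
  const query = hash.replace(/^#/, '');
  const state = { ...defaults };
  if (query === '') {
    // No state in the URL: fall back to the configured defaults.
    return state;
  }
  for (const [key, value] of new URLSearchParams(query)) {
    // Keys whose default is a list are parsed as comma-separated lists.
    state[key] = Array.isArray(defaults[key]) ? value.split(',') : value;
  }
  return state;
}

// Made-up defaults, standing in for what the config api would return.
const defaults = { projects: ['enwiki'], metric: 'MonthlyUniqueDevices' };

const fromEmpty = stateFromHash('', defaults);
const fromUrl = stateFromHash('#projects=enwiki,dewiki&metric=Uploads', defaults);
console.log(fromEmpty, fromUrl);
```

Keeping the state in the URL is what makes a dashboard view shareable: whoever opens the link gets the same selection, and a bare URL still shows something sensible.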

Dive-in

Here I will walk through a minimal working example (MWE) to illustrate some key uses of Dashiki.

Example of metric definition for the metrics-by-project layout:

{
  "definition": "https://meta.wikimedia.org/wiki/Research:Unique_Devices",
  "name": "MonthlyUniqueDevices",
  "displayName": "Monthly Unique Devices",
  "api": "aqsApi",
  "breakdown": {
    "columns": ["All", "Desktop site", "Mobile site"]
  },
  "granularity": "monthly",
  "annotations": {
    "host": "meta.wikimedia.org",
    "pageName": "Dashiki:MonthlyUniqueDevicesAnnotations"
  }
}

So let’s take it step by step:

  • definition: link to a wiki page that defines the metric in plain English
  • name: has to be unique, used in dashboard configs such as https://meta.wikimedia.org/wiki/Config:VitalSigns
  • displayName: just the friendly name to show in the UI
  • api: by default it uses wikimetrics [1] but you can override it here to any other api like datasets or aqsApi (the api-finder needs to know about the api).
  • breakdown: if you have multiple columns per data file, you can specify them here as breakdowns
  • granularity: used for aqsApi only, to specify the time granularity
  • annotations: link to a wiki page that defines annotations for this metric, for example https://meta.wikimedia.org/wiki/Dashiki:MonthlyUniqueDevicesAnnotations
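
Two of the rules above (names must be unique, and granularity is only used with aqsApi) are easy to get wrong when a config page grows. Here is a small sketch of a checker for them. The function checkMetrics is hypothetical and not part of Dashiki, and the second and third metric objects are made up for the example.

```javascript
// Hypothetical helper; Dashiki itself does not ship this function.
// Returns a list of human-readable problems found in metric definitions.
function checkMetrics(metrics) {
  const problems = [];
  const seen = new Set();
  for (const metric of metrics) {
    if (!metric.name) {
      problems.push('metric without a name');
      continue;
    }
    // "name" has to be unique across the dashboard config.
    if (seen.has(metric.name)) {
      problems.push(`duplicate name: ${metric.name}`);
    }
    seen.add(metric.name);
    // "granularity" is described as being used for aqsApi only.
    if (metric.granularity && metric.api !== 'aqsApi') {
      problems.push(`${metric.name}: granularity is only used with aqsApi`);
    }
  }
  return problems;
}

const problems = checkMetrics([
  { name: 'MonthlyUniqueDevices', api: 'aqsApi', granularity: 'monthly' },
  { name: 'Uploads', api: 'datasets', granularity: 'daily' }, // made-up entry
  { name: 'Uploads', api: 'datasets' },                       // made-up duplicate
]);
console.log(problems);
```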

Note: In addition to installing gulp, you also need to install bower globally, then install the bower dependencies locally in your directory.

To install bower globally,

npm install -g bower

To install the bower dependencies locally, cd to your directory and run:

bower install

If you are asked for a jQuery version, choose version 2.1.1 to be compatible with Dashiki. There is no need to specify a bower version; you will most likely be prompted to specify the jQuery version anyway.

Let’s look at a datasets example:

{
  "definition": "https://meta.wikimedia.org/wiki/Analytics/Metrics/Uploads",
  "name": "Uploads",
  "metric": "multimedia-health",
  "submetric": "uploads",
  "api": "datasets",
  "annotations": {
    "host": "meta.wikimedia.org",
    "pageName": "Dashiki:MultimediaHealthUploads"
  }
}

The new properties are:

  • “metric”: the top level folder to look for data in, under
  • “submetric”: the subfolder to look for data in

So, using the datasets api, you will have to specify a “metric” and “submetric” property that will be used to look for your data in a convention-based path [2].
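
As a rough illustration of the convention-based path, the sketch below fills the template from [2] with the “metric” and “submetric” properties of a metric definition. The function datasetUrl is hypothetical (it is not the datasets api code), and the wiki identifier "enwiki" is an assumed example value.

```javascript
// Path convention copied from reference [2].
const DATASETS_TEMPLATE =
  'https://datasets.wikimedia.org/limn-public-data/metrics/{metric}/{submetric}/{wiki}.tsv';

// Hypothetical helper: fills the template from a metric definition and a wiki.
function datasetUrl(metricConfig, wiki) {
  const values = {
    metric: metricConfig.metric,
    submetric: metricConfig.submetric,
    wiki: wiki,
  };
  return DATASETS_TEMPLATE.replace(/\{(\w+)\}/g, (placeholder, key) => {
    if (!values[key]) {
      throw new Error(`missing "${key}" for the datasets path`);
    }
    return encodeURIComponent(values[key]);
  });
}

// "enwiki" is an assumed wiki identifier, used only for illustration.
const url = datasetUrl({ metric: 'multimedia-health', submetric: 'uploads' }, 'enwiki');
console.log(url);
```

A metric definition that omits “metric” or “submetric” makes the helper throw, which mirrors the point above: with the datasets api both properties are needed to locate the file.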

Example building your own dashboard locally for testing with gulp:

gulp --layout metrics-by-project --config VitalSigns
python -m SimpleHTTPServer 5000

Browse to http://localhost:5000/dist/metrics-by-project-VitalSigns

Alternatively, browse to localhost:5000 and, from the /dist directory, navigate to the dashboard.
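
Judging from the example above, the built dashboard appears to land in a /dist folder named after the layout and the config joined by a hyphen. The sketch below only encodes that observation. The function dashboardUrl is hypothetical, and the naming pattern is inferred from this single example rather than taken from the build scripts.

```javascript
// Hypothetical helper; the "<layout>-<config>" pattern is inferred from
// the single example above, not confirmed against the gulp build.
function dashboardUrl(layout, config, port = 5000) {
  return `http://localhost:${port}/dist/${layout}-${config}`;
}

const local = dashboardUrl('metrics-by-project', 'VitalSigns');
console.log(local);
```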

This is a minimal working example from an existing config page [3] on Meta-Wiki.

Be sure to write out your config pages CORRECTLY. The page layout is important!

[1] https://github.com/wikimedia/analytics-dashiki/blob/master/src/app/apis/api-finder.js#L25

[2] https://datasets.wikimedia.org/limn-public-data/metrics/{metric}/{submetric}/{wiki}.tsv

[3] https://meta.wikimedia.org/wiki/Config:VitalSigns

Another more sophisticated MWE is the following, with JS being in place: