PAWS/Tools

Background

Tools is used for a lot of things, but most uses fall into these buckets:

  1. Interactive bots / one-off jobs (run as non-continuous jobs with people monitoring output via tailing err logs)
  2. Continuous daemons / continuous jobs (run as continuous jobs, do things like react to IRC / rcstream, sleep for a while and do stuff, etc)
  3. Cron-run non-interactive bots (bots that run at set times doing set things; they usually have a -once flag set, or should have it set!)
  4. Multiple continuous daemons that work together (like wikibugs, which is 2 daemons communicating via redis)
  5. Simple web services (stateless services that do not interact with or care about the grid)
  6. WebService + Worker (Quarry, ORES, etc)
  7. Web services that interact with the grid (either by submitting jobs, or having queues running on the grid they communicate with)
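The difference between buckets #2 and #3 can be sketched as a single loop with a "run once" switch. This is only an illustration: `run_bot`, its parameters and the `max_iterations` cap are hypothetical names made up here, not part of Pywikibot or of any Tools job system.

```python
import time
from typing import Callable, Optional


def run_bot(
    task: Callable[[], None],
    once: bool,
    interval_seconds: float = 60.0,
    max_iterations: Optional[int] = None,
) -> int:
    """Run `task` a single time (cron style) or in a loop (continuous style).

    Hypothetical helper for illustration only. Returns the number of runs.
    `max_iterations` exists so the continuous mode can be stopped in a demo.
    """
    runs = 0
    while True:
        task()
        runs += 1
        # Cron-run bots exit after one pass and let cron start them again.
        if once:
            return runs
        if max_iterations is not None and runs >= max_iterations:
            return runs
        # Continuous jobs sleep for a while and then do stuff again.
        time.sleep(interval_seconds)
```

A cron-run bot would call `run_bot(task, once=True)` and rely on cron for scheduling, while a continuous job would call it with `once=False` and stay resident.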

Quarry

Tools also used to be where people ran ad-hoc SQL queries against the databases. People would ssh to a bastion and just run the query in a screen (messing with bastion resources), or attempt to run it on the grid and get super frustrated (because grid). quarry.wmflabs.org was a solution that 'fixed' this, and the number of people still trying to run ad-hoc queries against labsdb interactively is basically 0 right now. ({{cn}} I see mysql running on tools-bastion pretty regularly, but maybe by a small number of "die hards")

(At least nobody asks about it in support channels any more, and they used to a lot, so it's the die hards - but yes, this needs a stronger citation. Also, Quarry has time limits; mysql on Tools does not.)

PAWS

PAWS originally stood for Pywikibot: A Web Shell (it is now PAWS: A Web Shell). It was basically meant to let people do #1 in a much easier way. Pywikibot is the most (widely? commonly? number of edits I'm not sure - has the biggest actual community +1) used bot framework on Wikimedia wikis, and we also have additional OAuth integration to make managing secrets easy. This would move interactive bot usage off Tools and onto PAWS and, more importantly, allow more people to use it.

Escalation

Notebooks are far more powerful than just a way to run bots - they can actually easily help do #1, #2, #3 and #5 in the list above as well. In addition, they make research using open data / Wikipedia / Wikimedia data super easy, and make the whole setup far more accessible to people - it is a web application rather than ssh based, and so usable by people who have not paid the Command Line Tax(TM). There is also a very active upstream community, where I have had a lot of patches merged - and they are also super interested in what we are doing. (Some notes in http://www.harihareswara.net/libreplanet-2016-inessential-weirdnesses-in-free-software.txt may be useful on this point about ease of use.)

We ran a workshop https://meta.wikimedia.org/wiki/User:EpochFail/CSCW_2016_report#The_Workshop at CSCW 2016, which was incredibly well received. For a large number of people, this opened up access to computation in a way that was not possible for them before. There's a course (being planned by Jonathan Morgan) that will use this to teach Data Science to people at University of Washington, starting at the end of March.

I have been demoing this to people for the last few months, and all of research basically got sold on it - Open Research Infrastructure (basically PAWS) became their #1 strategic priority for next fiscal year.

Use cases

User personas

Interactive bot operators

These would be users primarily using the terminal inside PAWS to run interactive bots just following the pywikibot manuals. They don't really use notebooks at all, and treat it just as an ssh replacement. Good pywikibot integration + usable shell is what these people will need.

This will be the first concrete use case we cater to.

Exploratory Research

'Research' just means 'I have a question and want to find an answer to it' - it has no implications about programming knowledge or academic credentials. Experience with Quarry and workshops (and soon, classes - http://wiki.communitydata.cc/DS4UX_(Spring_2016) is a University of Washington course targeted at people without too much 'computer programming experience' using Quarry + PAWS) suggests that removing the accidental complexities of programming (ssh, the grid, screen, jsub defaulting to precise while bastion is trusty, etc) will significantly increase this population. We'll have a clearer idea and narrative of who these people are and what they are doing in a few months! This does include actual researchers and heavy power users too.

These users will need some way to share / fork notebooks with each other, access to replica dbs, lots of examples and other help setups, easy access to visualization tools, simple ways to access the various APIs, and so on. This is the primary group at which notebooks are initially targeted. We'll be working a lot with upstream to make sure all the things we use / build are generalized and usable by everyone (no 'let us hack a PaaS on top of this old abandonware' again :D)
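As a rough idea of what "simple ways to access the various APIs" could look like from a notebook, here is a minimal sketch that builds a MediaWiki action API URL for recent changes using only the standard library. The helper name `recent_changes_url` and the choice of English Wikipedia as the default endpoint are assumptions made for illustration; nothing here is an actual PAWS helper.

```python
from urllib.parse import urlencode

# Assumption for illustration: English Wikipedia's action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def recent_changes_url(limit: int = 10, endpoint: str = API_ENDPOINT) -> str:
    """Build (but do not fetch) an action API URL listing recent changes.

    Hypothetical helper, not part of PAWS or Pywikibot.
    """
    params = {
        "action": "query",        # standard action API query module
        "list": "recentchanges",  # list recent changes on the wiki
        "rclimit": limit,         # how many changes to return
        "format": "json",         # ask for a JSON response
    }
    return endpoint + "?" + urlencode(params)


print(recent_changes_url(5))
```

In a notebook the resulting URL could then be fetched with any HTTP client and the JSON response explored interactively; the fetch is left out here so the sketch does not depend on network access.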

Things that currently run on Tools but don't need to

This is more long term...

1. Bots written as notebooks that are run on a cron. Since all notebooks are public (and forkable), this allows easy verification of status and whatnot by random lay-users, and also provides safeguards against abandonment of crucial tools.

2. Simple webservices written as notebooks. This has the same advantages as above for public code visibility!

Timeline

Apr - July 2016

- Stabilize underlying kubernetes infrastructure

- Enough fixes to make the interactive Pywikibot use case rock solid.

- Rudimentary public sharing interface + decide on norms for public/private usage

- Decide on Licensing strategy

- Announce publicly