User:SRodlund/PAWS examples (staging)
This page is currently a draft.
More information and discussion about changes to this draft can be found on the talk page.
This is a staging page for: https://wikitech.wikimedia.org/wiki/PAWS/PAWS_examples_and_recipes
I am currently using this page to compile PAWS resources mainly for Wiki Replicas and API. This is a mix of tutorials, notebooks, and relevant pages.
- Finish pulling together resources for Wiki Replicas
- Update this page to reflect PAWS: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database
To help newcomers and current users of PAWS find existing notebooks and tutorials that can serve as models for their own work.
- I am not a developer; I want to perform some basic technical tasks to help improve the wikis I am working on.
- I am not a developer; I am interested in Wikidata and want to have more skills that allow me to work with it.
- I am not a developer; I would like to gain some Python programming skills that I can use to contribute to Wikimedia technical projects and that will also apply to projects outside of Wikimedia.
- I am a developer, and I am looking for additional examples of notebooks that I can use as a basis for my own.
- I am a researcher and want to access database replicas and work with datasets.
Note: potential audiences in bold
- If this is a single page, I would suggest sorting this page by task type and/or audience (label each entry with its library/package, e.g. Pywikibot, NumPy, etc.).
- Consider making this a "mini" landing page in the PAWS portal. You could branch out to pages for separate audiences or tasks (though this may be complicating the lightswitch).
Examples to draw from
- https://www.mediawiki.org/wiki/Wikimedia_tutorials -- I like the box layout, though this is not quite intuitive enough. I feel like there should be some sort of menu/key.
- https://meta.wikimedia.org/wiki/Research:Data -- I like the nav buttons and also the quick glance box on this page.
- https://wikitech.wikimedia.org/wiki/Main_Page -- something based around the Wikitech portal design also used for PAWS and Blubber landing pages
- https://meta.wikimedia.org/wiki/Small_wiki_toolkits -- I like the box layout here
This page is a work in progress and will be developed further.
This page offers a growing number of recipes, how-tos, and example notebooks that you may find useful while learning and exploring PAWS. This page is not meant to be an exhaustive list. There are many examples of public notebooks available in many places. To see all of the notebooks currently hosted on PAWS, check out the public index.
There is currently no way to search the public index for specific types of notebooks, but it can be useful to explore them to see what others have done with PAWS.
While the following information is separated by areas of interest, it should be noted that many notebooks utilize a variety of these elements at once.
A notebook may use the Pywikibot library, employ API connections, and utilize wiki replicas.
A visual key to help keep track of what examples and tutorials are available
Areas of Interest
The following resources will help you get started with PAWS.
Notebook based tutorials
- Getting started with PAWS tutorial
- PAWS Cheatsheet - This "cheatsheet" contains a number of useful tasks you can run right away. Note to self: break this out into small tutorials based around the info inside of it.
Wiki replicas (Databases)
Resources for this section
- Downloading a result set - can be useful in PAWS notebooks
- Help:Toolforge/Database - Documentation on Toolforge Databases
- Help:MySQL_queries - Information on constructing MySQL queries
- Wiki_replicas - Pointers to different resources on Wiki replicas
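As a rough sketch of what the resources above describe, the following shows one way a notebook might connect to a wiki replica and load a result set into a pandas DataFrame. Several things here are assumptions rather than confirmed PAWS behaviour: the `<wiki>.analytics.db.svc.wikimedia.cloud` host pattern and the `<wiki>_p` database name follow the convention documented in Help:Toolforge/Database, the credentials are assumed to live in `~/.my.cnf`, and the helper names `replica_target` and `query_replica` are hypothetical. Check the linked pages for the current details before relying on it.

```python
from pathlib import Path
from typing import Tuple

# Assumption: host naming convention taken from Help:Toolforge/Database.
REPLICA_DOMAIN = "analytics.db.svc.wikimedia.cloud"


def replica_target(wiki: str) -> Tuple[str, str]:
    """Return (host, database) for a wiki database name such as 'enwiki'."""
    return f"{wiki}.{REPLICA_DOMAIN}", f"{wiki}_p"


def query_replica(wiki: str, sql: str, params=None):
    """Run a read-only query and return the rows as a pandas DataFrame."""
    # Imported here so that replica_target() works without these packages.
    import pandas as pd
    import pymysql

    host, database = replica_target(wiki)
    connection = pymysql.connect(
        host=host,
        database=database,
        # Assumption: replica credentials are stored in this file.
        read_default_file=str(Path.home() / ".my.cnf"),
        charset="utf8mb4",
    )
    try:
        with connection.cursor() as cursor:
            cursor.execute(sql, params)
            columns = [column[0] for column in cursor.description]
            return pd.DataFrame(list(cursor.fetchall()), columns=columns)
    finally:
        connection.close()
```

From a notebook, a call might look like `query_replica("enwiki", "SELECT page_title FROM page WHERE page_namespace = %s LIMIT 5", (0,))`. Text columns on the replicas may come back as bytes, so decoding them (for example with `.str.decode("utf-8")`) may be needed before display.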
Tutorials and helpers
- JMos cheatsheet - This has a Database Connections section that is very useful and probably deserves its own tutorial notebook (Note to self -- create a separate notebook just from this section. It's hard to find just based on the title of the cheatsheet, there's no TOC in the doc, and some of the information may be out of date.).
- Replica helper - This is an importable notebook that provides simple helpers for performing queries on the labsdb replica databases from PAWS. It is stateful and designed to be easy to use in an interactive setup.
- Accessing Database Replicas With Pandas and SQLAlchemy - Pandas is a lovely high-level library for in-memory data manipulation. To get the result of a SQL query as a pandas DataFrame, use the code provided here.
- A notebook that uses SQL to compile a list of US place names
- A notebook that uses the replica database helper to explore namespace edits
- Short example of how to access the replica databases from PAWS.
- A notebook exploring cross-wiki JOINS.
- Test notebook connecting to databases
- Create table with Wikidata SPARQL -- a little different from the other examples
- Replica databases helper (Needs Testing) - This is an importable notebook that provides simple helpers for performing queries on the labsdb replica databases from PAWS. It is stateful and designed to be easy to use in an interactive setup.
- Search Wikipedia articles - The MediaWiki REST API lets you build apps and scripts that interact with any MediaWiki-based wiki. In this tutorial, we'll use the REST API search endpoints to search for articles about the Solar System on English Wikipedia.
- Exploring page history - The MediaWiki REST API lets you build apps and scripts that interact with any MediaWiki-based wiki. In this tutorial, we'll use the REST API page history endpoints to explore the history of articles on English Wikipedia.
- Wikimedia Feeds API Intro - Many Wikipedias include daily featured articles and other curated content on their homepages. You can see an example of this content on the main page of English, German, and French Wikipedias. The Wikifeeds API lets you access this content programmatically and add high-quality, multilingual content to your apps.
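To give a feel for what the search and page history tutorials above cover, here is a minimal sketch that builds MediaWiki REST API URLs and fetches JSON using only the Python standard library. It assumes the `https://<lang>.wikipedia.org/w/rest.php/v1` base URL; the tutorials themselves may use a different base URL. The helper names and the `USER_AGENT` string are hypothetical placeholders.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: placeholder value; use a User-Agent that identifies you.
USER_AGENT = "paws-example-notebook/0.1 (replace with your contact details)"


def rest_base(lang="en"):
    """Base URL of the MediaWiki REST API for one Wikipedia language."""
    return f"https://{lang}.wikipedia.org/w/rest.php/v1"


def search_url(query, limit=5, lang="en"):
    """URL for the search/page endpoint."""
    return f"{rest_base(lang)}/search/page?" + urlencode({"q": query, "limit": limit})


def history_url(title, lang="en"):
    """URL for the page history endpoint of a single article."""
    encoded_title = quote(title.replace(" ", "_"), safe="")
    return f"{rest_base(lang)}/page/{encoded_title}/history"


def get_json(url):
    """Fetch a URL and decode the JSON response body."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

In a notebook, `get_json(search_url("solar system"))["pages"]` should give a list of matching pages, each with a `title` field, and `get_json(history_url("Solar System"))["revisions"]` a list of recent revisions.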
Other areas of interest
Notebooks that use multiple Python libraries
- A machine learning notebook with visualizations. Multiple Python libraries are imported to create this notebook.
- A notebook using Wikidata to list painters in multiple languages.
- Using Pywikibot with PAWS - A basic introduction to using Pywikibot with PAWS. This tutorial gives you the information you need to get started using a Python 3 notebook or the PAWS terminal.
- An intro to Pywikibot - A notebook based Pywikibot tutorial
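For readers who want a taste before opening the Pywikibot tutorials above, here is a minimal sketch of reading a page's wikitext with Pywikibot and pulling out the internal link targets. It assumes Pywikibot is installed and configured in the notebook environment; the helper names `fetch_wikitext` and `wikilink_targets` are hypothetical, and the regular expression is a deliberate simplification (Pywikibot's own `Page.linkedPages()` is the more robust route).

```python
import re

# Matches the target part of [[Target]] or [[Target|label]]; links that
# point only to a section, such as [[#Notes]], are skipped.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)")


def wikilink_targets(wikitext):
    """Return internal link targets in the order they appear in the text."""
    return [target.strip() for target in WIKILINK.findall(wikitext)]


def fetch_wikitext(title, lang="en", family="wikipedia"):
    """Read the current wikitext of a page using Pywikibot."""
    # Imported here so that wikilink_targets() works without Pywikibot.
    import pywikibot

    site = pywikibot.Site(lang, family)
    page = pywikibot.Page(site, title)
    return page.text
```

In a notebook, `wikilink_targets(fetch_wikitext("Solar System"))` would list the articles that the page links to.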
Example notebooks (This section will likely change)
Notebooks that use datasets
Notebooks that use Pywikibot
Understand users and user behavior on a wiki
- See the global block history for a user across wikis
- Pages created from external links by non-autoconfirmed users. This can be used to reduce spam on wikis.
Make it easier for editors to organize articles and information
- Extract information about stubs for editors who are considering merging them - This example uses the category "Rural localities in Russia."
- Search for pages with deprecated templates
Contribute to Wikidata
Answer interesting questions
- Teahouse questions -- What kinds of questions do Teahouse users ask? This notebook uses Pywikibot and matplotlib.pyplot to find out.
Further resources and useful pages
- PAWS is a Jupyter notebooks installation hosted by Wikimedia Cloud Services. The existing Jupyter Notebooks documentation is an excellent resource for PAWS users.
- Check out the PAWS Readme on GitHub for information on useful libraries and storage space.
- https://meta.wikimedia.org/wiki/Research:Data - This page is intended to help community members, developers, and researchers who are interested in analyzing raw data learn what data and infrastructure is available.
- https://meta.wikimedia.org/wiki/Data_dumps - Data dumps
Some notes to self
- Please document this import tool, which can be used to import notebooks into other notebooks:
from paws.YuviPanda.replicahelper import sql
- Learn more about replica databases: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database
- https://github.com/toolforge/paws (accessing replica databases using pandas)
- Keep an eye on these tickets:
- Note that in JMOs notebook, connecting to HOSTBOT is mentioned, but it appears the ticket was declined: https://phabricator.wikimedia.org/T123098
- Learn about the MediaWiki Action API: https://www.mediawiki.org/wiki/API:Main_page