Help:Toolforge/Pywikibot: Difference between revisions

imported>Shizhao
imported>Multichill
(more -core)
(18 intermediate revisions by 9 users not shown)
Line 1: Line 1:
{{Pywikibot outdated}}
{{Note|This page is related to https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_Pywikibot_tool. These pages will be combined for simpler documentation.}}
{{Toolforge nav}}
{{Toolforge nav}}
{{caution}} This page may contain inaccuracies. It is currently being edited and redesigned for better readability. For further information, please see: https://phabricator.wikimedia.org/T134495
The '''[[mw:Manual:Pywikibot|Pywikibot Framework]]''' is a collection of Python tools that automate work on MediaWiki sites. Please confer [[mw:Manual:Pywikibot/Installation]] first.


A snapshot of the Pywikibot ‘core’ branch (formerly ‘rewrite’) is maintained at ‘/shared/pywikibot/core’. The ‘compat’ (formerly ‘trunk’) branch is maintained at ‘/shared/pywikipedia/compat,’ but because of the possibility of session cookie leaks, as well as the difficulty of using compat in a centralized way, we recommend that you [[#Setup pywikibot on Toolforge (locally) | install ‘compat’ locally]] if you need to use this. But be aware that compat is very outdated and no longer supported by the pywikibot developer team.
The '''[[mw:Manual:Pywikibot|Pywikibot Framework]]''' is a collection of Python tools that automate work on MediaWiki sites. Please review [[mw:Manual:Pywikibot/Installation]] first.


In general, we recommend using the shared ‘core’ files because the code is updated frequently. If you are a developer and/or would like to control when the code is updated, you may also choose to [[#Setup pywikibot on Toolforge (locally) | install 'core' locally]] in your tool directory.
The stable version of the Pywikibot 'core' branch (formerly 'rewrite') is accessible at <code>/shared/pywikibot/stable</code>. If you are a developer and/or would like to use the current master branch, this is accessible at <code>/shared/pywikibot/core</code> but be aware this might not be a stable release. To have control when the code is updated, you may also choose to [[#Setup pywikibot on Toolforge (locally)|install 'core' locally]] in your tool directory.


Note that the shared 'core' code consists only of the source files; each bot operator will need to create his or her own configuration files (such as ‘user-config.py’) and set up a PYTHONPATH and other environment variables. Please see [[#Using the shared Pywikibot files (recommended setup) | Using the shared Pywikibot files]] for more information.
Note that the shared 'core' code consists only of the source files; each bot operator will need to create their own configuration files (such as 'user-config.py') and set up a PYTHONPATH and other environment variables. Please see [[#Using the shared Pywikibot files (recommended setup)|Using the shared Pywikibot files]] for more information.


== Using the shared Pywikibot files (recommended setup) ==
== Using the shared Pywikibot files (recommended setup) ==
For most purposes, using the centralized ‘core’ files is recommended as the code is updated frequently. The shared files are available at <code>/data/project/shared/pywikibot/core</code>, and steps for configuring your tool account are provided below. The configuration files themselves are stored in your tool account in the <code>$HOME/.pywikibot</code> directory, or another directory, where they can be used via the -dir option (all of this is described in more detail in the instructions).
For most purposes, using the centralized 'core' files is recommended. The shared files are available at <code>/data/project/shared/pywikibot/stable</code>, and steps for configuring your tool account are provided below. The configuration files themselves are stored in your tool account in the <code>$HOME/.pywikibot</code> directory, or another directory, where they can be used via the <code>-dir</code> option (all of this is described in more detail in the instructions).


If you are a developer and/or would like to control when the code is updated, or if you would like to use the ‘compat’ branch instead of 'core' (not all the Pywikibot scripts have been ported to ‘core’), please see [[#Setup pywikibot on Toolforge (locally) | Installing Pywikibot locally]] for instructions.
If you are a developer and/or would like to control when the code is updated, please see [[#Setup pywikibot on Toolforge (locally) | Installing Pywikibot locally]] for instructions.


'''To set up your Tools account to use the shared ‘core’ framework:'''
'''To set up your Tools account to use the shared 'core' framework:'''


1. Become your tool-account
1. Become your tool-account
maintainer@tools-login:~$ become toolname
<syntaxhighlight lang="shell-session">
maintainer@tools-login:~$ become toolname
</syntaxhighlight>


2. In your home directory, create (or edit, if it exists already) a ‘.bash_profile’ file to include the following line. The path should be on one line, though it may appear to be on multiple lines depending on your screen width. When you save the .bash_profile file, your settings will be updated for all future shell sessions:
2. In your home directory, create (or edit, if it exists already) a '.bash_profile' file:


export PYTHONPATH=/data/project/shared/pywikibot/core:/shared/pywikibot/core/scripts
<syntaxhighlight lang="sh">
nano .bash_profile
</syntaxhighlight>


3. Import the path settings into your current session:
and include the following line:


tools.tool@tools-login$ source .bash_profile
<syntaxhighlight lang="sh">
export PYTHONPATH=/data/project/shared/pywikibot/stable:/data/project/shared/pywikibot/stable/scripts
</syntaxhighlight>


4. In your home directory, create a subdirectory named ‘.pywikibot’ (the .’ is important!) for bot-related files:
The path should be on one line, though it may appear to be on multiple lines depending on your screen width. When you save the .bash_profile file ({{key press|CTRL|X}}), your settings will be updated for all future shell sessions.


tools.tool@tools-login$ mkdir .pywikibot
3. Import the path settings into your current session:
<syntaxhighlight lang="shell-session">
tools.tool@tools-login$ source .bash_profile
</syntaxhighlight>


4. In your home directory, create a subdirectory named '.pywikibot' (the '.' is important!) for bot-related files:
<syntaxhighlight lang="shell-session">
tools.tool@tools-login$ mkdir $HOME/.pywikibot
</syntaxhighlight>
[[File:Python_-data-project-shared-pywikibot-core-generate_user_files-py.png|thumb|example of configuration for commons.wikimedia.org]]
[[File:Python_-data-project-shared-pywikibot-core-generate_user_files-py.png|thumb|example of configuration for commons.wikimedia.org]]
5. Configure Pywikibot.
5. Configure Pywikibot.


To create configuration files, use the following command and then follow the instructions. You may also use an existing configuration file (e.g., ‘user-config.py’) that works on another system by copying it into your .pywikibot directory:
To create configuration files, use the following command and then follow the instructions. You may also use an existing configuration file (e.g., 'user-config.py') that works on another system by copying it into your .pywikibot directory:
tools.tool@tools-login$ python /data/project/shared/pywikibot/core/generate_user_files.py
<syntaxhighlight lang="shell-session">
tools.tool@tools-login$ python3 /data/project/shared/pywikibot/stable/pywikibot/scripts/generate_user_files.py
</syntaxhighlight>


6. Test out your setup.
6. Test out your setup.
In general, all jobs should be [[#Submitting, managing and scheduling jobs on the grid|run on the grid]], but it’s fine to test your setup on the command line:
In general, all jobs should be [[#Submitting, managing and scheduling jobs on the grid|run on the grid]], but it's fine to test your setup on the command line. You should see the following terminal output (or something similar):
tools.tool@tools-login$ python /data/project/shared/pywikibot/core/scripts/version.py
<syntaxhighlight lang="shell-session">
You should see the following terminal output (or something similar):
tools.tool@tools-login$ python3 /data/project/shared/pywikibot/stable/pywikibot/scripts/version.py
Pywikibot [http] branches/rewrite (r11526, 2013/05/12, 18:51:23, OUTDATED) Python 2.7.3 (default, Aug  1 2012, 05:14:39) [GCC 4.6.3] unicode test: ok
Pywikibot: [https] r-pywikibot-core.git (1db1f28, g15095, 2021/05/31, 14:35:28, stable)
Note that you do not run scripts using pwb.py, but run scripts directly, e.g., <code>python /data/project/shared/pywikibot/core/scripts/version.py</code>. Setting <tt>PYTHONPATH</tt> means that you no longer need pwb.py to make, say, <code>import pywikibot</code> work.
Release version: 6.3.0
requests version: 2.12.4
  cacerts: /etc/ssl/certs/ca-certificates.crt
    certificate test: ok
Python: 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516]
</syntaxhighlight>
 
Note that you do not need to run scripts using pwb.py, but run scripts directly, e.g., <code>python3 /data/project/shared/pywikibot/stable/pywikibot/scripts/version.py</code>. Setting <tt>PYTHONPATH</tt> means that you no longer need the pwb.py helper script to make, say, <code>import pywikibot</code> work. Anyway the pwb.py helper script has additional advantages like ignoring typing mistakes for script names, script path redirection, dependency checks, see [https://doc.wikimedia.org/pywikibot/master/utilities/index.html?#module-pwb pwb script documentation].
 
If you need to use multiple user-config.py files, you can do so by adding <tt>-dir:<path where you want your user-config.py></tt> to every python command. To use the local directory, use <tt>-dir:.</tt> (colon dot).


If you need to use multiple user-config.py files, you can do so by adding -dir:<path where you want your user-config.py> to every python command. To use the local directory, use -dir:. (colon dot).
For more information about Pywikibot, please see the [[mw:Manual:Pywikibot|Pywikibot documentation]]. The pywikibot mailing list (pywikibot{{@}}lists.wikimedia.org) and IRC channel ({{irc|pywikibot}}) are good places to go for additional help. Other useful information about using the centralized 'core' files is available here: [[User:Russell Blau/Using pywikibot on Labs]]


For more information about Pywikibot, please see the [[mw:Pywikibot|Pywikibot documentation]]. The pywikibot mailing list (pywikibot@lists.wikimedia.org) and IRC (irc://irc.freenode.net/pywikibot) channel are good places to go for additional help. Other useful information about using the centralized 'core' files is available here: [[User:Russell Blau/Using pywikibot on Labs]]
{{caution}} Script path for Pywikibot framework utility scripts (<tt>generate_family_file.py</tt>, <tt>generate_user_files.py</tt>, <tt>shell.py</tt>, <tt>version.py</tt>) has been changed in core (master) branch with release 7.0.0. To use them the path is <code>/data/project/shared/pywikibot/core/pywikibot/scripts/<script_name></code> or it can be invoked by the <tt>pwb.py</tt> wrapper script. See also: https://doc.wikimedia.org/pywikibot/master/utilities/index.html


== Setup pywikibot on Toolforge (locally) ==
== Setup pywikibot on Toolforge (locally) ==
{{anchor|Setup pywikibot on Labs (locally)}}
{{anchor|Setup pywikibot on Labs (locally)}}
If you want to use the compat branch, we highly recommend installing it locally (it's almost impossible to use the shared files correctly and, if you try, you might leak session cookies to a location where anyone can read them, you might need additional libraries, etc.). For core, you can also install the files locally -- this would allow you to upgrade whenever it suits you, instead of always running the latest version.
Installing pywikibot local to your tool allows you to upgrade whenever it suits you, instead of always running the latest version.
 
=== Installing core ===
Similar to the instructions given in [http://lists.wikimedia.org/pipermail/pywikibot/2013-August/008168.html this mail] do:


=== {{Anchor|Installing core}} Clone pywikibot git repo ===
Clone the 'core' git repository:
Clone the 'core' git repository:
$ git clone --recursive --branch stable "https://gerrit.wikimedia.org/r/pywikibot/core" pywikibot-core
<syntaxhighlight lang="shell-session">
$ cd pywikibot-core
$ git clone --recursive --branch stable "https://gerrit.wikimedia.org/r/pywikibot/core" $HOME/pywikibot
 
</syntaxhighlight>
then you can compress the git repository by running
$ git gc --aggressive --prune
$ cd scripts/i18n/
$ git gc --aggressive --prune
 
which results in a repo of size ~9MB.
 
You have 2 choices on how you want to proceed now and setup core. You can do so by using an additional tool called <code>virtualenv</code> and install it as module into a virtual environment, or you can run it from sources - similiar like compat - by using the integrated <code>pwb.py</code> wrapper. For the second method no installation is needed.
 
{{anchor|virtualenv}}
;install as module - virtualenv
 
If you would like to ''install a local version'' of the 'core' branch, we recommend that you use virtualenv, which is particularly useful if your code uses a lot of externals (e.g. IRC bots, image handling bots, etc.).
 
To set up the Pywikibot core branch from cloned repo:
 
Create a virtualenv. You can call it whatever you'd like (e.g., 'pwb', in this example); shorter names are easier:
$ virtualenv pwb


This will install Python v2.7. To install the version 3:
=== Setup a Python virtual environment for library dependencies ===
$ virtualenv -p /usr/bin/python3 pwb
{{Anchor|virtualenv}}{{Anchor|install as module - virtualenv}}
When using a local pywikibot install, use a [https://docs.python.org/3/tutorial/venv.html Python virtual environment] (venv) to manage Python library dependencies. The Toolforge environment does provide system packages for many Python libraries, but these are installed using Debian packages which means that they are often older versions and not likely to be upgraded often.


Activate it
Create a venv. You can give this venv any name you would like. We will use 'pwb' in this example.
$ source ~/pwb/bin/activate
<syntaxhighlight lang="shell-session">
and then do the following, which basically installs pwb-core as a symlink. This way, if you modify the directory, you don't need to install it again. This will also call python generate_user_files.py:
$ python3 -m venv $HOME/pwb
$ cd pywikibot-core
</syntaxhighlight>
$ python setup.py develop


To use the code from outside the virtual environment (e.g. to submit jobs to the grid engine), use:
Once you have created the venv, you can "activate" it to setup your shell's $PATH so that the <code>python3</code> and <code>pip3</code> binaries in the virtual environment are used by default.
$ /data/project/tooluser/pwb/bin/python /data/project/tooluser/path/to/script.py
<syntaxhighlight lang="shell-session">
or
$ source $HOME/pwb/bin/activate
$ $HOME/pwb/bin/python /home/path/to/script.py
(pwb) $
Note: If you want to run a script in interactive mode to debug, you'll need to run <code>source ~/pwb/bin/activate</code> first.
</syntaxhighlight>


; run from sources - pwb.py wrapper
Now that the venv is created and active for your current shell session, we can install the pywikibot code from the git clone we made earlier into this venv. This basically installs the pywikibot core code as a symlink in the venv. This way, if you modify the directory, you don't need to install it again.
After cd'ing into pywikibot-core, run
<syntaxhighlight lang="shell-session">
(pwb) $ pip3 install --upgrade pip "setuptools>=49.4.0, !=50.0.0, <50.2.0" wheel
...
Successfully installed pip-21.2.4 setuptools-58.1.0 wheel-0.37.0
(pwb) $ cd $HOME/pywikibot
(pwb) $ pip3 install -e .[mwparserfromhell,mwoauth,mysql]  # adjust extra dependencies as needed for your tool
...
Finished processing dependencies for pywikibot==6.6.1
</syntaxhighlight>


$ python pwb.py login.py
Note: the <code>setuptools!=50.0.0</code> install constraint is for [[phab:T261748|T261748]] and the [https://github.com/pypa/setuptools/issues/2352 upstream issue in setuptools] related to relative imports.


which will ask a series of questions on how you want to configure your local copy. This will generate the required config files for you. Alternatively, if you have already config file from previous version, you can copy those existing config files into the pywikibot-core directory.
=== Using the virtual environment without activating it ===
To use the code from outside the virtual environment (for example to submit jobs to the grid engine), use the full paths to the <code>python3</code> inside your venv directory and the full path to the script you want to run:
<syntaxhighlight lang="shell-session">
$ $HOME/pwb/bin/python3 $HOME/path/to/script.py
</syntaxhighlight>


Some bot scripts require extra packages to be installed -- see the file externals/README for more details.
=== Using the virtual environment on Kubernetes ===
The way to launch and customise the virtual environment is different on [[News/Toolforge Stretch deprecation|Kubernetes]].  


=== compat ===
The virtual environment should be defined in the toolforge-job itself. Create a script similar to this:


Clone the 'compat' git repository:
{{Codesample|name=pwb_venv.sh|lang=bash|scheme=light|code=
#!/bin/bash


$ git clone --recursive https://gerrit.wikimedia.org/r/pywikibot/compat.git pywikibot-compat
# create the venv
python3 -m venv pwbvenv


Now you have to setup pywikibot, by running <code>login.py</code> (in fact running any bot script – like e.g. your favourite one – works):
# activate it
source pwbvenv/bin/activate


$ cd pywikibot-compat
# install some packages
$ python login.py -all
pip3 install --upgrade pip "setuptools>=49.4.0, !=50.0.0, <50.2.0" wheel
cd $HOME/pywikibot
pip3 install -e .[mwparserfromhell,mwoauth,mysql]
}}


roughly as described in the [[#Installing core|Installing core]] section above.
<syntaxhighlight lang="shell-session">
 
tools.mytool@tools-sgebastion-11:~$ chmod ug+x pwb_venv.sh
You may setup all externals manually if you want - but this is not needed in ''compat''; see [[mw:Manual:Pywikibot/Installation#Dependencies]] for further info. If you do not install them, you may be asked to install some [[Help:Toolforge/FAQ#My Tool requires a package that is not currently installed in Toolforge. How can I add it?|extra packages]] depending on what scripts you run.
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run pwb-venv --command "./pwb_venv.sh" --image tf-python39 --wait
 
INFO: job 'pwb-venv' completed
You will also have to enter the password for your bot eventually.
</syntaxhighlight>
 
Now you have finished the configuration of ''compat'' and can continue setting up the webspace and jobs to execute.
 
== Setup web-space ==
 
If you want to provide data for download, you need to start a webservice; see the [[#Web services|section "Web services"]] for how to do that.
 
Per default, the directory listing on http://tools.wmflabs.org/TOOLNAME is disabled. See the [[#Example configurations|example]] under "Directory or file index" for how to enable that.
 
If you run a bot with the <code>-log</code> option, you will find the log files within the <tt>logs/</tt> directory. If you want to allow users to access it from the web, do
$ cd ~/public_html
$ mkdir logs
$ cd logs
$ ln -s ~/pywikibot-core/logs core
If you want a specific file type to be handled different by your browser, e.g. <tt>.log</tt> files like text files, see the example under "Header, mimetype, error handler" for how to configure that and (don't forget to) clear your browser's cache afterwards.
 
Next you might want to consider your <tt>cgi-bin</tt> directory:
$ cd ~/cgi-bin
follow the hints given at [[Nova Resource:Tools/Help#Logs]] ''exactly'', e.g. even the two commands
$ /usr/bin/python      # valid
$ /usr/bin/env python  # in-valid
work and do the same in shell, only the first one is valid and works here, the second is invalid! Another point to mention is that [[User:Magnus_Manske/Migrating_from_toolserver#PHP|PHP scripts go into public_html, not cgi-bin]]. Python scripts on the other hand [[Nova_Resource:Tools/Help#Published_directories|can be placed in public_html or cgi-bin]] as you wish. I would recommend to use <tt>public_html</tt> for documents and keep it listable, whereas <tt>cgi-bin</tt> should be used for CGI scripts and be protected (not listable).


== Setup job submission ==
== Setup job submission ==
After installing, you can run your bot directly via a shell command, though this is highly discouraged. You should use the grid to run jobs instead.
After installing, you can run your bot directly via a shell command, though this is highly discouraged. You should use the grid to run jobs instead.


In order to setup the submission of the jobs you want to execute and use the grid engine you should first consider [[Nova Resource:Tools/Help#Submitting, managing and scheduling jobs on the grid]] and if you are familiar with the Toolserver and its architecture consult [[User:Magnus_Manske/Migrating from toolserver#qsub et al|Migrating from toolserver]] also.
In order to setup the submission of the jobs you want to execute and use the grid engine you should first read [[Help:Toolforge/Grid]].
 
In general Toolforge uses SGE and its commands like <tt>qsub</tt> et al, this is explained [[Nova Resource:Tools/Help#Submitting, managing and scheduling jobs on the grid|in this document]] which you should use in order to get an idea which command and what parameters you want to use. Please don't use the <code>-daemonize</code> parameter as it is unneeded on the grid.


To run a bot using the grid, you might want to be in the pywikibot directory (this is not needed) - which means you have to write a small wrapper script. The following example script (versiontest.sh) is used to run version.py:
To run a bot using the grid, you might want to be in the pywikibot directory (this is not needed) - which means you have to write a small wrapper script. The following example script (versiontest.sh) is used to run version.py:


$ cat versiontest.sh
<syntaxhighlight lang="shell-session">
$ cat versiontest.sh
#!/bin/bash
#!/bin/bash
cd /path/to/pywikibot
cd /data/project/shared/pywikibot/stable
python version.py
python3 version.py
</syntaxhighlight>


To submit a job, set the permissions for the script and then use the 'jsub' command to send the job to the grid:
To submit a job, set the permissions for the script and then use the 'jsub' command to send the job to the grid:
$ chmod 755 versiontest.sh
<syntaxhighlight lang="shell-session">
$ jsub -N job_name versiontest.sh
$ chmod 0755 versiontest.sh
$ jsub versiontest.sh
</syntaxhighlight>


Job output will be written to output and error files in your home directory called YOURJOBNAME.out and YOURJOBNAME.err, respectively (e.g., versiontest.out and versiontest.err in this example):
Job output will be written to output and error files in your home directory called YOURJOBNAME.out and YOURJOBNAME.err, respectively (versiontest.out and versiontest.err in this example):
 
<syntaxhighlight lang="shell-session">
$ cat ~/versiontest.out
$ cat ~/versiontest.out
pywikibot [https] r/pywikibot/compat (r10211, 8fe6bdc, 2013/08/18, 14:00:57, ok)
pywikibot [https] r/pywikibot/compat (r10211, 8fe6bdc, 2013/08/18, 14:00:57, ok)
Python 2.7.3 (default, Aug  1 2012, 05:14:39)
Python 2.7.3 (default, Aug  1 2012, 05:14:39)
[GCC 4.6.3]
[GCC 4.6.3]
config-settings:
config-settings:
use_api = True
use_api = True
use_api_login = True
use_api_login = True
unicode test: ok
unicode test: ok
</syntaxhighlight>


=== Example ===
=== Example ===
An [[Help:Toolforge/Grid#Submitting continuous jobs (such as bots) with 'jstart'|infinitely running job]] (e.g. irc-bot) like this (<tt>cronie</tt> entry from TS submit host):
An [[Help:Toolforge/Grid#Submitting continuous jobs (such as bots) with 'jstart'|infinitely running job]] such as an irc-bot can be started like this:


06 0 * * * qcronsub -l h_rt=INFINITY -l virtual_free=200M -l arch=lx -N script_wui $HOME/rewrite/pwb.py script_wui.py -log
<syntaxhighlight lang="shell-session">
 
$ jsub -once -continuous -l h_vmem=256M -N script_wui python3 $HOME/pywikibot/pwb.py script_wui.py -log
becomes
</syntaxhighlight>
or shorter
<syntaxhighlight lang="shell-session">
$ jstart -l h_vmem=256M -N script_wui python3 $HOME/pywikibot/pwb.py script_wui.py -log
</syntaxhighlight>


$ jsub -once -continuous -l h_vmem=256M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log
If you experience problems with your jobs, like e.g.
or shorter
$ jstart -l h_vmem=256M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log
the first expression is good for debugging. Memory values smaller than 256MB seam not to work here, since that is the minimum. If you experience problems with your jobs, like e.g.
  Fatal Python error: Couldn't create autoTLSkey mapping
  Fatal Python error: Couldn't create autoTLSkey mapping
you can try increasing the memory value - which is also needed here, because this script uses a second thread for timing and this thread needs memory too. Therefore use finally
you can try increasing the memory value:
$ jstart -l h_vmem=512M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log
<syntaxhighlight lang="shell-session">
$ jstart -l h_vmem=512M -N script_wui python3 $HOME/pywikibot/pwb.py script_wui.py -log
</syntaxhighlight>


Now in order to create a crontab follow [[Help:Toolforge/Grid#Scheduling jobs at regular intervals with cron|Scheduling jobs at regular intervals with cron]] and setup for crontab file like:
Now in order to create a crontab follow [[Help:Toolforge/Grid#Scheduling jobs at regular intervals with cron|scheduling jobs at regular intervals with cron]] and setup for crontab file like:
$ crontab -e
<syntaxhighlight lang="shell-session">
$ crontab -e
</syntaxhighlight>
and enter
and enter
PATH=/usr/local/bin:/usr/bin:/bin
<syntaxhighlight lang="text">
PATH=/usr/local/bin:/usr/bin:/bin
 
# Run script_wui.py at 00:17 UTC each day
17 0 * * * jstart -l h_vmem=512M -N script_wui python3 $HOME/pywikibot/pwb.py script_wui.py -log
</syntaxhighlight>
 
=== Kubernetes ===
The system of job creation on Kubernetes is different. First, the virtual environment [[Help:Toolforge/Pywikibot#Using_the_virtual_environment_on_Kubernetes|needs to be customised]].
 
After that, the job could be launched:
<syntaxhighlight lang="shell-session">
$ toolforge-jobs run script_name --command "$HOME/pwbvenv/bin/python3 $HOME/pywikibot/pwb.py script_name -start:!" --image tf-python39
</syntaxhighlight>


06 0 * * * jstart -l h_vmem=512M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log
Additional parameters for the job could be reviewed on [[Help:Toolforge/Jobs framework]] and could include, for example, additional memory allocation (<code>--mem MEM</code>), job restart after being finished (<code>--continuous</code>), etc.


== Using pip ==
== Using pip ==
The [https://en.wikipedia.org/wiki/Pip_(package_manager) pip] package manager is not installed on the Toolforge servers, but it can be used through the use of virtual environments. The first step is to create a virtual environment, and get the latest version of <code>pip</code> installed in it:
The [[:en:Pip (package manager)|pip]] package manager is not installed for global use on the Toolforge servers, but it can be used through the use of virtual environments. The first step is to create a virtual environment, and get the latest version of <code>pip</code> installed in it:


<source lang="bash">
<syntaxhighlight lang="shell-session">
$ virtualenv -p python3 venv
$ python3 -m venv venv
$ source venv/bin/activate
$ source venv/bin/activate
$ pip install --upgrade pip
$ pip3 install --upgrade pip
</source>
</syntaxhighlight>


Installing specific packages from <code>pip</code> is as simple as loading the environment and then running the <code>pip install</code> command, for example:
Installing specific packages from <code>pip3</code> is as simple as loading the environment and then running the <code>pip3 install</code> command, for example:


<source lang="bash">
<syntaxhighlight lang="shell-session">
$ source venv/bin/activate
$ source venv/bin/activate
$ pip install PACKAGENAME
$ pip3 install PACKAGENAME
</source>
</syntaxhighlight>


Lastly, running a pywikibot script that depends on a <code>pip</code> package will also require loading the environment first, for instance:
Lastly, running a pywikibot script that depends on a <code>pip</code> package will also require loading the environment first, for instance:


<source lang="bash">
<syntaxhighlight lang="shell-session">
$ source venv/bin/activate
$ source venv/bin/activate
$ python foo/bar/pwb.py SCRIPTNAME -page:"SOMEPAGE"
$ python3 foo/bar/pwb.py SCRIPTNAME -page:"SOMEPAGE"
</source>
</syntaxhighlight>


The venv does not get automatically activated in Grid job submissions. Two common workarounds are having wrapping shell scripts that activates the venv, or use absolute paths to the binaries within:
The venv does not get automatically activated in Grid job submissions. Two common workarounds are having wrapping shell scripts that activates the venv, or use absolute paths to the binaries within:


<source lang="bash">
<syntaxhighlight lang="shell-session">
$ jstart -N jobname venv/bin/python foo/bar/pwb.py SCRIPTNAME -page:"SOMEPAGE"
$ jstart -N jobname venv/bin/python3 foo/bar/pwb.py SCRIPTNAME -page:"SOMEPAGE"
</source>
</syntaxhighlight>


[[Category:Toolforge|Pywikibot]]
[[Category:Toolforge|Pywikibot]]

Revision as of 10:48, 5 June 2022

Caution: This page may contain inaccuracies. It is currently being edited and redesigned for better readability. For further information, please see T134495.

The Pywikibot Framework is a collection of Python tools that automate work on MediaWiki sites. Please review mw:Manual:Pywikibot/Installation first.

The stable version of the Pywikibot 'core' branch (formerly 'rewrite') is accessible at /shared/pywikibot/stable. If you are a developer and/or would like to use the current master branch, it is accessible at /shared/pywikibot/core, but be aware that it might not be a stable release. To control when the code is updated, you may also choose to install 'core' locally in your tool directory.

Note that the shared 'core' code consists only of the source files; each bot operator will need to create their own configuration files (such as 'user-config.py') and set up a PYTHONPATH and other environment variables. Please see Using the shared Pywikibot files for more information.

Using the shared Pywikibot files (recommended setup)

For most purposes, using the centralized 'core' files is recommended. The shared files are available at /data/project/shared/pywikibot/stable, and steps for configuring your tool account are provided below. The configuration files themselves are stored in your tool account in the $HOME/.pywikibot directory, or another directory, where they can be used via the -dir option (all of this is described in more detail in the instructions).

If you are a developer and/or would like to control when the code is updated, please see Installing Pywikibot locally for instructions.

To set up your Tools account to use the shared 'core' framework:

1. Become your tool account:

maintainer@tools-login:~$ become toolname

2. In your home directory, create (or edit, if it exists already) a '.bash_profile' file:

nano .bash_profile

and include the following line:

export PYTHONPATH=/data/project/shared/pywikibot/stable:/data/project/shared/pywikibot/stable/scripts

The path should be on one line, though it may appear to be on multiple lines depending on your screen width. When you save the .bash_profile file (CTRL+X), your settings will be updated for all future shell sessions.

3. Import the path settings into your current session:

tools.tool@tools-login$ source .bash_profile

4. In your home directory, create a subdirectory named '.pywikibot' (the '.' is important!) for bot-related files:

tools.tool@tools-login$ mkdir $HOME/.pywikibot

[Image: example of configuration for commons.wikimedia.org]

5. Configure Pywikibot.

To create configuration files, use the following command and then follow the instructions. You may also use an existing configuration file (e.g., 'user-config.py') that works on another system by copying it into your .pywikibot directory:

tools.tool@tools-login$ python3 /data/project/shared/pywikibot/stable/pywikibot/scripts/generate_user_files.py

6. Test out your setup. In general, all jobs should be run on the grid, but it's fine to test your setup on the command line. You should see the following terminal output (or something similar):

tools.tool@tools-login$ python3 /data/project/shared/pywikibot/stable/pywikibot/scripts/version.py
Pywikibot: [https] r-pywikibot-core.git (1db1f28, g15095, 2021/05/31, 14:35:28, stable)
Release version: 6.3.0
requests version: 2.12.4
  cacerts: /etc/ssl/certs/ca-certificates.crt
    certificate test: ok
Python: 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516]
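
Steps 2 and 4 can also be scripted. The following is a minimal sketch rather than part of the official instructions: the function name setup_shared_pywikibot is hypothetical, and it assumes the shared path used in this guide. It appends the export line only if that exact line is missing, so running it more than once should be harmless.

```shell
# Hypothetical helper: apply steps 2 and 4 to the home directory given as $1.
# Assumes the shared stable checkout lives at the path used in this guide.
setup_shared_pywikibot() (
    home_dir=$1
    shared=/data/project/shared/pywikibot/stable
    line="export PYTHONPATH=$shared:$shared/scripts"
    profile="$home_dir/.bash_profile"

    # Append the export line only when the exact line is not there yet.
    touch "$profile"
    grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"

    # Create the configuration directory; -p makes this a no-op if it exists.
    mkdir -p "$home_dir/.pywikibot"
)

# Usage (as the tool account): setup_shared_pywikibot "$HOME"
```

After running it, import the settings into the current session as in step 3.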

Note that with PYTHONPATH set, you no longer need the pwb.py helper script to make, say, import pywikibot work; you can run scripts directly, e.g., python3 /data/project/shared/pywikibot/stable/pywikibot/scripts/version.py. The pwb.py helper script still has additional advantages, such as tolerating typing mistakes in script names, script path redirection, and dependency checks; see the pwb script documentation.

If you need to use multiple user-config.py files, you can do so by adding -dir:<path where you want your user-config.py> to every python command. To use the local directory, use -dir:. (colon dot).
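
As an illustration of the -dir option, the sketch below builds the command line for a given configuration directory. The function name pwb_with_config is hypothetical, and it only prints the command so that you can inspect it or hand it to a job submission; it does not run anything itself.

```shell
# Hypothetical helper: print the command line for running a script with a
# specific configuration directory.
#   $1 = configuration directory, $2 = script path,
#   remaining arguments are passed through to the script unchanged.
pwb_with_config() (
    config_dir=$1
    script=$2
    shift 2
    # Refuse to continue if the directory has no user-config.py.
    if [ ! -f "$config_dir/user-config.py" ]; then
        echo "no user-config.py in $config_dir" >&2
        exit 1
    fi
    echo python3 "$script" "-dir:$config_dir" "$@"
)

# Usage: pwb_with_config "$HOME/configs/commons" path/to/script.py -page:"SOMEPAGE"
```

The directory $HOME/configs/commons in the usage line is only an example name.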

For more information about Pywikibot, please see the Pywikibot documentation. The pywikibot mailing list (pywikibot at lists.wikimedia.org) and IRC channel (#pywikibot) are good places to go for additional help. Other useful information about using the centralized 'core' files is available here: User:Russell Blau/Using pywikibot on Labs

Caution: The script path for the Pywikibot framework utility scripts (generate_family_file.py, generate_user_files.py, shell.py, version.py) changed in the core (master) branch with release 7.0.0. To use them, the path is /data/project/shared/pywikibot/core/pywikibot/scripts/<script_name>, or they can be invoked through the pwb.py wrapper script. See also: https://doc.wikimedia.org/pywikibot/master/utilities/index.html

Setup pywikibot on Toolforge (locally)

Installing pywikibot local to your tool allows you to upgrade whenever it suits you, instead of always running the latest version.

Clone pywikibot git repo

Clone the 'core' git repository:

$ git clone --recursive --branch stable "https://gerrit.wikimedia.org/r/pywikibot/core" $HOME/pywikibot

Setup a Python virtual environment for library dependencies

When using a local pywikibot install, use a Python virtual environment (venv) to manage Python library dependencies. The Toolforge environment does provide system packages for many Python libraries, but these are installed using Debian packages which means that they are often older versions and not likely to be upgraded often.

Create a venv. You can give this venv any name you would like. We will use 'pwb' in this example.

$ python3 -m venv $HOME/pwb

Once you have created the venv, you can "activate" it to set up your shell's $PATH so that the python3 and pip3 binaries in the virtual environment are used by default.

$ source $HOME/pwb/bin/activate
(pwb) $

Now that the venv is created and active for your current shell session, we can install the pywikibot code from the git clone we made earlier into this venv. The -e (editable) option installs the pywikibot core code as a link to the clone rather than a copy. This way, if you modify the directory, you don't need to install it again.

(pwb) $ pip3 install --upgrade pip "setuptools>=49.4.0, !=50.0.0, <50.2.0" wheel
...
Successfully installed pip-21.2.4 setuptools-58.1.0 wheel-0.37.0
(pwb) $ cd $HOME/pywikibot
(pwb) $ pip3 install -e .[mwparserfromhell,mwoauth,mysql]  # adjust extra dependencies as needed for your tool
...
Finished processing dependencies for pywikibot==6.6.1

Note: the setuptools!=50.0.0 install constraint is for T261748 and the upstream issue in setuptools related to relative imports.

Using the virtual environment without activating it

To use the code from outside the virtual environment (for example, to submit jobs to the grid engine), use the full path to the python3 binary inside your venv directory and the full path to the script you want to run:

$ $HOME/pwb/bin/python3 $HOME/path/to/script.py

Using the virtual environment on Kubernetes

The virtual environment is created and customised differently on Kubernetes.

The virtual environment should be created from within a Toolforge job. Create a script similar to this:

pwb_venv.sh
#!/bin/bash

# create the venv
python3 -m venv pwbvenv

# activate it
source pwbvenv/bin/activate

# install some packages
pip3 install --upgrade pip "setuptools>=49.4.0, !=50.0.0, <50.2.0" wheel
cd $HOME/pywikibot
pip3 install -e .[mwparserfromhell,mwoauth,mysql]

Then make the script executable and run it as a job:

tools.mytool@tools-sgebastion-11:~$ chmod ug+x pwb_venv.sh
tools.mytool@tools-sgebastion-11:~$ toolforge-jobs run pwb-venv --command "./pwb_venv.sh" --image tf-python39 --wait
INFO: job 'pwb-venv' completed

Setup job submission

After installing, you can run your bot directly via a shell command, though this is highly discouraged. You should use the grid to run jobs instead.

To set up job submission and use the grid engine, first read Help:Toolforge/Grid.

To run a bot on the grid from within the pywikibot directory (this is optional), write a small wrapper script that changes into that directory first. The following example script (versiontest.sh) runs version.py:

$ cat versiontest.sh
#!/bin/bash
cd /data/project/shared/pywikibot/stable
python3 pywikibot/scripts/version.py

To submit a job, set the permissions for the script and then use the 'jsub' command to send the job to the grid:

$ chmod 0755 versiontest.sh
$ jsub versiontest.sh

Job output will be written to output and error files in your home directory called YOURJOBNAME.out and YOURJOBNAME.err, respectively (versiontest.out and versiontest.err in this example):

$ cat ~/versiontest.out
pywikibot [https] r/pywikibot/compat (r10211, 8fe6bdc, 2013/08/18, 14:00:57, ok)
Python 2.7.3 (default, Aug  1 2012, 05:14:39)
[GCC 4.6.3]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
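
To check quickly whether a job wrote anything to its error file, something like the following can be used. This is a minimal sketch that assumes the default file names described above; the function name job_wrote_stderr is hypothetical. A non-empty error file is only a hint, since a program may also write ordinary log messages to standard error.

```shell
# Hypothetical helper: succeed (exit status 0) if the job's .err file exists
# and is non-empty.
#   $1 = job name
#   $2 = directory holding the files (defaults to $HOME)
job_wrote_stderr() (
    job=$1
    dir=${2:-$HOME}
    [ -s "$dir/$job.err" ]
)

# Usage: job_wrote_stderr versiontest && cat ~/versiontest.err
```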

Example

An infinitely running job such as an irc-bot can be started like this:

$ jsub -once -continuous -l h_vmem=256M -N script_wui python3 $HOME/pywikibot/pwb.py script_wui.py -log

or shorter

$ jstart -l h_vmem=256M -N script_wui python3 $HOME/pywikibot/pwb.py script_wui.py -log

If you experience problems with your jobs, such as

Fatal Python error: Couldn't create autoTLSkey mapping

you can try increasing the memory value:

$ jstart -l h_vmem=512M -N script_wui python3 $HOME/pywikibot/pwb.py script_wui.py -log

To run the job on a schedule, follow the instructions in "Scheduling jobs at regular intervals with cron" and edit your crontab file:

$ crontab -e

and enter

PATH=/usr/local/bin:/usr/bin:/bin

# Run script_wui.py at 00:17 UTC each day
17 0 * * * jstart -l h_vmem=512M -N script_wui python3 $HOME/pywikibot/pwb.py script_wui.py -log
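
The five leading fields of a crontab entry are minute, hour, day of month, month, and day of week. As a small sketch for double-checking a schedule before saving it, the hypothetical function describe_cron_entry below labels those fields; it does no validation.

```shell
# Hypothetical helper: label the five schedule fields of a crontab entry.
#   $1 = the full crontab line, quoted
describe_cron_entry() (
    # Disable globbing so the "*" fields are not expanded to file names.
    set -f
    # Split the line on whitespace into positional parameters.
    set -- $1
    echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
)

# Usage: describe_cron_entry '17 0 * * * jstart -N script_wui ...'
```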

Kubernetes

Jobs are created differently on Kubernetes. First, the virtual environment needs to be set up as described in "Using the virtual environment on Kubernetes".

After that, the job can be launched:

$ toolforge-jobs run script_name --command "$HOME/pwbvenv/bin/python3 $HOME/pywikibot/pwb.py script_name -start:!" --image tf-python39

Additional parameters for the job are described on Help:Toolforge/Jobs framework and include, for example, additional memory allocation (--mem MEM) and restarting the job after it finishes (--continuous).

Using pip

The pip package manager is not installed for global use on the Toolforge servers, but it can be used inside a virtual environment. The first step is to create a virtual environment and install the latest version of pip in it:

$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install --upgrade pip

Installing specific packages with pip3 is as simple as loading the environment and then running the pip3 install command, for example:

$ source venv/bin/activate
$ pip3 install PACKAGENAME

Lastly, running a pywikibot script that depends on a pip package will also require loading the environment first, for instance:

$ source venv/bin/activate
$ python3 foo/bar/pwb.py SCRIPTNAME -page:"SOMEPAGE"

The venv is not activated automatically in Grid job submissions. Two common workarounds are to use a wrapper shell script that activates the venv, or to call the binaries inside the venv directly by path:

$ jstart -N jobname venv/bin/python3 foo/bar/pwb.py SCRIPTNAME -page:"SOMEPAGE"
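
The first workaround, a wrapper that activates the venv, could look roughly like this. It is a sketch under assumptions: the function name run_in_venv is hypothetical, and the venv path in the usage line is the one created in the example above.

```shell
# Hypothetical wrapper: activate the venv in $1, then run the remaining
# arguments as a command inside it. The parentheses run the body in a
# subshell, so the activation does not leak into the calling shell.
run_in_venv() (
    venv_dir=$1
    shift
    # "." is the portable spelling of "source".
    . "$venv_dir/bin/activate" || exit 1
    "$@"
)

# Usage inside a job script:
#   run_in_venv "$HOME/venv" python3 foo/bar/pwb.py SCRIPTNAME -page:"SOMEPAGE"
```

Saved in a script file together with the call, this can be submitted like any other wrapper script.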