This is a staging area for some changes that will be made to the Toolforge:Help page.
- I am a new developer and/or new to the Wikimedia ecosystem, and I want to understand what is possible to do with Toolforge
- I am a new developer and/or new to the Wikimedia ecosystem, and I want to understand what Toolforge is and how to use it to create a tool
- I am an experienced developer, and I want to onboard experienced developers who are working on or with Toolforge
- I am an experienced developer, and I want to share information about how to perform a task or complete a process with a less experienced developer
- I am an experienced developer, and I want to find information about how Toolforge works
Sections which may become stand alone pages
- Each needs an appropriate title
Managing files in Toolforge
Using Toolforge and managing your files
Toolforge can be accessed in a variety of ways – from its public IP to a GUI client. Please see Help:Access for general information about accessing Cloud VPS projects.
After you can ssh successfully, you can transfer files via sftp and scp. Note that the transferred files will be owned by you. You will likely wish to transfer ownership to your tool account. To do this:
1. Become your tool account:
yourshellaccountname@tools-login:~$ become toolaccount
tools.toolaccount@tools-login:~$
2. As your tool account, take ownership of the files:
tools.toolaccount@tools-login:~$ take FILE
The take command changes the ownership of the file(s) and directories recursively to the calling user (in this case, the tool account).
If you're getting permission errors, note that you can also transfer files the other way around: copy the files as your tool account to
Another, probably easier, way is to make the tool's directory group-writable. For example, if your shell account's name is alice and your tool name is alicetools, you could do something like this after logging in as a shell user:
become alicetools
chmod -R g+w /data/project/alicetools
logout
cp -rv /home/alice/* /data/project/alicetools/
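If a tool prepares its directory from a script rather than by hand, the group-writable step can be approximated in Python. This is a minimal sketch and not part of Toolforge itself; the function name add_group_write is hypothetical, and it has to run as an account that owns the files.

```python
import os
import stat
from pathlib import Path


def add_group_write(root: Path) -> int:
    """Add the group-write bit to root and everything below it.

    Rough equivalent of `chmod -R g+w root`. Returns the number of
    paths whose mode was changed.
    """
    # Collect the paths first so that changing modes does not affect the walk.
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(Path(dirpath) / name for name in dirnames + filenames)

    changed = 0
    for path in paths:
        if path.is_symlink():
            continue  # leave link targets outside the tree untouched
        mode = stat.S_IMODE(path.stat().st_mode)
        if not mode & stat.S_IWGRP:
            path.chmod(mode | stat.S_IWGRP)
            changed += 1
    return changed
```

Calling add_group_write(Path("/data/project/alicetools")) as the tool account would have roughly the same effect as the chmod line above.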
What gets backed up?
The basic rule is: there is a lot of redundancy, but no user-accessible backups. Toolforge users should make certain that they use source control to preserve their code, and make regular backups of irreplaceable data. With luck, some files may be recoverable by Cloud Services administrators in a manual process. But this requires human intervention and will likely not rescue the file that was created five minutes ago and deleted two minutes ago. If necessary, ask on IRC or file a Phabricator task.
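One way to make regular backups of irreplaceable data is to write a timestamped archive and then copy it somewhere outside Toolforge. The following is a minimal sketch under that assumption; the function name make_backup and the archive naming scheme are hypothetical, and the destination directory should not be inside the directory being archived.

```python
import tarfile
import time
from pathlib import Path


def make_backup(source: Path, dest_dir: Path) -> Path:
    """Write a timestamped .tar.gz archive of source into dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the source name.
        tar.add(source, arcname=source.name)
    return archive
```

An archive that stays in the tool's own directory is only as safe as that directory, so the copy to another machine is the step that actually protects the data.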
Repositories / Version control
Setting up code review and version control
Although it's possible to just stick your code in the directory and mess with it manually every time you want to change something, your future self and your future collaborators will thank you if you instead use source control, a.k.a. version control and a code review tool. Wikimedia Cloud VPS makes it pretty easy to use Git for source control and Gerrit for code review, but you also have other options.
The best option is to create a Git repository to which project participants commit files. To access the files, become the tool account, check that repository out in your tool's directory, and thereafter run a regular git pull whenever you want to deploy new files.
Putty and WinSCP
Note that instructions for accessing Toolforge with Putty and WinSCP differ from the instructions for using them with other Cloud VPS projects. Please see Help:Access to Toolforge instances with PuTTY and WinSCP for information specific to Toolforge.
Other graphical file managers (e.g., Gnome/KDE)
For information about using a graphical file manager (e.g., Gnome/KDE), please see Accessing instances with a graphical file manager.
- Go to toolsadmin
- Find your tool
- Click the create new repository button
Requesting a Gerrit/Git repository for your tool
Toolforge users may request a Gerrit/Git repository for their tools. Access to Git is managed via Wikimedia Cloud VPS and integrated with Gerrit, a code review system.
In order to use the Wikimedia Cloud VPS code review and version control, you must upload your SSH key to Gerrit and then request a repository for your tool.
- Log in to https://gerrit.wikimedia.org/ with your Wikimedia developer account username and password.
- Add your SSH public key (select “Settings” from the drop-down menu beside your user name in the upper right corner of the screen, and then “SSH Public Keys” from the Settings menu).
- Request a Gerrit project for your tool: Gerrit/New repositories
For more information, please see:
- Gerrit/New repositories -- request a repository
- Git/New repositories/Requests -- a list of existing requests, as well as a place to make new ones. You can see the status of your request as well.
For more information about using Git and Gerrit in general, please see Git/Gerrit.
Setting up a local Git repository
It is fairly simple to set up a local Git repository to keep versioned backups of your code. However, if your tool directory is deleted for some reason, your local repository will be deleted as well. You may wish to request a Gerrit/Git repository to safely store your backups and/or to share your code more easily. Other backup/versioning solutions are also available. See User:Magnus Manske/Migrating from toolserver § GIT for some ideas.
To create a local Git repository:
1. Create an empty Git repository
maintainer@tools-login:~$ git init
2. Add the files you would like to back up. For example:
maintainer@tools-login:~$ git add public_html
3. Commit the added files
git commit -m 'Initial check-in'
For more information about using Git, please see the git documentation.
Enabling simple public HTTP access to local Git repository
If you've set up a local Git repository like the above in your tool directory, you can easily set up public read access to the repository through HTTP. This will allow you to, for instance, clone the Git repository to your own home computer without using an intermediary service such as GitHub.
First create the www/static/ subdirectory in your tool's home directory, if it does not already exist:
mkdir ~/www
mkdir ~/www/static/
Now go to the www/static/ directory, and make a symbolic link to your bare Git repository (the hidden .git subdirectory in the root of your repository):
cd ~/www/static/
ln -s ~/.git yourtool.git
Now change directory into the symbolic link you just created, and run the git update-server-info command to generate some auxiliary info files needed for the HTTP connectivity:
cd yourtool.git
git update-server-info
Enable a few Git hooks so that these auxiliary info files are updated every time someone commits, rewrites or pushes to the repository. The link targets are given relative to the hooks directory, because a relative symbolic link is resolved from the directory that contains it:
ln -s post-update.sample hooks/post-commit
ln -s post-update.sample hooks/post-rewrite
ln -s post-update.sample hooks/post-update
chmod a+x hooks/post-update.sample
You're done. You should now be able to clone the repository from any remote machine by running the command:
git clone http://tools-static.wmflabs.org/yourtool/yourtool.git
Using GitHub or other external service
Before you start, you might want to set up your Git user account.
# Log in to your tool account
become mytool
# Your name
git config user.name "Your Name"
# Your e-mail (use the one you set up in GitHub)
git config user.email "firstname.lastname@example.org"
Then you can clone the remote repository as usual:
git clone https://github.com/yourGithubName/yourGithubRepoName.git
You can do updates any way you want, but you might want to use this simple update script to securely update code:
#!/bin/bash
read -r -p "Stop the service and pull fresh code? (Y/n)" response
if ! [[ $response =~ ^([nN][oO]|[nN])$ ]]
then
    webservice stop
    cd ./public_html
    echo -e "\nUpdating the code..."
    git pull
    echo
    read -r -p "OK to start the service? (Y/n)" response
    if ! [[ $response =~ ^([nN][oO]|[nN])$ ]]
    then
        webservice start
    fi
fi
Save the above in your tool account's home folder as e.g. "update.sh". Don't forget to give execute permission to yourself and your tool group (i.e. `chmod 770 update.sh`).
MediaWiki Core integrations
Installing MediaWiki core
You want to install MediaWiki core and make your installation visible on the web.
One-time steps per tool
First, you have to do some preparatory steps which you need only once per tool.
If you have not installed composer yet:
mkdir ~/bin
curl -sS https://getcomposer.org/installer | php -- --install-dir=$HOME/bin --filename=composer
If your local bin directory is not in your PATH (run echo $PATH to find out), then create or alter the file ~/.profile and add the lines:
# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi
Finish your session as <YOURTOOL> and start a new one, or:
Now you are done with the one-time preparations.
For each instance of core
The following steps are needed for each new installation of MediaWiki. We assume that you want to access MediaWiki via the web in a directory named MW; you are free to use another name. If not already done:
If you plan to submit changes:
git clone ssh://<YOURUSERNAME>@gerrit.wikimedia.org:29418/mediawiki/core.git MW
or else, if you only want to use MediaWiki without submitting changes:
git clone https://gerrit.wikimedia.org/r/mediawiki/core.git MW
will do and spares resources. Next, recent versions of MediaWiki have external dependencies, so you need to install those:
cd MW
composer install
git review -s
webservice start and then you should be able to access the initial pre-install screen of MediaWiki from your web browser as:
and proceed as usual. See how to create new databases for your MediaWiki installations.
Mail to users
Mail sent to email@example.com (where user is a shell account) will be forwarded to the email address that user has set in their Wikitech preferences, if it has been verified (the same as the 'Email this user' function on Wikitech).
Any existing .forward in the user's home will be ignored.
Mail to a Tool
Mail can also be sent "to a tool" with:
Where "anything" is an arbitrary alphanumeric string. Mail will be forwarded to the first of:
- The email(s) listed in the tool's ~/.forward.anything, if present;
- The email(s) listed in the tool's ~/.forward, if present; or
- The Wikitech email of the tool's individual maintainers.
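The lookup order in this list can be illustrated with a short Python sketch. It only mirrors the documented order for checking which file would apply to a given address suffix; it is not how the mail server is implemented, and the function name forward_file_for is hypothetical.

```python
from pathlib import Path
from typing import Optional


def forward_file_for(tool_home: Path, suffix: str) -> Optional[Path]:
    """Return the forward file that would apply to mail with this suffix.

    Mirrors the documented order: ~/.forward.<suffix> first, then
    ~/.forward. None means neither file exists, in which case mail goes
    to the tool's maintainers.
    """
    candidates = (tool_home / f".forward.{suffix}", tool_home / ".forward")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A helper like this can be handy for checking a tool's home directory before relying on a particular address.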
firstname.lastname@example.org is an alias pointing to email@example.com, mostly useful for automated email generated from within Cloud VPS.
~/.forward.anything needs to be readable by the user Debian-exim; to achieve that, you probably need to run chmod o+r ~/.forward*.
Mail from Tools
From the Grid
When sending mail from a job, the usual command line method of piping the message body to /usr/bin/mail may not work correctly, because /usr/bin/mail attempts to deliver the message to the local MSA in a background process, which will be killed if it is still running when the job exits.
If piping to a subprocess to send mail is needed, the message including headers may be piped to /usr/sbin/exim -odf -i.
# This does not work when submitted as a job
echo "Test message" | /usr/bin/mail -s "Test message subject" firstname.lastname@example.org
# This does
echo -e "Subject: Test message subject\n\nTest message" | /usr/sbin/exim -odf -i email@example.com
-e in case your shell's internal
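A job written in Python can build the headers and body itself and pipe the result to exim, which avoids depending on how the shell's echo treats escapes. This is a minimal sketch under the assumptions of this section (exim at /usr/sbin/exim on grid nodes); the function names and the addresses in the usage line are hypothetical.

```python
import subprocess
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> bytes:
    """Return a complete message (headers, blank line, body) as bytes."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg.as_bytes()


def send_with_exim(raw_message: bytes, recipient: str) -> None:
    """Pipe the message to exim and wait for delivery to finish."""
    # -odf delivers in the foreground, so the job cannot exit mid-delivery;
    # -i stops a line holding a single dot from ending the message.
    subprocess.run(
        ["/usr/sbin/exim", "-odf", "-i", recipient],
        input=raw_message,
        check=True,
    )
```

A job would call, for example, send_with_exim(build_message("tool@example.org", "user@example.org", "Test message subject", "Test message"), "user@example.org").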
From within a container
To send mail from within a Kubernetes container, use the mail.tools.wmflabs.org SMTP server.
Containers running on the Toolforge Kubernetes cluster do not install and configure a local mailer service like the exim service that is installed on grid engine nodes. Tools running in Kubernetes should instead send email using an external SMTP server. The mail.tools.wmflabs.org service name should be usable for this. This service name is used as the public MX (mail exchange) host for inbound SMTP messages to the tools.wmflabs.org domain and points to a server that can process both inbound and outbound email for Toolforge.
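For a tool written in Python, sending through this SMTP server might look like the sketch below. The host name comes from the text above; port 25, the function names, and the addresses used when calling them are assumptions made for illustration.

```python
import smtplib
from email.message import EmailMessage

SMTP_HOST = "mail.tools.wmflabs.org"  # service name given in the text above
SMTP_PORT = 25  # assumption: the standard SMTP port


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text message with the usual headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_smtp(msg: EmailMessage, host: str = SMTP_HOST, port: int = SMTP_PORT) -> None:
    """Hand the message to the external SMTP server."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Because nothing is delivered by a local mailer inside the container, any connection error raised by send_via_smtp is the tool's only signal that the message was not accepted.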
- Web pages for tools
Can I have a subdomain for my web service?
Sorry, not yet. This is still in discussion at phab:T125589. Currently, your web services are available under tools.wmflabs.org/<YOURTOOL>.
- This is a brief summary of the /Database documentation page.
Is there a GUI tool for database work?
Not in Toolforge, but you can run one locally on your computer (for example the MySQL Workbench http://dev.mysql.com/downloads/tools/workbench/). Here is how you connect to the database:
- For the login: firstname.lastname@example.org
- For the database, it depends on the exact one you want to use, of course - for example: enwiki.labs
How do I access the database replicas?
- You will find a tool account's credentials for MariaDB in the file $HOME/replica.my.cnf. You need to specify this file and the server you want to connect to. Some examples:
mysql --defaults-file=~/replica.my.cnf -h enwiki.labsdb enwiki_p # <- for English Wikipedia
mysql --defaults-file=~/replica.my.cnf -h dewiki.labsdb dewiki_p # <- for German Wikipedia
mysql --defaults-file=~/replica.my.cnf -h wikidatawiki.labsdb wikidatawiki_p # <- for Wikidata
mysql --defaults-file=~/replica.my.cnf -h commonswiki.labsdb commonswiki_p # <- for Commons
- You can create a symlink from replica.my.cnf to .my.cnf by running ln -s replica.my.cnf .my.cnf, after which you no longer need to specify the file:
mysql -h commonswiki.labsdb commonswiki_p # <- for Commons
- Alternatively, use the sql utility that provides convenient shortcuts:
sql enwiki # <- for English Wikipedia
sql commonswiki # <- for Commons
sql commons # <- for Commons (shortcut)
sql wikidata # <- for Wikidata (shortcut)
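A tool that connects from code rather than from the command line needs the same three pieces: credentials, host, and database name. The sketch below assumes that replica.my.cnf is a my.cnf-style file with user and password entries in a [client] section, and that host and database follow the pattern of the examples above; the function names are hypothetical.

```python
import configparser
from pathlib import Path
from typing import Dict


def replica_target(wiki: str) -> Dict[str, str]:
    """Map a wiki name such as 'enwiki' to the host and database naming
    pattern used in the mysql examples."""
    return {"host": f"{wiki}.labsdb", "database": f"{wiki}_p"}


def read_credentials(cnf_path: Path) -> Dict[str, str]:
    """Read user and password from the [client] section (assumed layout)."""
    # interpolation=None keeps '%' characters in passwords intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cnf_path)
    client = parser["client"]
    # Values may be wrapped in quotes in the file; strip them if present.
    return {
        "user": client["user"].strip("'\""),
        "password": client["password"].strip("'\""),
    }
```

The two dictionaries can then be merged and passed as keyword arguments to whichever MySQL client library the tool uses.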
- Best practices for Toolforge development
- This is a brief summary of the /Developing documentation page.
- This is a brief summary of the /Web documentation page.
- Web pages for tools
- This is a brief summary of the /Grid documentation page.
- Using Open Grid Engine to run jobs
- This is a brief summary of the /Elasticsearch documentation page.
Redis is a key-value store similar to memcache, but with more features. It can be easily used to do publish/subscribe between processes, and also maintain persistent queues. Stored values can be different data structures, such as hash tables, lists, queues, etc. Stored data persists across service restarts. For more information, please see the Wikipedia article on Redis.
A Redis instance that can be used by all tools is available on tools-redis, on the standard port 6379. It has been allocated a maximum of 12G of memory, which should be enough for most usage. You can set limits for how long your data stays in Redis; otherwise it will be evicted when memory limits are exceeded. See the Redis documentation for a list of available
For quick & dirty debugging, you can connect directly to the Redis server with nc -C tools-redis 6379 and execute commands (for example "INFO").
Redis has no access control mechanism, so other users can accidentally/intentionally overwrite and access the keys you set. Even if you are not worried about security, it is highly probable that multiple tools will try to use the same key (such as lastupdated, etc). To prevent this, it is highly recommended that you prefix all your keys with an application-specific, lengthy, randomly generated secret key.
You can very simply generate a good enough prefix by running the following command:
openssl rand -base64 32
PLEASE PREFIX YOUR KEYS! We have also disabled the redis commands that let users 'list' keys. This protection however should not be trusted to protect any secret data. Do not store plain text secrets or decryption keys in Redis for your own protection.
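In application code, the prefix and an expiry time can be applied in one place so that no key is written without them. This is a minimal sketch; the function names and the ":" separator are hypothetical choices, and the client argument is assumed to be a redis-py client, whose set method accepts an ex argument giving the time to live in seconds.

```python
import base64
import os


def generate_prefix(num_bytes: int = 32) -> str:
    """Random prefix, comparable to `openssl rand -base64 32`."""
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")


def prefixed(prefix: str, key: str) -> str:
    """Join the tool's secret prefix and the logical key name."""
    return f"{prefix}:{key}"


def store_with_expiry(client, prefix: str, key: str, value, ttl_seconds: int) -> None:
    """Store a value under the prefixed key with a time to live.

    `client` is assumed to be a redis-py client connected to tools-redis.
    """
    client.set(prefixed(prefix, key), value, ex=ttl_seconds)
```

The prefix should be generated once and kept in the tool's configuration, readable only by the tool, since a prefix regenerated on every run would make earlier keys unreachable.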
Can I use memcache?
There is no memcached on Toolforge. Please use Redis instead.
The 'tools' project has access to a directory storing the public Wikimedia datasets (i.e. the dumps generated by Wikimedia). The most recent two dumps can be found in:
This directory is read-only, but you can copy files to your tool's home directory and manipulate them in whatever way you like.
If you need access to older dumps, you must manually download them from the Wikimedia downloads server.
/public/dumps/pagecounts-raw contains some years of the pagecount/projectcount data derived by Erik Zachte from Domas Mituzas' archives.
CatGraph (aka Graphserv/Graphcore)
CatGraph is a custom graph database that provides tool developers fast access to the Wikipedia category structure. For more information, please see the documentation.
It is possible to run a Celery worker in a Kubernetes container as a continuous job (for instance to execute long-running tasks triggered by a web frontend). The Redis service can be used as a broker between the worker and the web frontend. Make sure you use your own queue name so that your tasks get sent to the right workers.
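A configuration along these lines might look like the following sketch. The Redis host and port come from the Redis section of this page; the database number 0, the queue naming scheme, and the function names are assumptions. Celery is a third-party package, so it is imported only when the app is actually created.

```python
REDIS_URL = "redis://tools-redis:6379/0"  # database 0 is an assumption


def celery_settings(tool_name: str) -> dict:
    """Settings that point Celery at the shared Redis and a tool-specific queue."""
    queue = f"{tool_name}-tasks"  # hypothetical naming scheme
    return {
        "broker_url": REDIS_URL,
        # Tasks without explicit routing are sent to this queue.
        "task_default_queue": queue,
    }


def make_app(tool_name: str):
    """Create a Celery app for the tool (requires the celery package)."""
    from celery import Celery  # imported lazily; not in the standard library

    app = Celery(tool_name)
    app.conf.update(celery_settings(tool_name))
    return app
```

Both the web frontend and the worker would build the app with the same tool name, so that tasks are published to and consumed from the same queue. Because the Redis instance is shared, a queue name that other tools are unlikely to pick is the safer choice.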
Phabricator and task tracking
Make sure there is a place for people to find out how to use this.