Difference between revisions of "Help:Toolforge/Kubernetes"

From Wikitech-static
imported>BryanDavis
imported>Ladsgroup
m (Reverted edits by Shafiul2s (talk) to last revision by Quiddity)
(11 intermediate revisions by 10 users not shown)
Line 25: Line 25:


{{Collapse top|/data/project/wikiloveslove/cronjobs.yaml (copied 2020-02-01)}}
{{Collapse top|/data/project/wikiloveslove/cronjobs.yaml (copied 2020-02-01)}}
<source lang="yaml">
<syntaxhighlight lang="yaml">
---
---
apiVersion: batch/v1beta1
apiVersion: batch/v1beta1
Line 58: Line 58:
               value: /data/project/wikiloveslove
               value: /data/project/wikiloveslove
           restartPolicy: OnFailure
           restartPolicy: OnFailure
</source>
</syntaxhighlight>
{{Collapse bottom}}
{{Collapse bottom}}


Line 68: Line 68:


After creating the cronjob you can create a test job with <code>kubectl create job --from=cronjob/CRONJOB-NAME test</code> to immediately trigger the cronjob and then access the logs as usual with <code>kubectl logs job/test -f</code> to debug.
After creating the cronjob you can create a test job with <code>kubectl create job --from=cronjob/CRONJOB-NAME test</code> to immediately trigger the cronjob and then access the logs as usual with <code>kubectl logs job/test -f</code> to debug.
If that doesn't give you any useful output, try <code>kubectl describe job/test</code> to see what's going on: it might be a [https://phabricator.wikimedia.org/P13646 misconfigured limit], for instance.


==Kubernetes continuous jobs==
==Kubernetes continuous jobs==
Line 77: Line 79:


{{Collapse top|/data/project/stashbot/etc/deployment.yaml (copied 2020-01-03)}}
{{Collapse top|/data/project/stashbot/etc/deployment.yaml (copied 2020-01-03)}}
<source lang="yaml">
<syntaxhighlight lang="yaml">
---
---
# NOTE: this deployment works with the "toolforge" Kubernetes cluster, and not the legacy "default" cluster.
# NOTE: this deployment works with the "toolforge" Kubernetes cluster, and not the legacy "default" cluster.
Line 110: Line 112:
               value: /data/project/stashbot
               value: /data/project/stashbot
           imagePullPolicy: Always
           imagePullPolicy: Always
</source>
</syntaxhighlight>
{{Collapse bottom}}
{{Collapse bottom}}


Line 117: Line 119:
* Uses the 'tool-stashbot' namespace that the tool is authorized to control
* Uses the 'tool-stashbot' namespace that the tool is authorized to control
* Creates a container using the 'latest' version of the 'docker-registry.tools.wmflabs.org/[[phab:diffusion/ODIT/browse/master/python37-sssd/base/Dockerfile.template|toolforge-python37-sssd-base]]' Docker image.
* Creates a container using the 'latest' version of the 'docker-registry.tools.wmflabs.org/[[phab:diffusion/ODIT/browse/master/python37-sssd/base/Dockerfile.template|toolforge-python37-sssd-base]]' Docker image.
* Runs the command <code>/data/project/stashbot/bin/stashbot.sh run</code> when the container starts.
* Runs the command <code>/data/project/stashbot/bin/stashbot.sh run</code> inside the container to start the bot itself.
* Mounts the <tt>/data/project/stashbot/</tt> NFS directory as <tt>/data/project/stashbot/</tt> inside the container.
* Mounts the <tt>/data/project/stashbot/</tt> NFS directory as <tt>/data/project/stashbot/</tt> inside the container.
{{Note|The ''stashbot.sh'' script assumes that a Python 3.7 virtual environment has been manually created and populated with library dependencies for the project. See [[Help:Toolforge/Web/Python#Virtual Environments and Packages]] for more information about how to create a virtual environment. Make sure you call your venv python interpreter and not /usr/bin/python.}}


===Monitoring your jobs===
===Monitoring your jobs===
Line 129: Line 133:


==Quotas and Resources==
==Quotas and Resources==
On the Kubernetes cluster, all containers run with CPU and RAM limits set, just like jobs on the Gridengine cluster. Defaults are set at ''0.5'' CPU and ''512Mi'' of memory per container. Users can adjust these up to the highest level allowed without any help from an administrator (the top limit is set at ''1'' CPU and ''4GiB'' of memory) with command line arguments to the <code>webservice</code> command (<code>--cpu</code> and <code>--mem</code>) or properly formatted Kubernetes YAML specifications for your pod's [https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ resources fields] for advanced users.
On the Kubernetes cluster, all containers run with CPU and RAM limits set, just like jobs on the Gridengine cluster. Defaults are set at ''0.5'' CPU and ''512Mi'' of memory per container. Users can adjust these up to the highest level allowed without any help from an administrator (the top limit is set at ''1'' CPU and ''4Gi'' of memory) with command line arguments to the <code>webservice</code> command (<code>--cpu</code> and <code>--mem</code>) or properly formatted Kubernetes YAML specifications for your pod's [https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ resources fields] for advanced users.


The Toolforge admin team encourages you to try running your webservice with the defaults before deciding that you need more resources. We believe that most PHP and Python3 webservices will work as expected with the lower values. Java webservices will almost certainly need higher limits due to the nature of running a JVM.
The Toolforge admin team encourages you to try running your webservice with the defaults before deciding that you need more resources. We believe that most PHP and Python3 webservices will work as expected with the lower values. Java webservices will almost certainly need higher limits due to the nature of running a JVM.
If you find that you need containers to run with '''more''' than 1 CPU and 4 GB of RAM, the [[Help:Toolforge/Kubernetes#Quota_increases|quota increase procedure]] below can request that. You can verify the per-container limits you have by running <code>kubectl describe limitranges</code>


=== Namespace-wide quotas ===
=== Namespace-wide quotas ===
Line 148: Line 154:
persistentvolumeclaims: 3
persistentvolumeclaims: 3
</syntaxhighlight>
</syntaxhighlight>
To view the live quotas that apply to your tool, run <code>kubectl describe resourcequotas</code>.


=== Quota increases ===
=== Quota increases ===
Line 161: Line 169:
* golang (go v1.11.5; ''deprecated'')
* golang (go v1.11.5; ''deprecated'')
* '''golang111''' (go v1.11.6)
* '''golang111''' (go v1.11.6)
* '''jdk11''' (openjdk 11.0.5)
* '''jdk17''' (openjdk 17)
* jdk11 (openjdk 11.0.5)
* jdk8 (openjdk 1.8.0_232; ''deprecated'')
* jdk8 (openjdk 1.8.0_232; ''deprecated'')
* '''node10''' (nodejs v10.15.2)
* node10 (nodejs v10.15.2)
* '''node12''' (nodejs v12.21.0)
* nodejs (nodejs v6.11.0; ''deprecated'')
* nodejs (nodejs v6.11.0; ''deprecated'')
* php5.6 (PHP 5.6.33; ''deprecated'')
* php5.6 (PHP 5.6.33; ''deprecated'')
* php7.2 (PHP 7.2.24; ''deprecated'')
* php7.2 (PHP 7.2.24; ''deprecated'')
* '''php7.3''' (PHP 7.3.11)
* php7.3 (PHP 7.3.11)
* '''php7.4''' (PHP 7.4.21)
* python (Python 3.4.2; ''deprecated'')
* python (Python 3.4.2; ''deprecated'')
* python2 (Python 2.7.9; ''deprecated'')
* python2 (Python 2.7.9; ''deprecated'')
* python3.5 (Python 3.5.3; ''deprecated'')
* python3.5 (Python 3.5.3; ''deprecated'')
* '''python3.7''' (Python 3.7.3)
* python3.7 (Python 3.7.3)
* '''python3.9''' (Python 3.9.2)
* ruby2 (Ruby 2.1.5p273; ''deprecated'')
* ruby2 (Ruby 2.1.5p273; ''deprecated'')
* '''ruby25''' (Ruby 2.5.5p157)
* ruby25 (Ruby 2.5.5p157)
* '''ruby27''' (Ruby 2.7)
* '''tcl''' (TCL 8.6)
* '''tcl''' (TCL 8.6)


For example to start a webservice using a php7.3 container, run:
For example to start a webservice using a php7.4 container, run:
  webservice --backend=kubernetes php7.3 start
  webservice --backend=kubernetes php7.4 start


A complete list of images is available from the [[toolforge:docker-registry|docker-registry tool]] which provides a pretty frontend for browsing the [https://docker-registry.tools.wmflabs.org/v2/_catalog Docker registry catalog].
A complete list of images is available from the [[toolforge:docker-registry|docker-registry tool]] which provides a pretty frontend for browsing the [https://docker-registry.tools.wmflabs.org/v2/_catalog Docker registry catalog].
Line 187: Line 200:
PHP uses lighttpd as a webserver, and looks for files in <code>~/public_html/</code>.
PHP uses lighttpd as a webserver, and looks for files in <code>~/public_html/</code>.
====PHP versions & packages====
====PHP versions & packages====
There are three versions of PHP available, PHP 7.3 (on Debian Buster), PHP 7.2 (on Debian Stretch), and the legacy PHP 5.6 (on Debian Jessie).
There are four versions of PHP available, PHP 7.4, PHP 7.3 (on Debian Buster), PHP 7.2 (on Debian Stretch), and the legacy PHP 5.6 (on Debian Jessie).


You can view the installed PHP extensions on the [[toolforge:phpinfo|phpinfo tool]]. This should match the PHP related packages installed on GridEngine exec nodes. Additional packages can be added on request by creating a [https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?title=Install%20%5BDESIRED%20PACKAGE%5D%20for%20Kubernetes%20%5BDESIRED%20RUNTIME%5D&description=In%20order%20to%20do%20%5BTHING%20YOU%20WANT%20TO%20DO%5D%20on%20the%20Toolforge%20Kubernetes%20cluster%2C%20the%20%5BNAME%20OF%20YOUR%20TOOL%5D%20needs%20to%20have%20%5BDESIRED%20PACKAGE%5D%20added%20to%20the%20Kubernetes%20%5BDESIRED%20RUNTIME%5D%20Docker%20image.%20%5BADDITIONAL%20DESCRIPTION%20OF%20PACKAGE%20OR%20NEED%20HELPFUL%20FOR%20STARTING%20DISCUSSION%5D&projects=toolforge-software&priority=triage Phabricator task tagged with #toolforge-software]. Software that is not packaged by Debian upstream is less likely to be added due to security and maintenance concerns.
You can view the installed PHP extensions on the [[toolforge:phpinfo|phpinfo tool]]. This should match the PHP related packages installed on GridEngine exec nodes. Additional packages can be added on request by creating a [https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?title=Install%20%5BDESIRED%20PACKAGE%5D%20for%20Kubernetes%20%5BDESIRED%20RUNTIME%5D&description=In%20order%20to%20do%20%5BTHING%20YOU%20WANT%20TO%20DO%5D%20on%20the%20Toolforge%20Kubernetes%20cluster%2C%20the%20%5BNAME%20OF%20YOUR%20TOOL%5D%20needs%20to%20have%20%5BDESIRED%20PACKAGE%5D%20added%20to%20the%20Kubernetes%20%5BDESIRED%20RUNTIME%5D%20Docker%20image.%20%5BADDITIONAL%20DESCRIPTION%20OF%20PACKAGE%20OR%20NEED%20HELPFUL%20FOR%20STARTING%20DISCUSSION%5D&projects=toolforge-software&priority=triage Phabricator task tagged with #toolforge-software]. Software that is not packaged by Debian upstream is less likely to be added due to security and maintenance concerns.


====PHP Upgrade====
====PHP Upgrade====
To upgrade from PHP 5.6 to PHP 7.3, run the following two commands:
To upgrade from PHP 5.6 to PHP 7.4, run the following two commands:
<syntaxhighlight lang="shell-session">
<syntaxhighlight lang="shell-session">
$ webservice stop
$ webservice stop
$ webservice --backend=kubernetes php7.3 start
$ webservice --backend=kubernetes php7.4 start
</syntaxhighlight>
</syntaxhighlight>



Revision as of 21:01, 22 September 2021

Overview

Kubernetes (often abbreviated k8s) is a platform for running containers. It is used in Toolforge to isolate Tools from each other and allow distributing Tools across a pool of servers.

Kubernetes webservices

The Toolforge webservice command has a --backend=kubernetes mode that will start, stop, and restart containers designed to run web services for various languages. See our Webservice help for more details.

The Kubernetes backend has the following options:

  -m MEMORY, --mem MEMORY
                        Set higher Kubernetes memory limit
  -c CPU, --cpu CPU     Set a higher Kubernetes cpu limit
  -r REPLICAS, --replicas REPLICAS
                        Set the number of pod replicas to use
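
As a minimal sketch of how these options combine, the hypothetical helper below wraps a start command with explicit limits. The function name and the example values (1 CPU, 1Gi) are assumptions for illustration, as is placing the options before the type, which mirrors where --backend appears in the commands on this page. The quantity format follows the 512Mi and 4Gi notation used in the quota section.

```shell
# Hypothetical helper (not part of the webservice tool): start a Kubernetes
# webservice of the given type with explicit per-container CPU and memory limits.
# Usage: start_webservice TYPE CPU MEMORY
start_webservice() {
  webservice --backend=kubernetes --cpu "$2" --mem "$3" "$1" start
}

# Example call (values are illustrative):
#   start_webservice php7.4 1 1Gi
```

Try the defaults first; the flags are only needed when a tool demonstrably runs out of CPU or memory.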

Kubernetes cronjobs

It is possible to run cron jobs on Kubernetes (see upstream documentation for a full description).

Example cronjob.yaml

Wikiloveslove is a Python 3.7 bot that runs in a Kubernetes deployment. The cronjobs.yaml file that it uses to tell Kubernetes how to start and schedule the bot is reproduced below.

Create the CronJob object in your tool's Kubernetes namespace using kubectl:


$ kubectl apply --validate=true -f $HOME/cronjobs.yaml
cronjob.batch/CRONJOB-NAME configured

After creating the cronjob you can create a test job with kubectl create job --from=cronjob/CRONJOB-NAME test to immediately trigger the cronjob and then access the logs as usual with kubectl logs job/test -f to debug.

If that doesn't give you any useful output, try kubectl describe job/test to see what's going on: it might be a misconfigured limit, for instance.
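
The debugging steps above can be chained in a small wrapper. This is a sketch only: the function name is hypothetical, and the final clean-up step is an addition, included because a Job name such as "test" cannot be reused while the old Job still exists.

```shell
# Hypothetical helper: trigger a CronJob once and inspect the result.
# Usage: run_cronjob_once CRONJOB-NAME [JOB-NAME]   (JOB-NAME defaults to "test")
run_cronjob_once() {
  cronjob="$1"
  job="${2:-test}"

  # Create a one-off Job from the CronJob's template.
  kubectl create job --from="cronjob/${cronjob}" "${job}" || return 1

  # Follow the logs; this can fail if the pod has not started yet,
  # in which case rerun the command after a few seconds.
  kubectl logs "job/${job}" -f

  # Show events and status, e.g. to spot a misconfigured limit.
  kubectl describe "job/${job}"

  # Remove the Job so the same name can be reused for the next test run.
  kubectl delete job "${job}"
}
```

Call it as run_cronjob_once CRONJOB-NAME, replacing CRONJOB-NAME with the name reported by kubectl apply.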

Kubernetes continuous jobs

The basic unit of managing execution on a Kubernetes cluster is called a "deployment". Each deployment is described with a YAML configuration file which describes the container images to be started ("pods" in the Kubernetes terminology) and commands to be run inside them after the container is initialized. A deployment also specifies where the pods run and what external resources are connected to them. The upstream documentation is comprehensive.

Example deployment.yaml

Stashbot is a Python 3.7 IRC bot that runs in a Kubernetes deployment. The deployment.yaml file that it uses to tell Kubernetes how to start the bot is reproduced below. This deployment is launched using a stashbot.sh wrapper script which runs kubectl create --validate=true -f /data/project/stashbot/etc/deployment.yaml.

This deployment:

  • Uses the 'tool-stashbot' namespace that the tool is authorized to control
  • Creates a container using the 'latest' version of the 'docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base' Docker image.
  • Runs the command /data/project/stashbot/bin/stashbot.sh run inside the container to start the bot itself.
  • Mounts the /data/project/stashbot/ NFS directory as /data/project/stashbot/ inside the container.

Monitoring your jobs

You can see which jobs you have running with kubectl get pods. Using the name of the pod, you can see the logs with kubectl logs <pod-name>.

To restart a failing pod, use kubectl delete pod <pod-name>; the deployment then starts a replacement pod. If you need to stop the job entirely, find the deployment name with kubectl get deployment, and delete it with kubectl delete deployment <deployment-name>.
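
To spot pods that need attention, the output of kubectl get pods can be filtered by its STATUS column. The helper below is a hypothetical sketch that assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods whose STATUS column (third field) is not "Running".
# Pods of finished cron jobs show "Completed" and are listed too.
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Usage on a Toolforge bastion:
#   kubectl get pods | not_running_pods
```

Each name it prints can then be passed to kubectl logs or kubectl describe pod for details.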

Namespaces

Each tool has been granted control of a Kubernetes "namespace". Your tool can only create and control objects in its namespace. A tool's namespace is the tool's name prefixed with "tool-" (e.g. tool-admin, tool-stashbot, tool-hay).
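
The naming rule can be expressed as a one-line helper. The function name is hypothetical; --namespace is kubectl's standard flag for selecting a namespace explicitly.

```shell
# Hypothetical helper: derive a tool's Kubernetes namespace from its name.
tool_namespace() {
  printf 'tool-%s\n' "$1"
}

# Usage:
#   kubectl get pods --namespace "$(tool_namespace stashbot)"
```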

Quotas and Resources

On the Kubernetes cluster, all containers run with CPU and RAM limits set, just like jobs on the Gridengine cluster. The defaults are 0.5 CPU and 512Mi of memory per container. You can raise these without help from an administrator, up to a maximum of 1 CPU and 4Gi of memory, either with the --cpu and --mem command line arguments to the webservice command or, for advanced users, with properly formatted Kubernetes YAML specifications for your pod's resources fields.
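
For the YAML route, the resources fields belong to each container entry of a pod template. The sketch below prints an illustrative fragment; the function name and the specific values are assumptions, chosen only to stay within the per-container maximum described above.

```shell
# Print an illustrative `resources` fragment for one container entry of a
# Deployment (spec.template.spec.containers[]). The values are assumptions,
# not recommendations.
print_resources_fragment() {
  cat <<'EOF'
resources:
  requests:
    cpu: "0.5"
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 2Gi
EOF
}

# Usage: run print_resources_fragment, then paste the output (suitably
# indented) under the container that needs the limits.
```

Requests are what the scheduler reserves for the container and limits are the ceiling it may not exceed, so requests should not be larger than limits.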

The Toolforge admin team encourages you to try running your webservice with the defaults before deciding that you need more resources. We believe that most PHP and Python3 webservices will work as expected with the lower values. Java webservices will almost certainly need higher limits due to the nature of running a JVM.

If you find that you need containers to run with more than 1 CPU and 4Gi of RAM, you can request that through the quota increase procedure below. You can verify the per-container limits that apply to you by running kubectl describe limitranges.
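
Kubernetes writes memory quantities with binary suffixes (Ki, Mi, Gi, powers of 1024) or decimal ones (k, M, G, powers of 1000), so 4Gi is slightly more than 4G. The sketch below, with hypothetical helper names, converts integer quantities to bytes and compares a value with the 4Gi per-container maximum. It is an illustration only, not the check the cluster itself performs.

```shell
# Convert an integer Kubernetes memory quantity (e.g. 512Mi, 4Gi, 4G) to bytes.
mem_to_bytes() {
  num="${1%%[!0-9]*}"      # leading integer part
  suffix="${1#"$num"}"     # whatever follows it
  [ -n "$num" ] || { echo "not an integer quantity: $1" >&2; return 1; }
  case "$suffix" in
    '') mult=1 ;;
    k)  mult=1000 ;;
    M)  mult=1000000 ;;
    G)  mult=1000000000 ;;
    Ki) mult=1024 ;;
    Mi) mult=1048576 ;;
    Gi) mult=1073741824 ;;
    *)  echo "unsupported suffix: $suffix" >&2; return 1 ;;
  esac
  echo $(( num * mult ))
}

# Succeed if the given quantity is within the 4Gi per-container maximum.
fits_container_limit() {
  bytes="$(mem_to_bytes "$1")" || return 1
  [ "$bytes" -le "$(mem_to_bytes 4Gi)" ]
}

# Usage:
#   fits_container_limit 2Gi && echo "no quota request needed"
```

Fractional quantities such as 1.5Gi are rejected by this sketch, although Kubernetes itself accepts them.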

Namespace-wide quotas

Your entire tool account can only consume so many cluster resources. The cluster places quota limits on an entire namespace which determine how many pods can be used, how many service ports can be exposed, total memory, total CPU, and others. The default limits for a tool's entire namespace are:

requests.cpu: 2           # Soft limit on CPU usage
requests.memory: "6Gi"    # Soft limit on memory usage
limits.cpu: 2             # Hard limit on CPU usage
limits.memory: "8Gi"      # Hard limit on memory usage
pods: 4
services: 1
services.nodeport: 0      # Nodeport services are not allowed
replicationcontrollers: 1
secrets: 10
configmaps: 10
persistentvolumeclaims: 3

To view the live quotas that apply to your tool, run kubectl describe resourcequotas.

Quota increases

It is possible to request a quota increase if you can demonstrate your tool's need for more resources than the default namespace quota allows. Instructions and a template link for creating a quota request can be found at Toolforge (Quota requests) in Phabricator. Please read all the instructions there before submitting your request.

Container images

The Toolforge Kubernetes cluster is restricted to loading Docker images published at docker-registry.tools.wmflabs.org (see Portal:Toolforge/Admin/Kubernetes#Docker Images for more information). These images are built using the Dockerfiles in the operations/docker-images/toollabs-images git repository.

Available container types

The webservice command has an optional type argument that allows you to choose which Docker container to run your Tool in.

Currently provided types:

  • golang (go v1.11.5; deprecated)
  • golang111 (go v1.11.6)
  • jdk17 (openjdk 17)
  • jdk11 (openjdk 11.0.5)
  • jdk8 (openjdk 1.8.0_232; deprecated)
  • node10 (nodejs v10.15.2)
  • node12 (nodejs v12.21.0)
  • nodejs (nodejs v6.11.0; deprecated)
  • php5.6 (PHP 5.6.33; deprecated)
  • php7.2 (PHP 7.2.24; deprecated)
  • php7.3 (PHP 7.3.11)
  • php7.4 (PHP 7.4.21)
  • python (Python 3.4.2; deprecated)
  • python2 (Python 2.7.9; deprecated)
  • python3.5 (Python 3.5.3; deprecated)
  • python3.7 (Python 3.7.3)
  • python3.9 (Python 3.9.2)
  • ruby2 (Ruby 2.1.5p273; deprecated)
  • ruby25 (Ruby 2.5.5p157)
  • ruby27 (Ruby 2.7)
  • tcl (TCL 8.6)

For example, to start a webservice using a php7.4 container, run:

webservice --backend=kubernetes php7.4 start

A complete list of images is available from the docker-registry tool which provides a pretty frontend for browsing the Docker registry catalog.

As of Feb 2018, we don't support mixed runtime containers. This may change in the future. We also don't support "bring your own container" on our Kubernetes cluster (yet!), and there is no mechanism for a user to install system packages inside a container.

PHP

PHP uses lighttpd as a webserver, and looks for files in ~/public_html/.

PHP versions & packages

There are four versions of PHP available: PHP 7.4, PHP 7.3 (on Debian Buster), PHP 7.2 (on Debian Stretch), and the legacy PHP 5.6 (on Debian Jessie).

You can view the installed PHP extensions on the phpinfo tool. This should match the PHP related packages installed on GridEngine exec nodes. Additional packages can be added on request by creating a Phabricator task tagged with #toolforge-software. Software that is not packaged by Debian upstream is less likely to be added due to security and maintenance concerns.

PHP Upgrade

To upgrade from PHP 5.6 to PHP 7.4, run the following two commands:

$ webservice stop
$ webservice --backend=kubernetes php7.4 start

To switch back:

$ webservice stop
$ webservice --backend=kubernetes php5.6 start

Running Locally

You may run the container on your local computer (not on Toolforge servers) by executing a command like this:

$ docker run --name toolforge -p 8888:80 -v "${PWD}:/var/www/html:cached" -d docker-registry.tools.wmflabs.org/toolforge-php73-sssd-web sh -c "lighty-enable-mod fastcgi-php && lighttpd -D -f /etc/lighttpd/lighttpd.conf"

Then the tool will be available at http://localhost:8888

Node.js

The container images for Node.js, such as docker-registry.tools.wmflabs.org/toollabs-nodejs-base:latest, come with the Node.js LTS version packaged in Wikimedia APT (as of September 2018, this is Node.js 6). This is the same version used by the Wikimedia Foundation in production and for continuous integration.

Broken npm

Because npm is not suitable for use in Wikimedia production, the version of Node.js provided by Wikimedia APT is compiled without npm (unlike the official Node.js distribution), and no "npm" Debian package is maintained in Wikimedia APT. As a result, the only "npm" Debian package available is the one from upstream Debian: npm 1.4, which was originally bundled with Node 0.10 in 2014 (debian/npm, debian/nodejs). This version is EOL and is incompatible with most packages on the npmjs.org registry. To update it within your container, follow these steps:

# Step 1: Start a shell in your Node.js pod (see "Shell" section below)
tool@tools-login$ kubectl exec -it podname-123-aaa -- /bin/bash

# Step 2: Create $HOME/bin and ensure it is in your PATH
podname:/data/project/tool$ mkdir -p "${HOME}/bin"
podname:/data/project/tool$ export PATH="${HOME}/bin:${PATH}"
# To avoid having to re-export PATH every time you use your tool, add the export command to your .bashrc file!

# Step 3: Use npm to install 'npm'
podname:/data/project/tool$ npm install npm
....
# This installs the current version of npm at node_modules/.bin/npm

# Step 4: Create a symlink in $HOME/bin
podname:/data/project/tool$ ln -s $HOME/node_modules/.bin/npm $HOME/bin/npm
# Close the shell and create a new shell (to initialise PATH)
podname:/data/project/tool$ exit
tool@tools-login$ kubectl exec -it podname-123-aaa -- /bin/bash
podname:/data/project/tool$

# Step 5: Verify that you now use a current npm instead of npm 1.4
podname:/data/project/tool$ npm --version
6.4.1 

Troubleshooting

"failed to create new OS thread" from kubectl

If kubectl get pods or a similar command fails with the error message "runtime: failed to create new OS thread (have 12 already; errno=11)", use GOMAXPROCS=1 kubectl ... to reduce the number of resources that kubectl requests from the operating system.

Get a shell inside a running Pod

Kubectl can be used to open a shell inside a running Pod:

$ kubectl exec -it $NAME_OF_POD -- /bin/bash

See Get a Shell to a Running Container at kubernetes.io/docs for more information.

Communication and support

We communicate and provide support through several primary channels. Please reach out with questions and to join the conversation.

Communicate with us (where to connect, and what each channel is best for):

  • Phabricator Workboard (#Cloud-Services): task tracking and bug reporting
  • IRC channel (#wikimedia-cloud, also reachable through the Telegram bridge and the Mattermost bridge): general discussion and support
  • Mailing list (cloud@): information about ongoing initiatives, general discussion and support
  • Announcement emails (cloud-announce@): information about critical changes (all messages mirrored to cloud@)
  • News wiki page (News): information about major near-term plans
  • Cloud Services Blog (Clouds & Unicorns): learning more details about some of our work
  • Wikimedia Technical Blog (techblog.wikimedia.org): news and stories from the Wikimedia technical movement

See also