Help:Toolforge/Kubernetes: Difference between revisions
imported>Jprorama (→Available container types: Add link to diffusion rODIT repo for access to current list of containers) |
imported>Majavah (→Broken npm: there is a modern npm these days) |
(50 intermediate revisions by 22 users not shown) | |||
{{Template:Toolforge nav}} | {{Template:Toolforge nav}} | ||
Kubernetes (often abbreviated k8s) is a platform for running containers. It is used in Toolforge to isolate Tools from each other and allow distributing Tools across a pool of servers. | == Overview == | ||
'''[[w:Kubernetes|Kubernetes]]''' (often abbreviated '''k8s''') is a platform for running [[w:Operating-system-level virtualization|containers]]. It is used in Toolforge to isolate Tools from each other and allow distributing Tools across a pool of servers. | |||
== Kubernetes webservices == | You can think of a container as a "micro virtual machine" whose only task is to execute a single application. It has its own (minimal) file system and limited CPU and memory resources. In Kubernetes, each container runs inside a pod, which connects the container to the tool directories, the database replicas, the internet, and other pods. | ||
Because containers are small, an image cannot include every package found on other Toolforge virtual machines, such as the tools-login and grid engine nodes. You therefore need to select a container image that has the packages you need; the available images are listed in the [[Help:Toolforge/Kubernetes#Container images|container images]] section below. | |||
==Kubernetes webservices== | |||
The Toolforge <code>webservice</code> command has a <code>--backend=kubernetes</code> mode that will start, stop, and restart containers designed to run web services for various languages. See our [[Help:Toolforge/Web|Webservice help]] for more details. | The Toolforge <code>webservice</code> command has a <code>--backend=kubernetes</code> mode that will start, stop, and restart containers designed to run web services for various languages. See our [[Help:Toolforge/Web|Webservice help]] for more details. | ||
== Kubernetes | The Kubernetes backend has the following options: | ||
<pre> | |||
-m MEMORY, --mem MEMORY | |||
Set higher Kubernetes memory limit | |||
-c CPU, --cpu CPU Set a higher Kubernetes cpu limit | |||
-r REPLICAS, --replicas REPLICAS | |||
Set the number of pod replicas to use | |||
</pre> | |||
== Kubernetes jobs == | |||
Every non-trivial task performed in Toolforge (like executing a script or running a bot) should be dispatched to a job scheduling backend (in this case, Kubernetes), which ensures that the job is run in a suitable place with sufficient resources. | |||
The basic principle of running jobs is fairly straightforward: | |||
* You create a job from a submission server (usually <code>login.toolforge.org</code>) | |||
* Kubernetes finds a suitable execution node to run the job on, and starts it there once resources are available | |||
* As it runs, your job will send output and errors to files until the job completes or is aborted. | |||
Jobs can be scheduled synchronously or asynchronously, continuously, or simply executed once. | |||
There are two ways of running jobs on Kubernetes. | |||
* by using the [[Help:Toolforge/Jobs framework | Toolforge jobs framework]] (recommended). | |||
* by directly using the [[Help:Toolforge/Raw kubernetes jobs | raw Kubernetes API]]. | |||
Before jobs were supported on Kubernetes, Toolforge offered [[Help:Toolforge/Grid | Grid Engine]] as its job scheduling backend. | |||
==Namespaces== | |||
Each tool has been granted control of a Kubernetes [https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ "namespace"]. Your tool can only create and control objects in its namespace. A tool's namespace is the tool's name with "tool-" prepended (e.g. <code>tool-admin</code>, <code>tool-stashbot</code>, <code>tool-hay</code>, etc). | |||
You can see monitoring data for your namespace in Grafana: open [https://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources this page] and select your namespace in the select box at the top of the page. | |||
== | ==Quotas and Resources== | ||
On the Kubernetes cluster, all containers run with CPU and RAM limits set, just like jobs on the Grid Engine cluster. Defaults are set at ''0.5'' CPU and ''512Mi'' of memory per container. Users can raise these up to the highest level allowed without any help from an administrator (the top limit is set at ''1'' CPU and ''4Gi'' of memory), either with command line arguments to the <code>webservice</code> command (<code>--cpu</code> and <code>--mem</code>) or, for advanced users, with properly formatted Kubernetes YAML specifications for your pod's [https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ resources fields]. | |||
The Toolforge admin team encourages you to try running your webservice with the defaults before deciding that you need more resources. We believe that most PHP and Python3 webservices will work as expected with the lower values. Java webservices will almost certainly need higher limits due to the nature of running a JVM. | |||
If you find that you need containers to run with '''more''' than 1 CPU and 4 GiB of RAM, you can request that through the [[Help:Toolforge/Kubernetes#Quota_increases|quota increase procedure]] below. You can verify the per-container limits you have by running <code>kubectl describe limitranges</code>. | |||
The default storage size limit of a container, including the image size, is 10GB. You can store temporary data in the container directory /tmp, but all of that data is lost when the container ends. For persistent storage, use your tool directory. Tool directories are on NFS, mounted at /data/project; they are not inside the container. | |||
=== Namespace-wide quotas === | |||
Your entire tool account can only consume so many cluster resources. The cluster places quota limits on an entire namespace which determine how many pods can be used, how many service ports can be exposed, total memory, total CPU, and others. The default limits for a tool's entire namespace are: | |||
<syntaxhighlight lang=yaml> | |||
requests.cpu: 2 # Soft limit on CPU usage | |||
requests.memory: "6Gi" # Soft limit on memory usage | |||
limits.cpu: 2 # Hard limit on CPU usage | |||
limits.memory: "8Gi" # Hard limit on memory usage | |||
pods: 4 | |||
services: 1 | |||
services.nodeport: 0 # Nodeport services are not allowed | |||
replicationcontrollers: 1 | |||
secrets: 10 | |||
configmaps: 10 | |||
persistentvolumeclaims: 3 | |||
</syntaxhighlight> | |||
To view the live quotas that apply to your tool, run <code>kubectl describe resourcequotas</code>. | |||
== | === Quota increases === | ||
It is possible to request a quota increase if you can demonstrate your tool's need for more resources than the default namespace quota allows. Instructions and a template link for creating a quota request can be found at [[phab:project/manage/4834/|Toolforge (Quota requests)]] in Phabricator. Please read all the instructions there before submitting your request. | |||
=== Available container types === | ==Container images== | ||
The Toolforge Kubernetes cluster is restricted to loading Docker images published at <code>docker-registry.tools.wmflabs.org</code> (see [[Portal:Toolforge/Admin/Kubernetes#Docker Images]] for more information). These images are built using the Dockerfiles in the [[phab:diffusion/ODIT/repository/master/|operations/docker-images/toollabs-images]] git repository. | |||
===Available container types=== | |||
The <code>webservice</code> command has an optional ''type'' argument that allows you to choose which Docker container to run your Tool in. | The <code>webservice</code> command has an optional ''type'' argument that allows you to choose which Docker container to run your Tool in. | ||
Currently provided types: | Currently provided types: | ||
* golang | * golang (go v1.11.5; ''deprecated'') | ||
* jdk8 | * '''golang111''' (go v1.11.6) | ||
* nodejs | * '''jdk17''' (openjdk 17) | ||
* php5.6 | * jdk11 (openjdk 11.0.5) | ||
* python | * jdk8 (openjdk 1.8.0_232; ''deprecated'') | ||
* python2 | * node10 (nodejs v10.15.2) | ||
* ruby2 | * node12 (nodejs v12.21.0) | ||
* tcl | * '''node16''' (nodejs v16.16.0) | ||
* nodejs (nodejs v6.11.0; ''deprecated'') | |||
* '''perl5.32''' (perl v5.32.1) | |||
* php5.6 (PHP 5.6.33; ''deprecated'') | |||
* php7.2 (PHP 7.2.24; ''deprecated'') | |||
* php7.3 (PHP 7.3.11) | |||
* '''php7.4''' (PHP 7.4.21) | |||
* python (Python 3.4.2; ''deprecated'') | |||
* python2 (Python 2.7.9; ''deprecated'') | |||
* python3.5 (Python 3.5.3; ''deprecated'') | |||
* python3.7 (Python 3.7.3) | |||
* '''python3.9''' (Python 3.9.2) | |||
* ruby2 (Ruby 2.1.5p273; ''deprecated'') | |||
* ruby25 (Ruby 2.5.5p157) | |||
* '''ruby27''' (Ruby 2.7) | |||
* '''tcl''' (TCL 8.6) | |||
For example, to start a webservice using a php7.4 container, run: | |||
webservice --backend=kubernetes php7.4 start | |||
A complete list of images is available from the [[toolforge:docker-registry|docker-registry tool]] which provides a pretty frontend for browsing the [https://docker-registry.tools.wmflabs.org/v2/_catalog Docker registry catalog]. | |||
As of Feb 2018, we don't support mixed runtime containers. This may change in the future. Also, we don't support "bring your own container" on our Kubernetes cluster (yet!), and there is no mechanism for a user to install system packages inside a container. | |||
===PHP=== | |||
{{anchor|php5.6 (Lighttpd + PHP)}} | |||
PHP uses lighttpd as a webserver, and looks for files in <code>~/public_html/</code>. | |||
====PHP versions & packages==== | |||
There are four versions of PHP available: PHP 7.4, PHP 7.3 (on Debian Buster), PHP 7.2 (on Debian Stretch), and the legacy PHP 5.6 (on Debian Jessie). | |||
You can view the installed PHP extensions on the [[toolforge:phpinfo|phpinfo tool]]. This should match the PHP related packages installed on GridEngine exec nodes. Additional packages can be added on request by creating a [https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?title=Install%20%5BDESIRED%20PACKAGE%5D%20for%20Kubernetes%20%5BDESIRED%20RUNTIME%5D&description=In%20order%20to%20do%20%5BTHING%20YOU%20WANT%20TO%20DO%5D%20on%20the%20Toolforge%20Kubernetes%20cluster%2C%20the%20%5BNAME%20OF%20YOUR%20TOOL%5D%20needs%20to%20have%20%5BDESIRED%20PACKAGE%5D%20added%20to%20the%20Kubernetes%20%5BDESIRED%20RUNTIME%5D%20Docker%20image.%20%5BADDITIONAL%20DESCRIPTION%20OF%20PACKAGE%20OR%20NEED%20HELPFUL%20FOR%20STARTING%20DISCUSSION%5D&projects=toolforge-software&priority=triage Phabricator task tagged with #toolforge-software]. Software that is not packaged by Debian upstream is less likely to be added due to security and maintenance concerns. | |||
====PHP Upgrade==== | |||
To upgrade from PHP 5.6 to PHP 7.4, run the following two commands: | |||
<syntaxhighlight lang="shell-session"> | |||
$ webservice stop | |||
$ webservice --backend=kubernetes php7.4 start | |||
</syntaxhighlight> | |||
To switch back: | |||
<syntaxhighlight lang="shell-session"> | |||
$ webservice stop | |||
$ webservice --backend=kubernetes php5.6 start | |||
</syntaxhighlight> | |||
====Running Locally==== | |||
You may run the container on your ''local'' computer (not on Toolforge servers) by executing a command like this: <syntaxhighlight lang="shell-session"> | |||
$ docker run --name toolforge -p 8888:80 -v "${PWD}:/var/www/html:cached" -d docker-registry.tools.wmflabs.org/toolforge-php73-sssd-web sh -c "lighty-enable-mod fastcgi-php && lighttpd -D -f /etc/lighttpd/lighttpd.conf" | |||
</syntaxhighlight> Then the tool will be available at http://localhost:8888 | |||
===Node.js=== | |||
The Node.js container images contain a version of Node.js LTS, NPM, and Yarn, packaged either by Debian or by [https://github.com/nodesource/distributions Nodesource]. | |||
==Troubleshooting== | |||
=== "failed to create new OS thread" from kubectl === | |||
If <code>kubectl get pods</code> or a similar command fails with the error message "''runtime: failed to create new OS thread (have 12 already; errno=11)''", use <code>GOMAXPROCS=1 kubectl ...</code> to reduce the number of resources that ''kubectl'' requests from the operating system. | |||
The active thread quota is per-user, not per-session or per-tool, so if you have multiple shell sessions open to the same bastion server this will affect the available quota for each of your shells. | |||
=== Get a shell inside a running Pod === | |||
Kubectl can be used to open a shell inside a running Pod: <syntaxhighlight lang="shell-session" inline>$ kubectl exec -it $NAME_OF_POD -- /bin/bash</syntaxhighlight> | |||
See [https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/ Get a Shell to a Running Container] at kubernetes.io/docs for more information. | |||
{{:Help:Cloud Services communication}} | |||
==See also== | |||
*[[Portal:Toolforge/Admin/Kubernetes|Kubernetes administration]] | |||
*[https://kubernetes.io/docs/user-guide/kubectl-cheatsheet/ kubectl (Kubernetes command line tool) cheatsheet] | |||
[[Category:Toolforge|Kubernetes]] | |||
[[Category:Documentation]] | |||
[[Category:Cloud Services]] |
Revision as of 17:20, 3 August 2022
Overview
Kubernetes (often abbreviated k8s) is a platform for running containers. It is used in Toolforge to isolate Tools from each other and allow distributing Tools across a pool of servers.
You can think of a container as a "micro virtual machine" whose only task is to execute a single application. It has its own (minimal) file system and limited CPU and memory resources. In Kubernetes, each container runs inside a pod, which connects the container to the tool directories, the database replicas, the internet, and other pods.
Because containers are small, an image cannot include every package found on other Toolforge virtual machines, such as the tools-login and grid engine nodes. You therefore need to select a container image that has the packages you need; the available images are listed in the container images section below.
Kubernetes webservices
The Toolforge webservice command has a --backend=kubernetes mode that will start, stop, and restart containers designed to run web services for various languages. See our Webservice help for more details.
The Kubernetes backend has the following options:
  -m MEMORY, --mem MEMORY
      Set higher Kubernetes memory limit
  -c CPU, --cpu CPU
      Set a higher Kubernetes cpu limit
  -r REPLICAS, --replicas REPLICAS
      Set the number of pod replicas to use
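As a minimal sketch, the memory and CPU options might be combined with a container type as shown here. The helper name build_webservice_cmd is hypothetical, and the values 1Gi and 1 are illustrative only; they assume the flags accept Kubernetes-style quantities, which this page does not confirm. The function only prints the command so that it can be reviewed before being run on a Toolforge bastion.

```shell
# Hypothetical helper: prints (does not run) a webservice command line.
# Arguments: memory limit, CPU limit, container type.
build_webservice_cmd() {
  printf 'webservice --backend=kubernetes --mem %s --cpu %s %s start\n' "$1" "$2" "$3"
}

# Illustrative values only; the quantity format is an assumption.
build_webservice_cmd 1Gi 1 php7.4
```

Printing the command first keeps the sketch safe to try anywhere; once the values look right, the printed line can be pasted into a shell on the bastion.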
Kubernetes jobs
Every non-trivial task performed in Toolforge (like executing a script or running a bot) should be dispatched to a job scheduling backend (in this case, Kubernetes), which ensures that the job is run in a suitable place with sufficient resources.
The basic principle of running jobs is fairly straightforward:
- You create a job from a submission server (usually login.toolforge.org)
- Kubernetes finds a suitable execution node to run the job on, and starts it there once resources are available
- As it runs, your job will send output and errors to files until the job completes or is aborted.
Jobs can be scheduled synchronously or asynchronously, continuously, or simply executed once.
There are two ways of running jobs on Kubernetes.
- by using the Toolforge jobs framework (recommended).
- by directly using the raw Kubernetes API.
Before jobs were supported on Kubernetes, Toolforge offered Grid Engine as its job scheduling backend.
Namespaces
Each tool has been granted control of a Kubernetes "namespace". Your tool can only create and control objects in its namespace. A tool's namespace is the tool's name with "tool-" prepended (e.g. tool-admin, tool-stashbot, tool-hay, etc).
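The naming rule can be sketched as a one-line shell function. The function name tool_namespace is hypothetical and is used here only for illustration.

```shell
# Derive the Kubernetes namespace for a tool by prefixing "tool-" to its name.
tool_namespace() {
  printf 'tool-%s\n' "$1"
}

tool_namespace stashbot   # prints "tool-stashbot"
```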
You can see monitoring data for your namespace in Grafana: open this page and select your namespace in the select box at the top of the page.
Quotas and Resources
On the Kubernetes cluster, all containers run with CPU and RAM limits set, just like jobs on the Grid Engine cluster. Defaults are set at 0.5 CPU and 512Mi of memory per container. Users can raise these up to the highest level allowed without any help from an administrator (the top limit is set at 1 CPU and 4Gi of memory), either with command line arguments to the webservice command (--cpu and --mem) or, for advanced users, with properly formatted Kubernetes YAML specifications for your pod's resources fields.
The Toolforge admin team encourages you to try running your webservice with the defaults before deciding that you need more resources. We believe that most PHP and Python3 webservices will work as expected with the lower values. Java webservices will almost certainly need higher limits due to the nature of running a JVM.
If you find that you need containers to run with more than 1 CPU and 4 GiB of RAM, you can request that through the quota increase procedure below. You can verify the per-container limits you have by running kubectl describe limitranges.
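A rough way to check a planned request against the self-service maximums (1 CPU and 4Gi per container) is sketched here. The function and variable names are hypothetical. CPU is expressed in millicores and memory in Mi so that plain integer comparison works. The live values reported by kubectl describe limitranges take precedence over these hard-coded numbers.

```shell
# Self-service maximums per container: 1 CPU (1000 millicores) and 4Gi (4096 Mi).
MAX_CPU_MILLI=1000
MAX_MEM_MI=4096

# Succeeds (exit status 0) when the request fits within those maximums.
# Arguments: CPU in millicores, memory in Mi.
within_container_limits() {
  [ "$1" -le "$MAX_CPU_MILLI" ] && [ "$2" -le "$MAX_MEM_MI" ]
}

# The defaults (0.5 CPU, 512Mi) fit.
if within_container_limits 500 512; then echo "defaults fit"; fi
# 2 CPUs and 8Gi would require a quota increase.
if ! within_container_limits 2000 8192; then echo "needs a quota increase"; fi
```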
The default storage size limit of a container, including the image size, is 10GB. You can store temporary data in the container directory /tmp, but all of that data is lost when the container ends. For persistent storage, use your tool directory. Tool directories are on NFS, mounted at /data/project; they are not inside the container.
Namespace-wide quotas
Your entire tool account can only consume so many cluster resources. The cluster places quota limits on an entire namespace which determine how many pods can be used, how many service ports can be exposed, total memory, total CPU, and others. The default limits for a tool's entire namespace are:
requests.cpu: 2 # Soft limit on CPU usage
requests.memory: "6Gi" # Soft limit on memory usage
limits.cpu: 2 # Hard limit on CPU usage
limits.memory: "8Gi" # Hard limit on memory usage
pods: 4
services: 1
services.nodeport: 0 # Nodeport services are not allowed
replicationcontrollers: 1
secrets: 10
configmaps: 10
persistentvolumeclaims: 3
To view the live quotas that apply to your tool, run kubectl describe resourcequotas.
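To see how the per-container limits interact with the default namespace quota, the following sketch multiplies a replica count by per-pod limits and compares the totals with the default pods, limits.cpu, and limits.memory values. The names are hypothetical, and the sketch assumes one container per pod and no other workloads in the namespace.

```shell
# Default namespace quotas: 4 pods, limits.cpu 2, limits.memory 8Gi.
QUOTA_PODS=4
QUOTA_CPU_MILLI=2000
QUOTA_MEM_MI=8192

# Succeeds when the replicas fit in the default namespace quota.
# Arguments: replicas, per-pod CPU limit (millicores), per-pod memory limit (Mi).
fits_namespace_quota() {
  [ "$1" -le "$QUOTA_PODS" ] &&
    [ $(( $1 * $2 )) -le "$QUOTA_CPU_MILLI" ] &&
    [ $(( $1 * $3 )) -le "$QUOTA_MEM_MI" ]
}

# Two replicas at the per-container defaults: 1000m CPU and 1024Mi in total.
if fits_namespace_quota 2 500 512; then echo "fits"; fi
# Four replicas at 1 CPU each would need 4 CPUs, above limits.cpu.
if ! fits_namespace_quota 4 1000 1024; then echo "exceeds quota"; fi
```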
Quota increases
It is possible to request a quota increase if you can demonstrate your tool's need for more resources than the default namespace quota allows. Instructions and a template link for creating a quota request can be found at Toolforge (Quota requests) in Phabricator. Please read all the instructions there before submitting your request.
Container images
The Toolforge Kubernetes cluster is restricted to loading Docker images published at docker-registry.tools.wmflabs.org (see Portal:Toolforge/Admin/Kubernetes#Docker Images for more information). These images are built using the Dockerfiles in the operations/docker-images/toollabs-images git repository.
Available container types
The webservice command has an optional type argument that allows you to choose which Docker container to run your Tool in.
Currently provided types:
- golang (go v1.11.5; deprecated)
- golang111 (go v1.11.6)
- jdk17 (openjdk 17)
- jdk11 (openjdk 11.0.5)
- jdk8 (openjdk 1.8.0_232; deprecated)
- node10 (nodejs v10.15.2)
- node12 (nodejs v12.21.0)
- node16 (nodejs v16.16.0)
- nodejs (nodejs v6.11.0; deprecated)
- perl5.32 (perl v5.32.1)
- php5.6 (PHP 5.6.33; deprecated)
- php7.2 (PHP 7.2.24; deprecated)
- php7.3 (PHP 7.3.11)
- php7.4 (PHP 7.4.21)
- python (Python 3.4.2; deprecated)
- python2 (Python 2.7.9; deprecated)
- python3.5 (Python 3.5.3; deprecated)
- python3.7 (Python 3.7.3)
- python3.9 (Python 3.9.2)
- ruby2 (Ruby 2.1.5p273; deprecated)
- ruby25 (Ruby 2.5.5p157)
- ruby27 (Ruby 2.7)
- tcl (TCL 8.6)
For example, to start a webservice using a php7.4 container, run:
webservice --backend=kubernetes php7.4 start
A complete list of images is available from the docker-registry tool which provides a pretty frontend for browsing the Docker registry catalog.
As of Feb 2018, we don't support mixed runtime containers. This may change in the future. Also, we don't support "bring your own container" on our Kubernetes cluster (yet!), and there is no mechanism for a user to install system packages inside a container.
PHP
PHP uses lighttpd as a webserver, and looks for files in ~/public_html/.
PHP versions & packages
There are four versions of PHP available: PHP 7.4, PHP 7.3 (on Debian Buster), PHP 7.2 (on Debian Stretch), and the legacy PHP 5.6 (on Debian Jessie).
You can view the installed PHP extensions on the phpinfo tool. This should match the PHP related packages installed on GridEngine exec nodes. Additional packages can be added on request by creating a Phabricator task tagged with #toolforge-software. Software that is not packaged by Debian upstream is less likely to be added due to security and maintenance concerns.
PHP Upgrade
To upgrade from PHP 5.6 to PHP 7.4, run the following two commands:
$ webservice stop
$ webservice --backend=kubernetes php7.4 start
To switch back:
$ webservice stop
$ webservice --backend=kubernetes php5.6 start
Running Locally
You may run the container on your local computer (not on Toolforge servers) by executing a command like this:
$ docker run --name toolforge -p 8888:80 -v "${PWD}:/var/www/html:cached" -d docker-registry.tools.wmflabs.org/toolforge-php73-sssd-web sh -c "lighty-enable-mod fastcgi-php && lighttpd -D -f /etc/lighttpd/lighttpd.conf"
Then the tool will be available at http://localhost:8888
Node.js
The Node.js container images contain a version of Node.js LTS, NPM, and Yarn, packaged either by Debian or by Nodesource.
Troubleshooting
"failed to create new OS thread" from kubectl
If kubectl get pods or a similar command fails with the error message "runtime: failed to create new OS thread (have 12 already; errno=11)", use GOMAXPROCS=1 kubectl ... to reduce the number of resources that kubectl requests from the operating system.
The active thread quota is per-user, not per-session or per-tool, so if you have multiple shell sessions open to the same bastion server this will affect the available quota for each of your shells.
Get a shell inside a running Pod
Kubectl can be used to open a shell inside a running Pod: $ kubectl exec -it $NAME_OF_POD -- /bin/bash
See Get a Shell to a Running Container at kubernetes.io/docs for more information.
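If you do not know the pod name in advance, a small wrapper can look it up first. This is a sketch only: the function names first_pod and pod_shell are hypothetical, and it assumes the first pod listed in your namespace is the one you want and that its image provides /bin/bash.

```shell
# Print the name of the first pod in the current namespace, in the
# "pod/<name>" form produced by "kubectl get pods -o name".
first_pod() {
  kubectl get pods -o name | head -n 1
}

# Open an interactive shell in that pod.
pod_shell() {
  kubectl exec -it "$(first_pod)" -- /bin/bash
}
```

Run pod_shell from a bastion session where kubectl is already configured for your tool. If the namespace has more than one pod, check the output of kubectl get pods first and pass the exact name to kubectl exec instead.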
Communication and support
Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia Movement volunteers. Please reach out with questions and join the conversation:
- Chat in real time in the IRC channel #wikimedia-cloud, the bridged Telegram group, or the bridged Mattermost channel
- Discuss via email after you have subscribed to the cloud@ mailing list