PAWS/Admin
Introduction
PAWS is a Jupyterhub deployment that runs in the PAWS Cloud VPS project. The main Jupyterhub login is accessible at https://hub-paws.wmcloud.org/hub/login, and is a public service that can be authenticated to via Wikimedia OAuth. More end-user info is at PAWS. Besides a simple Jupyterhub deployment, PAWS also contains easy access methods for the wiki replicas, the wikis themselves (via the OAuth grant), and pywikibot.
Kubernetes cluster
Deployment
From the OpenStack controller (to make a cluster called 'paws' using the 'paws-k8s21' template):
openstack coe cluster create paws --cluster-template paws-k8s21 --master-count 1 --node-count 3
# get kube config
openstack coe cluster config paws --dir /tmp/
cat /tmp/config
From the tools bastion (tools-sgebastion-10.tools.eqiad1.wikimedia.cloud):
Put the output into .kube/config, or put it somewhere else and export KUBECONFIG=<location>.
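As a quick sanity check that the kubeconfig works (a minimal sketch, assuming the config was written to /tmp/config as above):

```bash
# Point kubectl at the new cluster and confirm the nodes are Ready
# (one control-plane node and three workers, per the create command above).
export KUBECONFIG=/tmp/config
kubectl get nodes -o wide
```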
Deploy a new Trove database instance and add a "paws" database to it. Update the secrets file with the database login information for this new instance. (If the following fails to deploy, see Troubleshooting below.)
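A rough sketch of the database step from the OpenStack CLI; the instance name paws-db is hypothetical, and the flags for creating the Trove instance itself vary, so check `openstack database instance create --help` on the controller:

```bash
# List existing Trove instances, then add a 'paws' database to the chosen one.
openstack database instance list
openstack database db create paws-db paws   # hypothetical instance name
```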
helm upgrade --install ingress-nginx ingress-nginx \
  --version v4.4.0 \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.service.type=NodePort \
  --set controller.service.enableHttps=false \
  --set controller.service.nodePorts.http=30001 \
  --set-string controller.config.proxy-body-size="4m"  # T328168
git clone https://github.com/toolforge/paws.git  # decrypt if necessary
git checkout <checkout the updated secrets file for the new db>
kubectl config set-context --current --namespace=prod
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm dep up paws/
kubectl create namespace prod
helm install paws --namespace prod ./paws -f paws/secrets.yaml -f paws/production.yaml --timeout=50m
kubectl apply -f manifests/psp.yaml
Update the web proxies in Horizon (DNS > Web Proxies): point hub-paws and public-paws to the first node of the new cluster.
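To find the node to point the proxies at, list the cluster's instances from the OpenStack controller (Magnum chooses the node names, so the exact pattern may differ):

```bash
# Show the cluster's servers and their addresses; use the first worker node.
openstack server list | grep -i paws
```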
Troubleshooting
Magnum relies on some containers from Docker Hub, which limits anonymous pulls to 100 per six-hour window. If your containers are not deploying and are complaining of taint problems, check the kube-system containers:
kubectl get all -n kube-system
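Rate-limit problems usually show up as pods stuck in ImagePullBackOff or ErrImagePull; a quick way to narrow it down:

```bash
# Find pods that are failing to pull images, then check the events for the cause.
kubectl get pods -n kube-system | grep -Ei 'imagepull|errimage'
kubectl describe pod <pod name> -n kube-system
```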
If containers are crash-looping, you will likely have to add a Docker credential to them (you can check with kubectl describe pod <pod name> -n kube-system):
docker login
kubectl create secret generic regcred --from-file=.dockerconfigjson=<path to your docker/config.json> --type=kubernetes.io/dockerconfigjson -n kube-system
Then, using the following edit commands, add
imagePullSecrets:
- name: regcred
under spec.template.spec:
kubectl edit -n kube-system daemonset.apps/openstack-cloud-controller-manager
kubectl edit -n kube-system deployment.apps/kubernetes-dashboard
kubectl edit -n kube-system deployment.apps/dashboard-metrics-scraper
kubectl edit -n kube-system daemonset.apps/k8s-keystone-auth
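If you prefer a non-interactive alternative to `kubectl edit`, the same change can be applied with `kubectl patch`; a sketch assuming the regcred secret created above:

```bash
# Add the regcred pull secret to each affected workload in kube-system.
for target in daemonset/openstack-cloud-controller-manager \
              deployment/kubernetes-dashboard \
              deployment/dashboard-metrics-scraper \
              daemonset/k8s-keystone-auth; do
  kubectl -n kube-system patch "$target" --type merge \
    -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"regcred"}]}}}}'
done
```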
Upgrading
Upgrading of the cluster should be performed the same way as the deployment of the cluster, just using a new Magnum template that defines the newer Kubernetes version. Thus upgrading follows the same procedure as disaster recovery.
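For example, assuming a newer Magnum template exists (the template name below is hypothetical), the replacement cluster is created exactly as in the Deployment section, after which PAWS is redeployed and the web proxies are repointed:

```bash
# 'paws-k8s22' is a hypothetical template for a newer Kubernetes version.
openstack coe cluster create paws --cluster-template paws-k8s22 --master-count 1 --node-count 3
```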
Architecture
The core of PAWS runs on OpenStack Magnum, i.e. Kubernetes-as-a-service. In concept it should be runnable on any Kubernetes cluster, so long as it has access to a database and NFS.
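The Magnum resources backing the cluster can be inspected from the OpenStack controller:

```bash
# Show the cluster, its template, and the available templates.
openstack coe cluster list
openstack coe cluster show paws
openstack coe cluster template list
```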
Floating IP
The floating IP is our second service using a manually-provisioned Neutron port with IP 172.16.1.171/32 that is managed with keepalived, using this procedure: Portal:Cloud VPS/Admin/Keepalived. That is NAT'd to the public IP 185.15.56.57/32.
Ports
At the load balancer layer (haproxy), routing is done by port back to the Kubernetes worker nodes. The ingress layer is served at the well-known web ports (TCP 80 and 443), which hit the worker nodes on a NodePort service at port 30001. The Neutron security group paws-loadbalancer prevents internet clients from contacting the k8s API at this time.
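To confirm the NodePort that haproxy forwards to, check the ingress controller's service (it is installed into the ingress-nginx namespace during deployment above):

```bash
# The controller service should expose HTTP on NodePort 30001.
kubectl get svc -n ingress-nginx
```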
TLS
TLS certs are issued via acme-chief and distributed to the haproxy load balancer layer. Therefore, inside the cluster, the TLS ingress bits in the helm chart are turned off.
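A quick external check of what certificate clients actually receive from the haproxy layer (plain openssl, nothing PAWS-specific):

```bash
# Show issuer and validity dates of the certificate served for the hub endpoint.
echo | openssl s_client -servername hub-paws.wmcloud.org -connect hub-paws.wmcloud.org:443 2>/dev/null \
  | openssl x509 -noout -issuer -dates
```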
Helm
Helm 3 is used to deploy Kubernetes applications on the cluster. It is installed by puppet via a Debian package. The community-supported ingress-nginx controller is deployed from its own helm chart, but the ingress objects are all managed in the PAWS helm chart. As this is Helm 3, there is no tiller, and RBAC affects what you can do.
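Since everything is managed as Helm 3 releases, a quick way to see what is deployed and where:

```bash
# Lists all releases; expect at least paws (prod) and ingress-nginx here.
helm list --all-namespaces
```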
Add a worker
From the OpenStack controller:
openstack coe cluster resize <cluster name> <desired node count>
For example:
openstack coe cluster resize paws-dev 4
General notes
- The haproxy nodes are part of separate anti-affinity server groups so that OpenStack will not schedule them on the same hypervisor.
- To see the status of k8s control plane pods (running coredns, kube-proxy, calico, etcd, kube-apiserver, kube-controller-manager), see `kubectl --namespace=kube-system get pod -o wide`.
- Prometheus stats and metrics-server are deployed in the metrics namespace during cluster build via `kubectl apply -f $yaml-file`, just like in the Toolforge deploy documentation (see the example after this list).
- Because of the pod security policies in place, all init containers have been removed from the paws-project version of things. Privileged containers cannot be run inside the prod namespace.
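With metrics-server in place, the usual resource queries work; for example:

```bash
# Rough CPU/memory usage per node and per user pod (requires metrics-server).
kubectl top nodes
kubectl top pods -n prod
```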
Jupyterhub deployment
Jupyterhub & PAWS Components
Jupyterhub is a set of systems deployed together that provide Jupyter notebook servers per user. The three main subsystems of Jupyterhub are the Hub, the Proxy, and the Single-User Notebook Server. A really good overview of these systems is available at http://jupyterhub.readthedocs.io/en/latest/reference/technical-overview.html.
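In this deployment those components all run as pods in the prod namespace, so a quick way to see them (assuming kubectl access to the cluster):

```bash
# Hub, proxy, and per-user notebook pods live in the prod namespace.
kubectl get pods -n prod -o wide
```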
PAWS is a Jupyterhub deployment (Hub, Proxy, Single-User Notebook Server) with some added bells and whistles. Some additional PAWS-specific pods in our deployment are:
- db-proxy: Mysql-proxy plugin script to perform simple authentication to the Wiki Replicas. See https://github.com/toolforge/paws/blob/master/images/db-proxy/auth.lua. We haven't found a replacement for that yet, but efforts are welcome because the code is quite old. task T253134
- nbserve and render: `nbserve` is an nginx proxy that runs in the cluster at https://public-paws.wmcloud.org and handles URL rewriting for public URLs, mapping numerical IDs to Wiki usernames (so we can have URLs like https://public-paws.wmcloud.org/User:BDavis_(WMF)/pip-colorama.ipynb), and `render` handles the actual rendering of the ipynb notebook as a static page. These images are both essential to how the publishing of PAWS notebooks works.
PAWS also includes customized versions of some Jupyterhub images:
- singleuser: Since this is the environment for end users, there is a fair bit going on here. Our image is a replacement of the upstream one. We set the correct UID and directory. We install the jupyterhub/lab code directly from pip, along with PyWikiBot, a small library called ipynb-paws that allows importing a notebook like a python package along the lines of `import paws.$username.$notebooks_name`, and code from https://github.com/toolforge/nbpawspublic to add a public link button. There are other customizations because this is a great surface for doing them. The general goal is to get a notebook up and running for use on wikis as fast as possible.
- paws-hub: We build upon the upstream Jupyterhub hub image just a touch, adding bits that respect more of the UID settings and adding in a custom culling script. The code for doing OAuth is actually inserted in the helm chart instead.
The other custom image is a deploy-hook, which is undergoing some renovations before it is redeployed in the cluster.
Deployment
- The PAWS repository is at https://github.com/toolforge/paws. It should be cloned locally, and then the git-crypt key needs to be used to unlock the secrets.yaml file. See one of the PAWS admins if you should have access to this key.
- PAWS is built via GitHub Actions triggered by a PR. GitHub Actions will also update values.yaml to match any new container that is built.
- The command used to deploy it right now, run from an unlocked git checkout, is:
helm install paws --namespace prod ./paws -f paws/secrets.yaml -f paws/production.yaml --timeout=50m
If you are deploying to an actual paws cluster, you will also need the ingress controller:
Note: As of writing we're not actually using this setup yet; if the update pull request has not been merged and deployed, please follow the previous version of these instructions.
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
kubectl create ns ingress-nginx-gen2
helm install -n ingress-nginx-gen2 ingress-nginx-gen2 ingress-nginx/ingress-nginx --values ingress/values.yaml
Then apply the Pod Security Policy: `kubectl apply -f paws/ingress/nginx-ingress-psp.yaml` and the controllers themselves: `kubectl apply -f paws/ingress/nginx-ingress-psp.yaml`. Please note, you will need your dedicated ingress worker nodes deployed (the puppet prefix looks for the name paws-k8s-ingress-) for that to do anything, because there are tolerations and affinities for the nodes.
If already deployed, do not use the "install" command. Change that to "upgrade" to deploy changes/updates, such as:
helm upgrade paws --namespace prod ./paws -f paws/secrets.yaml -f paws/production.yaml --timeout=50m
Database
JupyterHub uses a database in Trove to manage the user state. Credentials are in secrets.yaml.
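To confirm which database URL the hub is actually running with, you can read it from the live hub-config configmap (the same object edited in the sqlite procedure below); this is a read-only check:

```bash
# Show the active db_url from the hub's configmap in the prod namespace.
kubectl --as admin --as-group system:masters -n prod get configmap hub-config -o yaml | grep -i db_url
```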
Moving to sqlite
During ToolsDB outages we can change the db to in-memory sqlite without significant impact.
The smoothest way is to do a helm upgrade as root on a control node (as above, in an unlocked checkout) with this command:
helm upgrade paws --namespace prod ./paws -f paws/secrets.yaml -f paws/production.yaml --set=jupyterhub.hub.db.url="sqlite://" --set=jupyterhub.hub.db.type=sqlite
You can roll back to ToolsDB with helm by going into an unlocked checkout of https://github.com/toolforge/paws and running helm with helm upgrade paws --namespace prod ./paws -f paws/secrets.yaml -f paws/production.yaml
Without using helm
If you don't have an unlocked checkout and you are using your user account on a shell on one of the k8s control plane hosts, you can also manually edit the configmap to do this:
$ kubectl --as admin --as-group system:masters --namespace prod edit configmap hub-config
Write down the existing value and then set `hub.db_url` to `"sqlite://"`. Restart the hub with:
$ kubectl --as admin --as-group system:masters -n prod delete pod $(kubectl get pods --namespace prod|grep hub|cut -f 1 -d ' ')
To move it back you can set `hub.db_url` to the previous value (if you didn't write it down before you changed it, see `/home/bstorm/src/paws/paws/secrets.yaml` at `jupyterhub.hub.db.url`) and restart the hub with:
$ kubectl --as admin --as-group system:masters -n prod delete pod $(kubectl get pods --namespace prod|grep hub|cut -f 1 -d ' ')
Common administrative actions
Some common administrative actions.
Deleting user data in case of spam or credential leaks
If a notebook or file hosted on PAWS needs an admin to remove it immediately (vs. asking the user to delete it), you can access all user data via the NFS share mounted locally on all k8s nodes.
- SSH to a worker or control node such as `paws-k8s-worker-1.paws.eqiad1.wikimedia.cloud`.
- Become root with `sudo -i`.
- `cd /data/project/paws/userhomes`: this is the top level of user homes and paws public pages.
- `cd $wiki_user-id`, where $wiki_user-id is the numeric id of the user, not the text username.
- Remove the offending file with rm as needed (see the sketch after this list).
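Putting the steps together, a sketch of the whole sequence (the user id and file name below are made up for illustration):

```bash
# On a worker or control node, as root, remove one offending notebook.
sudo -i
cd /data/project/paws/userhomes/12345      # hypothetical numeric wiki user id
ls -la                                     # confirm you are in the right home
rm 'offending-notebook.ipynb'              # hypothetical file name
```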
Stop a running workload in PAWS
Useful if you want to stop a crypto miner or similar.
You need to be an admin inside PAWS.
- Log in to PAWS, likely https://hub-paws.wmcloud.org/hub/home
- Click the `Admin` button in the top menu. If you don't have the button, you aren't an admin.
- Search in the list for the workload you want to stop.
- Click the `Stop server` button.
Bonus points if you check the user against https://meta.wikimedia.org/wiki/Special:CentralAuth for additional hints as to whether the user is a bad actor.
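If the web UI is not an option, the user's notebook server pod can also be removed from the cluster side. This is a sketch only: it assumes the KubeSpawner default jupyter-<username> pod naming, so verify with the first command before deleting anything.

```bash
# Find the user's notebook pod in the prod namespace, then delete it.
kubectl get pods -n prod | grep '^jupyter-'
kubectl --as admin --as-group system:masters -n prod delete pod jupyter-<username>
```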
Prevent a user from using PAWS
As of this writing the only method we know about is to talk to a CheckUser in-wiki admin to global-block the user, which breaks the OAuth that PAWS uses.
TODO: link is probably: https://meta.wikimedia.org/wiki/Meta:Requests_for_help_from_a_sysop_or_bureaucrat