Nova Resource:Paws/SAL

Revision as of 12:55, 18 May 2022 by imported>Stashbot (Rook: bump jupyterhub version #149 T308568 41f03a544041318f1fad479b32ae46ac9e816a55)

2022-05-18

2022-05-16

  • 09:36 dcaro: restarted reload-acme-chief-backend.service to ensure certs are refreshed

2022-05-14

  • 16:16 andrewbogott: restarting acme-chief.service on paws-acme-chief-01 for T308383

2022-05-11

  • 13:44 Rook: update pwb version and pin jupyterlab version ef3e38c

2022-05-10

  • 13:56 Rook: upgrade pywikibot on container start 437f46a

2022-04-27

  • 17:02 Rook: pywikibot version bump 3c42a62

2022-04-18

  • 17:29 Rook: updating links to phab with prefilled ticket links aef7c67
  • 12:30 Rook: update pywikibot 6db74b6

2022-04-16

2022-04-04

  • 12:31 taavi: moving all VMs from paws-puppetmaster-01 -> paws-puppetmaster-2

2022-03-29

  • 10:36 Rook: upgrading pywikibot 702f21d

2022-03-21

  • 11:11 Rook: deploying jupyterlab cd6ee19

2022-03-10

  • 12:23 Rook: updating banner to note ui will update soon 462ab18

2022-03-08

  • 13:26 Rook: upgrading open refine c116d64

2022-03-07

  • 11:12 Rook: deploying paws realtime collaboration 246e2af

2022-03-02

  • 14:20 Rook: deploying fixed version of jupyter-rsession-proxy abe89f6

2022-03-01

  • 13:38 Rook: deploying pyaudio fix 978fb64

2022-02-23

  • 13:13 Rook: deploying e6eedbc cleanup

2022-02-15

  • 18:20 chicocvenancio: added psp for minesweeper
  • 16:04 mdipietro: updating pywikibot 2fc27c9
  • 14:21 chicocvenancio: Deploying minesweeper

2022-01-25

  • 14:30 mdipietro: deployed 93d33c4 PR122

2021-12-28

2021-12-20

  • 18:20 majavah: deploying calico v3.21.0 (T292698)

2021-12-16

2021-12-02

  • 19:12 chicocvenancio: deploy PR 111 T295257
  • 12:52 mdipietro: upgrading pywikibot 0f5d28d

2021-12-01

  • 11:36 mdipietro: deploying lsof pr-76 a378845

2021-11-29

  • 13:02 chicocvenancio: deploy PR 113 T295761
  • 12:29 chicocvenancio: deploy PR 112 T295761

2021-11-25

  • 21:37 chicocvenancio: rollback singleuser to PR #96 T295257
  • 21:15 chicocvenancio: deploy PR #110 changing singleuser to bump openrefine version T295257

2021-11-23

  • 14:19 mdipietro: increased cull timeout with deploy of 3e57264

2021-11-22

  • 12:57 mdipietro: added julia to paws with 7b58fb0
  • 11:04 mdipietro: added julia to paws with 12bfdad

2021-11-11

  • 08:35 majavah: disabling pod preset controller in preparation for T291913

2021-11-09

  • 16:24 mdipietro: deployed PR97 (85c085f) Update Pywikibot to 6.6.2

2021-11-03

2021-11-01

  • 12:31 majavah: upgrade ingress-nginx T292771

2021-10-28

  • 14:35 chicocvenancio: set team toolforge/wmcsadmins as maintainers for github repo

2021-10-26

  • 15:06 chicocvenancio: delete orphan pods for 2 users

2021-10-22

2021-10-21

  • 12:58 mdipietro: upgraded to 923250f, which was not really an upgrade since the diff showed no changes, but now it is clear what is deployed.

2021-09-07

  • 22:14 bstorm: upgraded k8s to 1.19.13 T287399

2021-08-18

  • 19:09 bstorm: redeployed hub with trove database backend instead of toolsdb

2021-07-29

  • 14:09 majavah: add mdipietro as projectadmin T287287

2021-07-25

  • 16:09 majavah: deleting ingress pod running on worker-6 to get it to re-appear in ingress-4

2021-07-21

  • 19:53 bstorm: deployed new maintain-kubeusers T285011
  • 19:53 bstorm: deployed new rbac for maintain-kubeusers changes T285011
  • 16:59 majavah: deploying calico v3.18.4 T280342
  • 15:52 majavah: add my key to passwords::root::extra_keys
  • 15:00 majavah: starting kubernetes upgrades T280302

2021-07-14

  • 10:38 majavah: correction: undeploy old ingress T264221
  • 10:35 majavah: undeploy old ingress T266050

2021-07-13

  • 07:51 majavah: renewing tools-prometheus certificates

2021-07-12

  • 13:18 majavah: ingress upgrade completed
  • 13:05 majavah: moving user traffic to updated ingress-nginx T264221

2021-07-01

  • 12:04 majavah: deploy ingress-nginx 0.46 via the helm chart to paws T264221

2021-06-30

  • 20:05 bstorm: tried force delete on the ingress-nginx-gen2 namespace, which doesn't appear to be working either until metrics-server is fixed T285905
  • 20:00 bstorm: renewed k8s metrics-server certs and the deployment
  • 18:04 majavah: renew kubernetes metrics-server certificate
  • 17:26 majavah: creating paws-k8s-ingress-[3-4] and joining them to the k8s cluster T264221
  • 17:16 bstorm: temporarily increased quota to 60 cores to enable T264221

2021-06-03

  • 20:43 chicocvenancio: tagged new singleuser image, fixes T283969

2021-05-27

  • 21:53 bstorm: added paws-k8s-control-2.paws.eqiad.wmflabs back to the list of control nodes at the proxy
  • 21:50 bstorm: renewed the certs for paws-k8s-control-2
  • 20:37 bstorm: removed paws-k8s-control-2.paws.eqiad.wmflabs from the proxy because it is somewhat broken (certs expired)
  • 19:41 bstorm: forced removal of openrefine in paws for now and deleted all current user server pods to force use of the new image

2021-05-23

2021-05-21

  • 00:06 bstorm: creating trove mysql instance pawsdb-1 T267683

2021-05-12

  • 19:33 bstorm: added taavi to paws.admin

2021-05-11

  • 09:17 Majavah: set `profile::wmcs::kubeadm::docker_vol: false` on ingress nodes T282087
  • 09:15 arturo: added user `taavi` (Majavah) as projectadmin

2021-04-20

2021-04-02

  • 21:50 bstorm: deploying latest PRs to add a note on the wikireplicas changes

2020-12-21

  • 20:27 bstorm: tuned timeouts on the k8s etcd pods: 300 for heartbeat and 3000 for elections T267966

2020-12-17

  • 02:22 bstorm: Set PAWS hub back to using mariadb T266587

2020-12-16

  • 18:21 chicocvenancio: move paws to sqlite while toolsdb is down.

2020-12-10

  • 17:00 arturo: fixing /etc/kubernetes/kubelet.conf and restarting kubelet in paws-k8s-control-1 (T269865)

2020-12-05

  • 00:42 bd808: `kubectl delete po renderer-794886b9cd-9nc6c -n prod` after seeing lots of listen queue full errors in the pod logs.

2020-11-30

  • 18:22 bstorm: 1.17 upgrade for kubernetes complete T268669
  • 17:25 bstorm: upgrading the worker nodes (this will likely kill services briefly when some pods are rescheduled) T268669
  • 17:14 bstorm: updated the calico-kube-controllers deployment to use our internal registry to deal with docker-hub rate-limiting T268669 T269016
  • 17:09 chicocvenancio: delete orphaned jupyter server pod `kubectl -n prod delete pod jupyter--45volutionoftheuniverse`. Respective server not running in jupyter admin UI.
  • 16:31 bstorm: upgrading pods on paws-k8s-control-3 T268669
  • 16:17 bstorm: starting upgrade on paws-k8s-control-2 T268669 (first kubectl drain paws-k8s-control-2 --ignore-daemonsets)
  • 15:53 bstorm: proceeding with upgrade to 1.17 on paws-k8s-control-1 T268669
  • 15:49 bstorm: draining paws-k8s-control-1 for upgrade T268669
  • 12:49 arturo: disable puppet in all k8s nodes to prepare for the upgrade (T268669)
  • 12:49 arturo: set hiera `profile::wmcs::kubeadm::component: 'thirdparty/kubeadm-k8s-1-17'` at project level (T268669)

2020-11-16

  • 22:13 bstorm: deploying new paws changes for multiinstance readiness

2020-11-10

  • 20:16 chicocvenancio: restart hub to apply move to sqlite. T267667
  • 16:41 arturo: set paws in sqlite mode because of T266587 (kubectl --namespace prod edit configmap hub-config)

2020-10-15

  • 19:12 andrewbogott: uncordoned paws-k8s-worker-1 and -2
  • 18:48 andrewbogott: draining paws-k8s-worker-2 for move to ceph
  • 18:36 andrewbogott: draining paws-k8s-worker-1 for move to ceph

2020-09-29

  • 10:59 arturo: last 2 commands should help puppet agent in the paws project; previously it had issues fetching acme-chief certs because of an API update
  • 10:58 arturo: aborrero@paws-acme-chief-01:~$ sudo systemctl restart uwsgi-acme-chief.service
  • 10:56 arturo: aborrero@paws-acme-chief-01:~$ sudo systemctl restart acme-chief.service

2020-08-14

  • 17:09 bstorm: backing up the old proxy config to NFS and deleting paws-proxy-02 T211096

2020-08-07

  • 22:30 bstorm: removing downtime for paws and front page monitor T211096
  • 18:01 bstorm: shutting down paws-proxy-02 T211096
  • 17:05 bstorm: running the final rsync to the new cluster's nfs T211096
  • 16:08 bstorm: changing paws.wmflabs.org to point at the new cluster ip 185.15.56.57 T211096
  • 16:02 bstorm: LAST MESSAGE WRONG: switching NEW cluster to toolsdb T211096
  • 16:02 bstorm: switching old cluster to toolsdb T211096
  • 15:58 bstorm: switching old cluster to sqlite T211096
  • 15:53 bstorm: downtiming alerts in case they need changes (seems likely) T211096

2020-07-30

  • 20:40 bstorm: upgrading the singleuser image to test shuffling around some of the pip installs
  • 16:38 bstorm: removing the *.paws.wmflabs.org SNI name because it won't be used and it might trigger a re-issue of certs T255249
  • 15:39 bstorm: upgrading acme-chief to 0.27-1

2020-07-29

  • 18:03 bstorm: powering on paws-k8s-haproxy-1 because that worked fine
  • 18:00 bstorm: powering off paws-k8s-haproxy-1 to test failover

2020-07-24

  • 17:25 bstorm: to force repulling of every image everywhere, uninstalling paws in the new cluster and reinstalling it T258812
  • 09:39 arturo: dropped the DNS wildcard record `*.paws.wmcloud.org IN A 185.15.56.57` and created concrete CNAME records for the FQDNs we actually use (T211096)

2020-07-23

  • 22:51 bstorm: deploying via the default 'latest' tag in the new cluster T211096
  • 22:48 bstorm: tagged the newbuild tags with "latest" to set sane defaults for all images in the helm chart T211096
  • 21:14 bstorm: pushing quay.io/wikimedia-paws-prod/nbserve:newbuild to main repo T211096
  • 21:11 bstorm: pushing quay.io/wikimedia-paws-prod/deploy-hook:newbuild to main repo T211096
  • 21:09 bstorm: pushing quay.io/wikimedia-paws-prod/singleuser:newbuild to the main repo T211096
  • 21:08 bstorm: pushing quay.io/wikimedia-paws-prod/paws-hub:newbuild to the main repo T211096
  • 21:06 bstorm: pushing dbproxy docker image for new cluster into main quay.io repo T211096

2020-07-22

  • 23:32 bstorm: setting the default NFS version to 4.2 while excepting the two stretch servers T257945

2020-07-21

  • 15:13 chicocvenancio: merge pr #50 to fix T258142

2020-07-06

  • 21:41 bstorm: deployed ingress to redirect paws.wmcloud.org to the wikitech doc page T195217

2020-06-30

  • 23:00 bstorm: added paws-public.wmflabs.org to the alt-names for acme-chief, which broke it until we hand off the zone to the paws project <sorry!> T195217 T255997

2020-06-26

  • 21:57 bstorm: applied the metrics manifests to kubernetes to enable metrics-server, cadvisor, etc. T256361

2020-06-25

  • 22:52 bstorm: created paws-k8s-worker-5/6/7 as x-large nodes to bring the cluster up to roughly the same capacity as the existing one using soft anti-affinity T211096 T253267
  • 22:43 bstorm: bumped quota up to 24 instances, 128 GB RAM and 56 cores T211096
  • 16:39 bstorm: deleted the deployhook from the in-progress new cluster for now just in case T211096
  • 15:44 bstorm: deployed a proof-of-concept paws-public setup in the new cluster T255997

2020-06-24

  • 23:18 bstorm: added A record for *.paws.wmcloud.org to public and hub to use T211096 T255997 T195217
  • 21:45 bstorm: doing an initial rsync of the paws userhomes to the new project T160113

2020-06-19

  • 10:01 arturo: enabled `paws.wmflabs.org` and `*.paws.wmflabs.org` as valid ingress domains (acme-chief TLS cert, haproxy, etc) (T195217)

2020-06-17

  • 21:51 bstorm_: upgraded chart in the new cluster to include resource limits T251298

2020-06-16

  • 15:48 arturo: change DNS record k8s.svc.paws.eqiad1.wikimedia.cloud to point to the haproxy VIP port address 172.16.1.171 (T195217)
  • 15:47 arturo: associate floating IP 185.15.56.57 with haproxy VIP port (T195217)
  • 15:43 arturo: allow traffic to haproxy VM ports from the VIP port: `sudo wmcs-openstack port set --allowed-address ip-address=172.16.1.171 1b40be58-7182-41aa-95ce-797f94f83d66` (T195217)
  • 15:43 arturo: allow traffic to haproxy VM ports from the VIP port: `sudo wmcs-openstack port set --allowed-address ip-address=172.16.1.171 9ccc43d9-1a8a-4287-afda-67e8bab27a9f` (T195217)
  • 15:37 arturo: `aborrero@cloudcontrol1004:~ 1 $ sudo wmcs-openstack --os-project-id=paws port create --network 7425e328-560c-4f00-8e99-706f3fb90bb4 paws-haproxy-vip` (T195217)
  • 15:23 arturo: live-hacking paws-puppetmaster-01 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/605944 for T195217

2020-06-15

  • 15:59 arturo: created DNS record `deploy-hook.paws.wmcloud.org IN CNAME paws.wmcloud.org` (T195217)
  • 12:28 arturo: manually created an Ingress object to test routing to the hub (T195217)
  • 12:20 arturo: created DNS record `paws.wmcloud.org IN A 185.15.56.57` (T195217)
  • 12:19 arturo: associate floating IP 185.15.56.57 with VM paws-k8s-haproxy-1 (T195217)
  • 12:18 arturo: release floating IP not in use: 185.15.56.42
  • 12:18 arturo: release floating IP not in use: 185.15.56.43
  • 11:45 arturo: reset wikitech user password for the service account `paws-dns-manager` to what is in labs/private.git/hieradata/common.yaml `profile::acme_chief::cloud::designate_sync_password` (T195217)

2020-06-12

  • 18:49 bstorm_: deployed a test of paws chart in the new cluster T211096
  • 13:23 arturo: assigned the DNS zone `paws.wmcloud.org` (T195217)
  • 13:13 arturo: live-hacking session in the puppetmaster ended
  • 13:05 arturo: live-hacking puppet tree in paws-puppetmaster-01 for T195217
  • 11:55 arturo: `aborrero@cloudcontrol1004:~ $ sudo wmcs-openstack role add --user paws-dns-manager --project paws observer` (T255252)
  • 11:55 arturo: `aborrero@cloudcontrol1004:~ $ sudo wmcs-openstack role add --user paws-dns-manager --project paws designateadmin` (T255252)
  • 11:51 arturo: created service account `paws-dns-manager` in wikitech (T255252)
  • 11:31 arturo: introduced acme-chief private data into labs/private in paws-puppetmaster-01 (T255252)
  • 11:02 arturo: created puppet prefix 'paws-acme-chief' (T255252)
  • 11:01 arturo: created VM paws-acme-chief-01 (T255252)

2020-06-11

2020-06-04

  • 14:16 arturo: added node taints to ingress nodes: `kubectl taint nodes paws-k8s-ingress-1 ingress=true:NoSchedule` (T195217)
  • 12:18 arturo: bootstrapped paws-k8s-ingress nodes, added them to the k8s cluster (T195217)
  • 12:04 arturo: created `paws-k8s-ingress` puppet prefix and added the `role::wmcs::paws::k8s::worker` role (T195217)
  • 12:02 arturo: created 2 medium VM instances: paws-k8s-ingress-1 and paws-k8s-ingress-2 with haproxy anti-affinity (T195217)

2020-05-26

  • 22:34 bstorm_: restored the deployment for maintain-kubeusers so anyone added to the paws.admin group will have admin on the cluster now that the bug is fixed T211096 T246059
  • 22:05 bstorm_: temporarily deleted the deployment for maintain-kubeusers pending patch to fix context creation for new admin accounts T211096 T246059
  • 22:04 bstorm_: created paws-focused PodSecurityPolicies and the prod namespace in the new cluster T211096
  • 22:03 bstorm_: created paws.admin group and kubernetes admin accounts on the new k8s cluster T211096 T246059
  • 18:29 bstorm_: bootstrapped the new control plane nodes T211096
  • 15:27 bstorm_: updated profile::wmcs::kubeadm::kubernetes_version to 1.16.10 for cluster init T211096

2020-05-21

  • 23:04 bstorm_: added profile::wmcs::kubeadm::k8s::encryption_key and profile::wmcs::kubeadm::k8s::node_token to labs/private T211096
  • 14:53 bstorm_: adding the hiera values to horizon for bootstrapping k8s T211096
  • 14:39 arturo: point record `k8s.svc.paws.eqiad1.wikimedia.cloud` to `172.16.1.186` (which is paws-k8s-control-1, for the initial bootstrap) (T211096)
  • 12:48 arturo: created record `k8s.svc.paws.eqiad1.wikimedia.cloud` pointing to `172.16.0.191` (which is paws-k8s-haproxy-1) (T211096)
  • 12:34 arturo: created and transferred DNS zone `svc.paws.eqiad1.wikimedia.cloud` (T211096)

2020-05-20

  • 22:35 bstorm_: created paws-k8s-worker-1/2/3/4 T211096
  • 22:12 bstorm_: created paws-k8s-haproxy-1/2 with antiaffinity group T211096
  • 21:36 bstorm_: created paws-k8s-control-1/2/3 with appropriate sec group and server group T211096
  • 18:59 bstorm_: created anti-affinity group "controlplane" T211096
  • 16:38 bstorm_: deleting the old shut-down VMs from the last effort to rebuild paws T211096
  • 16:36 bstorm_: cleaned up the old DNS entries for the external LBs that have been off for a year

2020-03-20

  • 14:03 jeh: upgrade paws-puppetmaster-01 to v5 T241719

2020-02-14

  • 21:31 andrewbogott: restarting paws-puppetmaster-01 so its clients can connect

2020-01-09

  • 18:06 bstorm_: rebooting tools-paws-master-01 T242353
  • 14:28 chicocvenancio: shutdown unused instances

2019-12-13

  • 00:27 bstorm_: rebooting the paws master since it is in a bad state after the openstack maintenance as well.

2019-11-01

  • 21:15 Krenair: Updated paws-apiserver.wmflabs.org A record list to remove 172.16.2.151 which is not allocated to any instance. The other two A records point to valid instances in the paws project.

2019-10-23

  • 09:03 arturo: paws-master-01/03 and a couple of other servers are down because hypervisor is rebooting

2019-10-14

  • 22:32 bd808: Removed project member "Afrodric". Looks like someone added them accidentally when trying to add aborrero as a project member
  • 22:31 bd808: Added Krenair as project member

2019-05-18

  • 11:13 chicocvenancio: point paws-proxy-02 to tools-paws-worker-1006 on paws-deploy-hook hostname (T218380)

2019-04-26

2019-04-16

  • 17:15 chicocvenancio: move paws-proxy-02 reload nginx
  • 17:07 chicocvenancio: move paws-proxy-02 to point to tools-paws-worker-1006 for upcoming master move

2019-03-27

  • 23:46 chicocvenancio: moving paws host in `paws-proxy-02` back to `tools-paws-master-01` T219460
  • 22:10 chicocvenancio: moving paws host in `paws-proxy-02` to `tools-paws-worker-1005` T219460

2019-03-25

  • 14:12 gtirloni: created `paws.wmflabs.org` subdomain under `paws` project (T211096)
  • 14:07 gtirloni: created `paws.wmflabs.org` subdomain under `paws` project T211096
  • 13:54 gtirloni: created `paws.wmflabs.org` subdomain under `paws` project (T211096)

2019-03-15

  • 02:25 gtirloni: activated TLS termination using Let's Encrypt on paws-proxy-02
  • 02:25 gtirloni: removed webproxies and created new A records pointing directly to paws-proxy-02

2019-02-21

  • 09:22 gtirloni: upgraded and rebooted paws-proxy-02

2019-02-20

  • 15:00 andrewbogott: deleting the long-shut-down paws-proxy-01

2019-02-15

  • 01:28 bd808: Re-enabled PAWS vhost on paws-proxy-02

2019-02-14

  • 22:25 gtirloni: downtimed PAWS in Icinga
  • 22:16 gtirloni: Activated maintenance page on paws-proxy-02 nginx config

2019-02-13

  • 08:32 arturo: switch paws-proxy-02 puppetmaster to labs-puppetmaster.wikimedia.org

2019-01-24

  • 19:20 andrewbogott: shutting down paws-proxy-01
  • 19:11 chicocvenancio: moved config, ready to receive traffic on paws-proxy-02 T214613
  • 18:34 chicocvenancio: firing up paws-proxy-02 for T214613

2019-01-23

2018-10-25

  • 23:58 gtirloni: Started tools-paws-worker-1010 (T208006)

2018-08-03

  • 20:19 andrewbogott: deleting paws-master-01 and paws-node-1002; unused

2018-07-03

  • 22:49 bstorm_: added stricter image space reclaiming arguments to kubelet

2018-06-20

  • 17:39 chicocvenancio: edited paws-proxy-01 to pass http_x_forwarded_proto as it receives T197248

2018-05-04

  • 02:48 chicocvenancio: killed 25 pods with more than one hour of inactivity through admin interface

2018-03-14

  • 21:49 chicocvenancio: updated k8s control plane, updating nodes to v1.9.4 for T189680

2018-02-23

  • 18:33 chicocvenancio: redirected tools.wmflabs.org/paws to paws.wmflabs.org and deleted old k8s ReplicationControllers (T188068)

2018-02-22

  • 22:11 chicocvenancio: (T175202) culler is running and killing pods as designed!
  • 21:13 chicocvenancio: jupyterhub updated to fix culler (T175202) culler already ran without 404
  • 17:43 chicocvenancio: manually ran culler inside hub container

2018-02-21

  • 17:03 chicocvenancio: deleted query-killer k8s deployment T187818

2018-02-16

  • 20:18 chicocvenancio: changed userhomes group for T185434 workaround

2018-02-15

  • 01:10 chicocvenancio: changed group of all userhome folders to tools.paws

2018-02-04

  • 12:21 chicocvenancio: changed group of all userhome folders to tools.paws

2017-12-19

  • 22:11 bd808: Killed tiller pod that was in crashloopbackoff

2017-09-28

  • 21:25 andrewbogott: service docker restart on paws-node-1002; disk is full and docker is holding open a lot of deleted files

2017-03-20

  • 21:25 andrewbogott: migrating paws-base-01 to labvirt1013

2016-05-10