Difference between revisions of "Nova Resource:Toolsbeta/SAL"

From Wikitech-static
imported>Stashbot
(dcaro: Creating new toolsbeta-test-k8s-control-4 node and adding it to the cluster (T267140))
imported>Stashbot
(majavah: upload toolforge-webservice 0.78 to stretch,buster,bullseye-toolsbeta repositories)
(91 intermediate revisions by the same user not shown)
=== 2021-10-20 ===
* 12:15 majavah: upload toolforge-webservice 0.78 to stretch,buster,bullseye-toolsbeta repositories
=== 2021-10-16 ===
* 07:47 majavah: deployed cert-manager and wave as a test for automating [[phab:T292238|T292238]]
=== 2021-10-14 ===
* 15:02 wm-bot: Joining grid node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the toolsbeta cluster - cookbook ran by dcaro@vulcanus
* 15:01 wm-bot: Joining grid node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the toolsbeta cluster - cookbook ran by dcaro@vulcanus
* 15:00 wm-bot: Joining grid node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the toolsbeta cluster - cookbook ran by dcaro@vulcanus
=== 2021-10-13 ===
* 11:18 wm-bot: Added a new grid webgrid generic node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the pool ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 10:19 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 10:19 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
=== 2021-10-12 ===
* 16:10 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 14:52 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 14:46 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 07:05 majavah: start gridengine-master.service on toolsbeta-sgegrid-master
=== 2021-10-11 ===
* 15:24 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 15:00 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 10:32 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
=== 2021-10-07 ===
* 14:21 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 14:06 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 13:31 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 12:55 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 12:50 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 12:50 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 08:04 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 07:58 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
=== 2021-10-06 ===
* 10:36 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 10:13 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 10:08 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 10:07 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
* 10:05 wm-bot: Adding a new grid webgrid generic node ([[phab:T292465|T292465]]) - cookbook ran by dcaro@vulcanus
=== 2021-10-04 ===
* 17:07 bstorm: reboot everything [[phab:T291406|T291406]]
* 17:06 bstorm: use cumin to edit fstab to remove old nfs mounts [[phab:T291406|T291406]]
* 16:41 bstorm: setting mount_nfs: true on toolsbeta-mail prefix (which is the correct setting)
* 14:45 dcaro: rebooting toolsbeta-sgewebgrid-generic-0901.toolsbeta.eqiad1.wikimedia.cloud to force a fsck of the dm-0 device on boot ([[phab:T290970|T290970]])
=== 2021-10-01 ===
* 12:34 arturo: rebooting toolsbeta-sgebastion-04 ([[phab:T292289|T292289]])
* 12:12 arturo: experimenting with newer mono runtime on toolsbeta-sgebastion-04 ([[phab:T292289|T292289]])
=== 2021-09-29 ===
* 22:13 bstorm: ran label fix script to use new label format
* 22:12 bstorm: toollabs-webservice 0.77 deployed
=== 2021-09-28 ===
* 10:32 majavah: removing all podpreset objects and disabling settings.k8s.io/v1alpha1 api
=== 2021-09-27 ===
* 16:13 majavah: testing volume-admission fix for containers with some volumes mounted
=== 2021-09-23 ===
* 17:14 majavah: testing new maintain-kubeusers release [[phab:T279106|T279106]]
=== 2021-09-22 ===
* 18:07 bstorm: launching toolsbeta-nfs-test-client-01 to run a "fair" test battery against [[phab:T291406|T291406]]
=== 2021-09-15 ===
* 08:04 majavah: tools-manifest 0.24, [[phab:T290325|T290325]]
=== 2021-09-14 ===
* 15:45 majavah: disable podpreset admission plugin in toolsbeta [[phab:T279106|T279106]]
* 11:42 arturo: deploying jobs-framework-emailer {{Gerrit|3045601}} ([[phab:T286135|T286135]])
* 10:44 arturo: deploying jobs-framework-emailer {{Gerrit|51032af}} ([[phab:T286135|T286135]])
* 10:39 arturo: deploying jobs-framework-api {{Gerrit|16fbf51}} ([[phab:T286135|T286135]])
=== 2021-09-13 ===
* 15:44 majavah: deploy volume-admission-controller in background; [[phab:T279106|T279106]]
=== 2021-09-09 ===
* 17:36 bstorm: deploying a base tekton triggers setup [[phab:T267374|T267374]]
* 16:50 majavah: enable unattended updates on toolsbeta [[phab:T290494|T290494]]
* 16:19 arturo: {{Gerrit|70017ec0ac}} root@toolsbeta-test-k8s-control-4:~# kubectl apply -f /etc/kubernetes/psp/base-pod-security-policies.yaml
* 00:26 bstorm: deleted toolsbeta-sgeexec-0902 since it had a badly screwed up /tmp
=== 2021-09-03 ===
* 22:34 bstorm: backfilled quotas for [[phab:T286784|T286784]]
=== 2021-08-30 ===
* 23:23 bstorm: deleting toolsbeta-workflow-test [[phab:T289709|T289709]]
=== 2021-08-21 ===
* 00:17 bstorm: rebooting the control plane nodes for kubernetes because it can't make things worse [[phab:T289390|T289390]]
=== 2021-08-20 ===
* 23:19 bstorm: tried renewing all the certs to get certs working again in kubernetes
=== 2021-08-12 ===
* 16:55 bstorm: deployed updated manifest for ingress-admission
* 15:02 majavah: deploying ingress-admission-controller using v1 api [[phab:T280436|T280436]]
=== 2021-07-30 ===
* 08:01 majavah: replace toolsbeta-sgeexec-1002 with -1004 for [[phab:T287666|T287666]]
=== 2021-07-29 ===
* 14:08 majavah: add mdipietro as projectadmin [[phab:T287287|T287287]]
* 13:06 majavah: rebuild toolsbeta-sgeexec-1001 as -1003 [[phab:T287666|T287666]]
=== 2021-07-23 ===
* 13:31 majavah: upgrading toolsbeta to kubernetes 1.19, [[phab:T280340|T280340]]
=== 2021-07-22 ===
* 15:32 arturo: re-deploying toolforge-jobs-framework-api
=== 2021-07-21 ===
* 11:58 arturo: deploying jobs-framework-api {{Gerrit|07346d715d17585db9c16dd152cc91ef0bea33c3}} ([[phab:T286108|T286108]])
* 10:51 arturo: enabling TTLAfterFinished feature gate on static pod manifests on /etc/kubernetes/manifests/kube-<nowiki>{</nowiki>apiserver,controller-manager<nowiki>}</nowiki>.yaml in all 3 control nodes ([[phab:T286108|T286108]])
* 10:47 arturo: enabling TTLAfterFinished feature gate on kubeadm live configmap ([[phab:T286108|T286108]])
* 10:09 arturo: livehacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/705848
=== 2021-07-20 ===
* 21:18 bstorm: applied `login_server: true` to toolsbeta-sgecron-01 [[phab:T287037|T287037]]
* 19:09 bstorm: upgraded version of maintain-kubeusers to the latest in master branch [[phab:T285011|T285011]]
* 08:36 majavah: resolve merge conflicts on labs/private
=== 2021-07-16 ===
* 19:53 bstorm: set matchPolicy to equivalent on ingress admission controller for toolsbeta [[phab:T280360|T280360]]
* 14:04 arturo: deployed jobs-framework-api {{Gerrit|42b7a88}} ([[phab:T286132|T286132]])
=== 2021-07-15 ===
* 15:39 arturo: deploy toolforge-jobs-framework-api git version {{Gerrit|d85d93ee1c5d4be6a526cf83e806b2679dde3875}}
=== 2021-07-14 ===
* 09:05 majavah: testing calico 3.18 upgrade - [[phab:T280342|T280342]]
=== 2021-07-12 ===
* 11:42 majavah: rebooting toolsbeta-sgeexec-1002, nfs issues
=== 2021-07-07 ===
* 09:48 majavah: set dummy values for openstack ldap user/pass hiera values for disable_tool manifests to work
=== 2021-07-01 ===
* 17:01 majavah: updating jobs-framework-api
* 10:00 arturo: refreshed jobs-api deployment
=== 2021-06-29 ===
* 09:28 wm-bot: Depooled and removed worker toolsbeta-test-k8s-worker-3.toolsbeta.eqiad1.wikimedia.cloud. ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 09:28 wm-bot: Drained node toolsbeta-test-k8s-worker-3. ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 09:27 wm-bot: Draining node toolsbeta-test-k8s-worker-3... ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 09:27 wm-bot: Depooling and removing worker , will pick the oldest. ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 09:27 wm-bot: Added a new k8s worker toolsbeta-test-k8s-worker-6.toolsbeta.eqiad1.wikimedia.cloud to the worker pool - cookbook ran by dcaro@vulcanus
* 09:18 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 09:13 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 09:13 wm-bot: Depooled and removed worker toolsbeta-test-k8s-worker-2.toolsbeta.eqiad1.wikimedia.cloud. ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 09:13 wm-bot: Drained node toolsbeta-test-k8s-worker-2. ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 09:12 wm-bot: Draining node toolsbeta-test-k8s-worker-2... ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 09:12 wm-bot: Depooling and removing worker , will pick the oldest. ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 09:09 wm-bot: Added a new k8s worker toolsbeta-test-k8s-worker-5.toolsbeta.eqiad1.wikimedia.cloud to the worker pool - cookbook ran by dcaro@vulcanus
* 09:00 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 08:59 wm-bot: Depooled and removed worker toolsbeta-test-k8s-worker-1.toolsbeta.eqiad1.wikimedia.cloud. ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 08:59 wm-bot: Drained node toolsbeta-test-k8s-worker-1. ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 08:58 wm-bot: Draining node toolsbeta-test-k8s-worker-1... ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 08:58 wm-bot: Depooling and removing worker , will pick the oldest. ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 08:57 wm-bot: Draining node toolsbeta-test-k8s-worker-1... ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
* 08:57 wm-bot: Depooling and removing worker , will pick the oldest. ([[phab:T267140|T267140]]) - cookbook ran by dcaro@vulcanus
=== 2021-06-28 ===
* 14:46 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:45 wm-bot: Depooled and removed worker toolsbeta-test-k8s-worker-4.toolsbeta.eqiad1.wikimedia.cloud. - cookbook ran by dcaro@vulcanus
* 14:45 wm-bot: Drained node toolsbeta-test-k8s-worker-4. - cookbook ran by dcaro@vulcanus
* 14:45 wm-bot: Draining node toolsbeta-test-k8s-worker-4... - cookbook ran by dcaro@vulcanus
* 14:45 wm-bot: Depooling and removing worker toolsbeta-test-k8s-worker-4.toolsbeta.eqiad1.wikimedia.cloud. - cookbook ran by dcaro@vulcanus
* 13:23 wm-bot: Draining node toolsbeta-test-k8s-worker-4... - cookbook ran by dcaro@vulcanus
* 13:22 wm-bot: Draining node toolsbeta-test-k8s-worker-4... - cookbook ran by dcaro@vulcanus
* 13:16 wm-bot: Draining node toolsbeta-test-k8s-worker-4.toolsbeta.eqiad1.wikimedia.cloud... - cookbook ran by dcaro@vulcanus
* 11:30 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 11:25 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 11:23 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 11:21 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 11:16 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 11:15 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 11:12 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 11:06 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 11:06 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 10:54 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 10:53 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 10:44 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 10:27 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 10:27 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 10:16 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 10:15 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 10:11 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 09:56 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 09:16 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 08:51 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 08:35 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
=== 2021-06-25 ===
* 15:27 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:26 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:21 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:19 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:17 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:15 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:08 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:07 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:03 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:02 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:00 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:59 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:56 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:52 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:50 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:45 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:19 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:18 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 13:57 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 13:56 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 13:55 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 13:50 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 12:50 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 12:26 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 12:26 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
=== 2021-06-24 ===
* 15:52 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:35 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:35 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:33 dcaro: created flavor g3.cores4.ram8.disk20.ephem40 for the k8s workers
* 15:10 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 15:09 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:59 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:35 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:31 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:28 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:24 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
* 14:13 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
=== 2021-06-22 ===
* 18:24 majavah: rolling out kubernetes patch release 1.18.20, cluster is currently at 1.18.18
=== 2021-06-17 ===
* 11:44 majavah: toolsbeta-puppetdb-02: stop puppetdb to free up its ram usage, start postgres process, start puppetdb up again
=== 2021-06-16 ===
* 15:53 majavah: add default security group rule allowing prometheus01.metricsinfra to connect to node-exporter port 9100
=== 2021-06-15 ===
* 16:10 majavah: set toolsbeta-bastion-05 as grid submit host
=== 2021-06-14 ===
* 21:29 bstorm: deploy package with the staged patch to switch away from os.execv to QA in toolsbeta as toollabs-webservice version 0.75 [[phab:T282975|T282975]]
* 10:19 arturo: deploying toolforge jobs-framework-api in kubernetes (just a test) ([[phab:T283238|T283238]])
=== 2021-06-12 ===
* 14:42 majavah: sync hiera key prometheus_nodes to match tools
=== 2021-06-11 ===
* 15:25 majavah: undeploy nginx-ingress-jobs from kubernetes
* 14:54 majavah: generate and add own root key to passwords::root::extra_keys
=== 2021-06-08 ===
* 15:11 majavah: updating k8s worker nodes to 1.18 [[phab:T280299|T280299]]
* 15:02 majavah: continuing to update k8s ingress nodes [[phab:T280299|T280299]]
* 14:57 majavah: continuing to update rest of k8s control nodes [[phab:T280299|T280299]]
* 14:42 majavah: remove toolsbeta-test-k8s-etcd-[15,16] from kubernetes, instances do not exist, likely leftovers from local storage work
* 14:08 majavah: update toolsbeta-test-k8s-control-4 to kubernetes 1.18 [[phab:T280299|T280299]]
=== 2021-06-03 ===
* 16:55 majavah: renew ingress-admission-controller certificates [[phab:T280301|T280301]]
* 16:49 majavah: renew registry-admission-webhook certificates [[phab:T280301|T280301]]
=== 2021-05-25 ===
* 17:14 andrewbogott: deleting old ingress controllers toolsbeta-test-k8s-ingress-1 and toolsbeta-test-k8s-ingress-2
* 17:13 andrewbogott: created two new ingress nodes, toolsbeta-test-k8s-ingress-4 and toolsbeta-test-k8s-ingress-5
* 15:09 dcaro: turning off VM toolsbeta-test-k8s-etcd-14 to be able to reboot cloudvirt1020
=== 2021-05-24 ===
* 19:40 andrewbogott: replacing existing etcd nodes with localdisk nodes
=== 2021-05-19 ===
* 11:35 Majavah: testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/692875/
* 06:51 Majavah: depool toolsbeta-test-k8s-ingress-1
=== 2021-05-15 ===
* 07:52 Majavah: set profile::wmcs::kubeadm::control::apiserver_cert_alternative_names hiera key and adjust config map [[phab:T262562|T262562]]
=== 2021-05-14 ===
* 11:22 arturo: allowed VIP address from the new port 172.16.3.26 into the ports of toolsbeta-redis-[1-3] ([[phab:T153810|T153810]])
* 11:16 arturo: aborrero@cloudcontrol1005:~ $ sudo wmcs-openstack --os-project-id=toolsbeta port create --network lan-flat-cloudinstances2b toolsbeta-redis-vip ([[phab:T153810|T153810]])
=== 2021-05-13 ===
* 08:07 Majavah: creating toolsbeta-redis-[1-3] as g3.cores1.ram2.disk20 to experiment with redis-sentinel / [[phab:T153810|T153810]]
=== 2021-05-10 ===
* 19:42 bstorm: setting profile::wmcs::kubeadm::docker_vol: false on ingress nodes
* 17:43 Majavah: testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/688361 in toolsbeta [[phab:T264221|T264221]]
* 11:50 Majavah: testing ingress-nginx update https://gerrit.wikimedia.org/r/c/operations/puppet/+/685715 on toolsbeta [[phab:T264221|T264221]]
=== 2021-05-08 ===
* 10:42 Majavah: create new ingress node toolsbeta-k8s-ingress-3 [[phab:T264221|T264221]]
=== 2021-05-07 ===
* 17:00 bstorm: deleted "toolsbeta-test-k8s-haproxy-2", "toolsbeta-test-k8s-haproxy-1" when the dns caches finally dropped [[phab:T282227|T282227]]
* 16:30 bstorm: recreated k8s.toolsbeta.eqiad1.wikimedia.cloud. as a CNAME to k8s.svc.toolsbeta.eqiad1.wikimedia.cloud. [[phab:T282227|T282227]]
* 16:16 Majavah: create record k8s.svc.toolsbeta.eqiad1.wikimedia.cloud. pointing to haproxy vip [[phab:T282227|T282227]]
* 14:20 Majavah: cherry pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/686607/
* 09:44 arturo: `sudo wmcs-openstack --os-project-id=toolsbeta port create --network lan-flat-cloudinstances2b toolsbeta-k8s-haproxy-keepalived-vip`
* 08:19 Majavah: rebuild toolsbeta-test-k8s-haproxy-[12] without nfs
=== 2021-05-05 ===
* 16:25 Majavah: add self to sudo policy `roots`
* 16:07 arturo: grant `taavi` projectadmin (Majavah)
=== 2021-05-04 ===
* 10:47 arturo: rebase & resolve merge conflicts in labs/private.git
=== 2021-05-03 ===
* 13:23 arturo: livehacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/684032 ([[phab:T278109|T278109]])
=== 2021-04-29 ===
* 18:10 bstorm: added and removed an etcd node
=== 2021-04-23 ===
* 17:24 bstorm: rebooting toolsbeta-test-k8s-control-6 because it was "notready" for some reason
=== 2021-04-20 ===
* 19:01 bstorm: updated the maintain-kubeusers:beta image to https://gerrit.wikimedia.org/r/c/labs/tools/maintain-kubeusers/+/680244
=== 2021-04-13 ===
* 16:41 arturo: create VM toolsbeta-sgeexec-1002 ([[phab:T277653|T277653]])
* 15:44 arturo: delete VMs toolsbeta-sgeexec-0903 and toolsbeta-buster-sgeexec-01 (no longer useful)
* 15:36 arturo: created VM toolsbeta-sgeexec-0903 (buster) ([[phab:T277653|T277653]])
* 15:31 arturo: live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/678043/ ([[phab:T277653|T277653]])
=== 2021-04-08 ===
* 18:27 bstorm: cleaned up the deprecated entries in /data/project/.system_sge/gridengine/etc/submithosts for toolsbeta-sgegrid-master and toolsbeta-sgegrid-shadow using the old fqdns [[phab:T277653|T277653]]
=== 2021-04-06 ===
* 13:11 dcaro: Removing etcd member toolsbeta-test-k8s-etcd-7.tools.eqiad1.wikimedia.cloud to get an odd number ([[phab:T267082|T267082]])
=== 2021-04-01 ===
* 15:17 dcaro: etcd cluster shrunk to 3 members (using wmcs.toolforge.remove_etcd_node cookbook)
* 14:54 dcaro: shrinking etcd cluster to 3 members, cleaning up automation runs
=== 2021-03-31 ===
* 18:22 bstorm: redeploy ingress-admission controller with `kubectl apply -k deploys/toolsbeta` from the repo [[phab:T275478|T275478]]
=== 2021-03-24 ===
* 12:17 arturo: attach the `toolsbeta-docker-registry-data` volume to the `toolsbeta-docker-registry-02` VM
* 11:41 arturo: created VM toolsbeta-docker-registry-02 as Debian buster ([[phab:T278303|T278303]])
* 11:34 arturo: attached cinder volume `toolsbeta-docker-registry-data` as /dev/vdb on toolsbeta-docker-registry-01
* 11:23 arturo: created 2G cinder volume `toolsbeta-docker-registry-data` ([[phab:T278303|T278303]])
=== 2021-03-23 ===
* 11:22 arturo: drop and build again the VM toolsbeta-sgregrid-master ([[phab:T277653|T277653]])
* 11:07 arturo: drop and build again the VM toolsbeta-sgregrid-shadow ([[phab:T277653|T277653]])
=== 2021-03-18 ===
* 18:55 bstorm: set profile::toolforge::infrastructure across the entire project with login_server set on the bastion prefix
* 18:50 arturo: deleting VMs toolsbeta-paws-worker-1001 toolsbeta-paws-worker-1002 toolsbeta-paws-master-01 (testing for PAWS should happen in the paws project)
* 18:49 arturo: deleting VM toolsbeta-workflow-test, no longer useful
* 18:44 arturo: replacing toolsbeta-sgegrid-master with a Debian Buster VM ([[phab:T277653|T277653]])
* 16:24 arturo: live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456
* 12:53 arturo: create anti-affinity server group toolsbeta-sgegrid-master-shadow
* 12:51 arturo: rebuild toolsbeta-sgegrid-shadow instance as debian buster ([[phab:T277653|T277653]])
* 12:50 arturo: added puppet prefix `toolsbeta-sgegrid-shadow`, migrate puppet config from VM to here
* 12:48 arturo: destroy VM toolsbeta-buster-gridmaster (no longer useful) [[phab:T277653|T277653]]
* 12:47 arturo: delete puppet prefix `toolsbeta-buster-grirdmaster` (no longer useful) [[phab:T277653|T277653]]
=== 2021-03-17 ===
* 12:39 arturo: created VM toolsbeta-buster-gridmaster ([[phab:T277653|T277653]])
* 12:38 arturo: created puppet prefix 'toolsbeta-buster-gridmaster' ([[phab:T277653|T277653]])
* 12:00 arturo: create VM toolsbeta-buster-sgeexec-01 ([[phab:T277653|T277653]])
* 11:56 arturo: created puppet prefix 'toolsbeta-buster-sgeexec' ([[phab:T277653|T277653]])
* 10:34 arturo: re-create toolsbeta-bastion-05 ([[phab:T275865|T275865]])
=== 2021-03-16 ===
* 12:32 arturo: added packages jobutils / misctools v1.41 to <nowiki>{</nowiki>stretch,buster<nowiki>}</nowiki>-toolsbeta aptly repository in tools-sge-services-03
=== 2021-03-11 ===
* 12:33 arturo: livehacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/667144 for [[phab:T275865|T275865]]
=== 2021-03-10 ===
* 16:48 arturo: briefly stopping VM toolsbeta-test-k8s-etcd-8 to migrate hypervisor
=== 2021-02-26 ===
* 20:39 andrewbogott: rebooting all hosts
* 15:35 dcaro: removed toolsbeta-test-k8s-etcd-9 with depool from kubeadmin/etcd ([[phab:T274497|T274497]])
* 11:46 arturo: `openstack server create --os-project-id toolsbeta --image debian-10.0-buster --flavor g2.cores2.ram4.disk40 --network lan-flat-cloudinstances2b --property description='buster bastion test' toolsbeta-bastion-05` ([[phab:T275865|T275865]])
* 11:39 arturo: created puppet prefix 'toolsbeta-bastion' to hold new configuration for buster-based bastions ([[phab:T275865|T275865]])
* 09:09 dcaro: Playing around with cookbooks by adding/removing etcd nodes, etcd might misbehave from time to time ([[phab:T274497|T274497]])
=== 2021-02-19 ===
* 12:42 arturo: deploying new version of the ingress admission controller
* 11:46 arturo: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/662941 ([[phab:T274139|T274139]]) which should only affect toolsbeta
* 10:27 arturo: create DNS record `jobs.svc.toolsbeta.eqiad1.wikimedia.cloud` with CNAME to `k8s.toolsbeta.eqiad1.wikimedia.cloud` ([[phab:T274139|T274139]])
* 10:25 arturo: create DNS zone `svc.toolsbeta.eqiad1.wikimedia.cloud` ([[phab:T274139|T274139]])
=== 2021-02-10 ===
* 12:34 arturo: live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/662941 ([[phab:T274139|T274139]])
* 12:23 arturo: add `webserver` security group to toolsbeta-proxy-3 and -4
* 12:20 arturo: fix A record for `toolsbeta.wmflabs.org`, point it to 172.16.1.150 (toolsbeta-proxy-3), it was previously pointing to an old IP address
=== 2021-02-08 ===
* 11:48 arturo: trying to introduce TLS support in the front proxy [[phab:T274123|T274123]]
=== 2021-02-05 ===
* 00:36 bstorm: updated jobutils and miscutils to 1.40 in aptly for toolsbeta testing
=== 2021-01-21 ===
* 15:29 bstorm: pushed the maintain-kubeusers:beta tag with the new code to the docker repo [[phab:T271847|T271847]]
=== 2021-01-13 ===
* 14:10 dcaro: dcaro doing puppet tests, puppet runs might break
* 10:07 arturo: allocate floating IP 185.15.56.84, and use it for docker-registry.toolsbeta.wmflabs.org (instance toolsbeta-docker-registry-01) ([[phab:T271867|T271867]])
* 10:05 arturo: release and delete floating IP 185.15.56.242 (docker-registry.toolsbeta.wmflabs.org) ([[phab:T271867|T271867]])
=== 2020-12-22 ===
* 10:48 arturo: rebase & resolve ugly git merge conflict in labs/private.git
=== 2020-12-18 ===
* 10:52 arturo: live-hacking local puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/650470 ([[phab:T267966|T267966]])
=== 2020-12-14 ===
* 19:27 bstorm: create temporary instance toolsbeta-test-io-unthrottled [[phab:T267966|T267966]]
* 19:25 bstorm: created temporary instance toolsbeta-io-test-local [[phab:T267966|T267966]]
=== 2020-12-11 ===
* 23:31 bstorm: increasing the output throttle for toolsbeta-test-k8s-haproxy-* nodes in order to figure out what's up with the timeouts
=== 2020-12-10 ===
* 08:58 dcaro: starting a new etcd instance completely from ansible playbook (etcd-8) ([[phab:T267412|T267412]])
=== 2020-12-09 ===
* 15:30 dcaro: Playing around adding a new etcd node (k8s-etcd-7) ([[phab:T267412|T267412]])
=== 2020-12-04 ===
* 11:17 dcaro: Created a new 'standardized' security group for k8s from ansible toolsbeta-k8s-full-connectivity ([[phab:T267412|T267412]])
* 10:12 dcaro: Trying to create a whole new etcd member from ansible ([[phab:T267412|T267412]])
=== 2020-11-23 ===
* 14:17 dcaro: All control nodes re-imaged ([[phab:T267140|T267140]])
* 14:08 dcaro: Taking control-3 node out as control-6 is up and running ([[phab:T267140|T267140]])
* 11:12 dcaro: Launching control-6, to replace control-3 ([[phab:T267140|T267140]])
* 10:45 dcaro: Taking out control-2 node, replaced by control-5 (I saw one 503 reply on the proxy when creating control-5, fyi) ([[phab:T267140|T267140]])
* 10:32 dcaro: Creating new control-5 node (will replace control-2) ([[phab:T267140|T267140]])
* 09:58 dcaro: Remove control-1 node from the pool (was replaced by control-4) ([[phab:T267140|T267140]])
* 09:57 dcaro: Remove control-1 node from the pool (was replaced by control-4) ([[phab:T267195|T267195]])
=== 2020-11-18 ===
* 11:46 dcaro_: Modifying the security groups to mirror tools ([[phab:T267140|T267140]])
* 10:50 dcaro_: Adding new control-4 node to the control cluster ([[phab:T267140|T267140]])
=== 2020-11-17 ===
* 15:32 dcaro: Creating new toolsbeta-test-k8s-control-4 node and adding it to the cluster ([[phab:T267140|T267140]])

Revision as of 12:15, 20 October 2021

2021-10-20

  • 12:15 majavah: upload toolforge-webservice 0.78 to stretch,buster,bullseye-toolsbeta repositories

2021-10-16

  • 07:47 majavah: deployed cert-manager and wave as a test for automating T292238

2021-10-14

  • 15:02 wm-bot: Joining grid node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the toolsbeta cluster - cookbook ran by dcaro@vulcanus
  • 15:01 wm-bot: Joining grid node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the toolsbeta cluster - cookbook ran by dcaro@vulcanus
  • 15:00 wm-bot: Joining grid node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the toolsbeta cluster - cookbook ran by dcaro@vulcanus

2021-10-13

  • 11:18 wm-bot: Added a new grid webgrid generic node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the pool (T292465) - cookbook ran by dcaro@vulcanus
  • 10:19 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 10:19 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

2021-10-12

  • 16:10 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 14:52 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 14:46 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 07:05 majavah: start gridengine-master.service on toolsbeta-sgegrid-master

2021-10-11

  • 15:24 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 15:00 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 10:32 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

2021-10-07

  • 14:21 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 14:06 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 13:31 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 12:55 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 12:50 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 12:50 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 08:04 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 07:58 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

2021-10-06

  • 10:36 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 10:13 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 10:08 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 10:07 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
  • 10:05 wm-bot: Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

2021-10-04

  • 17:07 bstorm: reboot everything T291406
  • 17:06 bstorm: use cumin to edit fstab to remove old nfs mounts T291406
  • 16:41 bstorm: setting mount_nfs: true on toolsbeta-mail prefix (which is the correct setting)
  • 14:45 dcaro: rebooting toolsbeta-sgewebgrid-generic-0901.toolsbeta.eqiad1.wikimedia.cloud to force a fsck of the dm-0 device on boot (T290970)

2021-10-01

  • 12:34 arturo: rebooting toolsbeta-sgebastion-04 (T292289)
  • 12:12 arturo: experimenting with newer mono runtime on toolsbeta-sgebastion-04 (T292289)

2021-09-29

  • 22:13 bstorm: ran label fix script to use new label format
  • 22:12 bstorm: toollabs-webservice 0.77 deployed

2021-09-28

  • 10:32 majavah: removing all podpreset objects and disabling settings.k8s.io/v1alpha1 api

2021-09-27

  • 16:13 majavah: testing volume-admission fix for containers with some volumes mounted

2021-09-23

  • 17:14 majavah: testing new maintain-kubeusers release T279106

2021-09-22

  • 18:07 bstorm: launching toolsbeta-nfs-test-client-01 to run a "fair" test battery against T291406

2021-09-15

  • 08:04 majavah: tools-manifest 0.24, T290325

2021-09-14

  • 15:45 majavah: disable podpreset admission plugin in toolsbeta T279106
  • 11:42 arturo: deploying jobs-framework-emailer 3045601 (T286135)
  • 10:44 arturo: deploying jobs-framework-emailer 51032af (T286135)
  • 10:39 arturo: deploying jobs-framework-api 16fbf51 (T286135)

2021-09-13

  • 15:44 majavah: deploy volume-admission-controller in background; T279106

2021-09-09

  • 17:36 bstorm: deploying a base tekton triggers setup T267374
  • 16:50 majavah: enable unattended updates on toolsbeta T290494
  • 16:19 arturo: 70017ec0ac root@toolsbeta-test-k8s-control-4:~# kubectl apply -f /etc/kubernetes/psp/base-pod-security-policies.yaml
  • 00:26 bstorm: deleted toolsbeta-sgeexec-0902 since it had a badly screwed up /tmp

2021-09-03

  • 22:34 bstorm: backfilled quotas for T286784

2021-08-30

  • 23:23 bstorm: deleting toolsbeta-workflow-test T289709

2021-08-21

  • 00:17 bstorm: rebooting the control plane nodes for kubernetes because it can't make things worse T289390

2021-08-20

  • 23:19 bstorm: tried renewing all the certs to get certs working again in kubernetes

2021-08-12

  • 16:55 bstorm: deployed updated manifest for ingress-admission
  • 15:02 majavah: deploying ingress-admission-controller using v1 api T280436

2021-07-30

  • 08:01 majavah: replace toolsbeta-sgeexec-1002 with -1004 for T287666

2021-07-29

  • 14:08 majavah: add mdipietro as projectadmin T287287
  • 13:06 majavah: rebuild toolsbeta-sgeexec-1001 as -1003 T287666

2021-07-23

  • 13:31 majavah: upgrading toolsbeta to kubernetes 1.19, T280340

2021-07-22

  • 15:32 arturo: re-deploying toolforge-jobs-framework-api

2021-07-21

2021-07-20

  • 21:18 bstorm: applied `login_server: true` to toolsbeta-sgecron-01 T287037
  • 19:09 bstorm: upgraded version of maintain-kubeusers to the latest in master branch T285011
  • 08:36 majavah: resolve merge conflicts on labs/private

2021-07-16

  • 19:53 bstorm: set matchPolicy to equivalent on ingress admission controller for toolsbeta T280360
  • 14:04 arturo: deployed jobs-framework-api 42b7a88 (T286132)

2021-07-15

  • 15:39 arturo: deploy toolforge-jobs-framework-api git version d85d93e

2021-07-14

  • 09:05 majavah: testing calico 3.18 upgrade - T280342

2021-07-12

  • 11:42 majavah: rebooting toolsbeta-sgeexec-1002, nfs issues

2021-07-07

  • 09:48 majavah: set dummy values for openstack ldap user/pass hiera values for disable_tool manifests to work

2021-07-01

  • 17:01 majavah: updating jobs-framework-api
  • 10:00 arturo: refreshed jobs-api deployment

2021-06-29

  • 09:28 wm-bot: Depooled and removed worker toolsbeta-test-k8s-worker-3.toolsbeta.eqiad1.wikimedia.cloud. (T267140) - cookbook ran by dcaro@vulcanus
  • 09:28 wm-bot: Drained node toolsbeta-test-k8s-worker-3. (T267140) - cookbook ran by dcaro@vulcanus
  • 09:27 wm-bot: Draining node toolsbeta-test-k8s-worker-3... (T267140) - cookbook ran by dcaro@vulcanus
  • 09:27 wm-bot: Depooling and removing worker, will pick the oldest. (T267140) - cookbook ran by dcaro@vulcanus
  • 09:27 wm-bot: Added a new k8s worker toolsbeta-test-k8s-worker-6.toolsbeta.eqiad1.wikimedia.cloud to the worker pool - cookbook ran by dcaro@vulcanus
  • 09:18 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 09:13 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 09:13 wm-bot: Depooled and removed worker toolsbeta-test-k8s-worker-2.toolsbeta.eqiad1.wikimedia.cloud. (T267140) - cookbook ran by dcaro@vulcanus
  • 09:13 wm-bot: Drained node toolsbeta-test-k8s-worker-2. (T267140) - cookbook ran by dcaro@vulcanus
  • 09:12 wm-bot: Draining node toolsbeta-test-k8s-worker-2... (T267140) - cookbook ran by dcaro@vulcanus
  • 09:12 wm-bot: Depooling and removing worker, will pick the oldest. (T267140) - cookbook ran by dcaro@vulcanus
  • 09:09 wm-bot: Added a new k8s worker toolsbeta-test-k8s-worker-5.toolsbeta.eqiad1.wikimedia.cloud to the worker pool - cookbook ran by dcaro@vulcanus
  • 09:00 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 08:59 wm-bot: Depooled and removed worker toolsbeta-test-k8s-worker-1.toolsbeta.eqiad1.wikimedia.cloud. (T267140) - cookbook ran by dcaro@vulcanus
  • 08:59 wm-bot: Drained node toolsbeta-test-k8s-worker-1. (T267140) - cookbook ran by dcaro@vulcanus
  • 08:58 wm-bot: Draining node toolsbeta-test-k8s-worker-1... (T267140) - cookbook ran by dcaro@vulcanus
  • 08:58 wm-bot: Depooling and removing worker, will pick the oldest. (T267140) - cookbook ran by dcaro@vulcanus
  • 08:57 wm-bot: Draining node toolsbeta-test-k8s-worker-1... (T267140) - cookbook ran by dcaro@vulcanus
  • 08:57 wm-bot: Depooling and removing worker, will pick the oldest. (T267140) - cookbook ran by dcaro@vulcanus

2021-06-28

  • 14:46 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:45 wm-bot: Depooled and removed worker toolsbeta-test-k8s-worker-4.toolsbeta.eqiad1.wikimedia.cloud. - cookbook ran by dcaro@vulcanus
  • 14:45 wm-bot: Drained node toolsbeta-test-k8s-worker-4. - cookbook ran by dcaro@vulcanus
  • 14:45 wm-bot: Draining node toolsbeta-test-k8s-worker-4... - cookbook ran by dcaro@vulcanus
  • 14:45 wm-bot: Depooling and removing worker toolsbeta-test-k8s-worker-4.toolsbeta.eqiad1.wikimedia.cloud. - cookbook ran by dcaro@vulcanus
  • 13:23 wm-bot: Draining node toolsbeta-test-k8s-worker-4... - cookbook ran by dcaro@vulcanus
  • 13:22 wm-bot: Draining node toolsbeta-test-k8s-worker-4... - cookbook ran by dcaro@vulcanus
  • 13:16 wm-bot: Draining node toolsbeta-test-k8s-worker-4.toolsbeta.eqiad1.wikimedia.cloud... - cookbook ran by dcaro@vulcanus
  • 11:30 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 11:25 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 11:23 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 11:21 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 11:16 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 11:15 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 11:12 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 11:06 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 11:06 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 10:54 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 10:53 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 10:44 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 10:27 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 10:27 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 10:16 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 10:15 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 10:11 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 09:56 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 09:16 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 08:51 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 08:35 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus

2021-06-25

  • 15:27 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:26 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:21 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:19 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:17 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:15 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:08 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:07 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:03 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:02 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:00 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:59 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:56 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:52 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:50 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:45 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:19 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:18 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 13:57 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 13:56 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 13:55 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 13:50 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 12:50 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 12:26 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 12:26 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus

2021-06-24

  • 15:52 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:35 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:35 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:33 dcaro: created flavor g3.cores4.ram8.disk20.ephem40 for the k8s workers
  • 15:10 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 15:09 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:59 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:35 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:31 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:28 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:24 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus
  • 14:13 wm-bot: Adding a new k8s worker node - cookbook ran by dcaro@vulcanus

2021-06-22

  • 18:24 majavah: rolling out kubernetes patch release 1.18.20, cluster is currently at 1.18.18

2021-06-17

  • 11:44 majavah: toolsbeta-puppetdb-02: stop puppetdb to free up its ram usage, start postgres process, start puppetdb up again

2021-06-16

  • 15:53 majavah: add default security group rule allowing prometheus01.metricsinfra to connect to node-exporter port 9100

2021-06-15

  • 16:10 majavah: set toolsbeta-bastion-05 as grid submit host

2021-06-14

  • 21:29 bstorm: deploy package with the staged patch to switch away from os.execv to QA in toolsbeta as toollabs-webservice version 0.75 T282975
  • 10:19 arturo: deploying toolforge jobs-framework-api in kubernetes (just a test) (T283238)

2021-06-12

  • 14:42 majavah: sync hiera key prometheus_nodes to match tools

2021-06-11

  • 15:25 majavah: undeploy nginx-ingress-jobs from kubernetes
  • 14:54 majavah: generate and add own root key to passwords::root::extra_keys

2021-06-08

  • 15:11 majavah: updating k8s worker nodes to 1.18 T280299
  • 15:02 majavah: continuing to update k8s ingress nodes T280299
  • 14:57 majavah: continuing to update rest of k8s control nodes T280299
  • 14:42 majavah: remove toolsbeta-test-k8s-etcd-[15,16] from kubernetes, instances do not exist, likely leftovers from local storage work
  • 14:08 majavah: update toolsbeta-test-k8s-control-4 to kubernetes 1.18 T280299

2021-06-03

  • 16:55 majavah: renew ingress-admission-controller certificates T280301
  • 16:49 majavah: renew registry-admission-webhook certificates T280301

2021-05-25

  • 17:14 andrewbogott: deleting old ingress controllers toolsbeta-test-k8s-ingress-1 and toolsbeta-test-k8s-ingress-2
  • 17:13 andrewbogott: created two new ingress nodes, toolsbeta-test-k8s-ingress-4 and toolsbeta-test-k8s-ingress-5
  • 15:09 dcaro: turning off VM toolsbeta-test-k8s-etcd-14 to be able to reboot cloudvirt1020

2021-05-24

  • 19:40 andrewbogott: replacing existing etcd nodes with localdisk nodes

2021-05-19

2021-05-15

  • 07:52 Majavah: set profile::wmcs::kubeadm::control::apiserver_cert_alternative_names hiera key and adjust config map T262562

2021-05-14

  • 11:22 arturo: allowed VIP address from the new port 172.16.3.26 into the ports of toolsbeta-redis-[1-3] (T153810)
  • 11:16 arturo: aborrero@cloudcontrol1005:~ $ sudo wmcs-openstack --os-project-id=toolsbeta port create --network lan-flat-cloudinstances2b toolsbeta-redis-vip (T153810)

2021-05-13

  • 08:07 Majavah: creating toolsbeta-redis-[1-3] as g3.cores1.ram2.disk20 to experiment with redis-sentinel / T153810

2021-05-10

2021-05-08

  • 10:42 Majavah: create new ingress node toolsbeta-k8s-ingress-3 T264221

2021-05-07

  • 17:00 bstorm: deleted "toolsbeta-test-k8s-haproxy-2", "toolsbeta-test-k8s-haproxy-1" when the dns caches finally dropped T282227
  • 16:30 bstorm: recreated k8s.toolsbeta.eqiad1.wikimedia.cloud. as a CNAME to k8s.svc.toolsbeta.eqiad1.wikimedia.cloud. T282227
  • 16:16 Majavah: create record k8s.svc.toolsbeta.eqiad1.wikimedia.cloud. pointing to haproxy vip T282227
  • 14:20 Majavah: cherry pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/686607/
  • 09:44 arturo: `sudo wmcs-openstack --os-project-id=toolsbeta port create --network lan-flat-cloudinstances2b toolsbeta-k8s-haproxy-keepalived-vip`
  • 08:19 Majavah: rebuild toolsbeta-test-k8s-haproxy-[12] without nfs

2021-05-05

  • 16:25 Majavah: add self to sudo policy `roots`
  • 16:07 arturo: grant `taavi` projectadmin (Majavah)

2021-05-04

  • 10:47 arturo: rebase & resolve merge conflicts in labs/private.git

2021-05-03

2021-04-29

  • 18:10 bstorm: added and removed an etcd node

2021-04-23

  • 17:24 bstorm: rebooting toolsbeta-test-k8s-control-6 because it was "notready" for some reason

2021-04-20

2021-04-13

2021-04-08

  • 18:27 bstorm: cleaned up the deprecated entries in /data/project/.system_sge/gridengine/etc/submithosts for toolsbeta-sgegrid-master and toolsbeta-sgegrid-shadow using the old fqdns T277653

2021-04-06

  • 13:11 dcaro: Removing etcd member toolsbeta-test-k8s-etcd-7.tools.eqiad1.wikimedia.cloud to get an odd number (T267082)

2021-04-01

  • 15:17 dcaro: etcd cluster shrunk to 3 members (using wmcs.toolforge.remove_etcd_node cookbook)
  • 14:54 dcaro: shrinking etcd cluster to 3 members, cleaning up automation runs

2021-03-31

  • 18:22 bstorm: redeploy ingress-admission controller with `kubectl apply -k deploys/toolsbeta` from the repo T275478

2021-03-24

  • 12:17 arturo: attach the `toolsbeta-docker-registry-data` volume to the `toolsbeta-docker-registry-02` VM
  • 11:41 arturo: created VM toolsbeta-docker-registry-02 as Debian buster (T278303)
  • 11:34 arturo: attached cinder volume `toolsbeta-docker-registry-data` as /dev/vdb on toolsbeta-docker-registry-01
  • 11:23 arturo: created 2G cinder volume `toolsbeta-docker-registry-data` (T278303)

2021-03-23

  • 11:22 arturo: drop and build again the VM toolsbeta-sgegrid-master (T277653)
  • 11:07 arturo: drop and build again the VM toolsbeta-sgegrid-shadow (T277653)

2021-03-18

  • 18:55 bstorm: set profile::toolforge::infrastructure across the entire project with login_server set on the bastion prefix
  • 18:50 arturo: deleting VMs toolsbeta-paws-worker-1001 toolsbeta-paws-worker-1002 toolsbeta-paws-master-01 (testing for PAWS should happen in the paws project)
  • 18:49 arturo: deleting VM toolsbeta-workflow-test, no longer useful
  • 18:44 arturo: replacing toolsbeta-sgegrid-master with a Debian Buster VM (T277653)
  • 16:24 arturo: live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456
  • 12:53 arturo: create anti-affinity server group toolsbeta-sgegrid-master-shadow
  • 12:51 arturo: rebuild toolsbeta-sgegrid-shadow instance as debian buster (T277653)
  • 12:50 arturo: added puppet prefix `toolsbeta-sgegrid-shadow`, migrate puppet config from VM to here
  • 12:48 arturo: destroy VM toolsbeta-buster-gridmaster (no longer useful) T277653
  • 12:47 arturo: delete puppet prefix `toolsbeta-buster-gridmaster` (no longer useful) T277653

2021-03-17

  • 12:39 arturo: created VM toolsbeta-buster-gridmaster (T277653)
  • 12:38 arturo: created puppet prefix 'toolsbeta-buster-gridmaster' (T277653)
  • 12:00 arturo: create VM toolsbeta-buster-sgeexec-01 (T277653)
  • 11:56 arturo: created puppet prefix 'toolsbeta-buster-sgeexec' (T277653)
  • 10:34 arturo: re-create toolsbeta-bastion-05 (T275865)

2021-03-16

  • 12:32 arturo: added packages jobutils / misctools v1.41 to {stretch,buster}-toolsbeta aptly repository in tools-sge-services-03

2021-03-11

2021-03-10

  • 16:48 arturo: briefly stopping VM toolsbeta-test-k8s-etcd-8 to migrate hypervisor

2021-02-26

  • 20:39 andrewbogott: rebooting all hosts
  • 15:35 dcaro: removed toolsbeta-test-k8s-etcd-9 with depool from kubeadmin/etcd (T274497)
  • 11:46 arturo: `openstack server create --os-project-id toolsbeta --image debian-10.0-buster --flavor g2.cores2.ram4.disk40 --network lan-flat-cloudinstances2b --property description='buster bastion test' toolsbeta-bastion-05` (T275865)
  • 11:39 arturo: created puppet prefix 'toolsbeta-bastion' to hold new configuration for buster-based bastions (T275865)
  • 09:09 dcaro: Playing around with cookbooks by adding/removing etcd nodes, etcd might misbehave from time to time (T274497)

2021-02-19

  • 12:42 arturo: deploying new version of the ingress admission controller
  • 11:46 arturo: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/662941 (T274139) which should only affect toolsbeta
  • 10:27 arturo: create DNS record `jobs.svc.toolsbeta.eqiad1.wikimedia.cloud` with CNAME to `k8s.toolsbeta.eqiad1.wikimedia.cloud` (T274139)
  • 10:25 arturo: create DNS zone `svc.toolsbeta.eqiad1.wikimedia.cloud` (T274139)

2021-02-10

  • 12:34 arturo: live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/662941 (T274139)
  • 12:23 arturo: add `webserver` security group to toolsbeta-proxy-3 and -4
  • 12:20 arturo: fix A record for `toolsbeta.wmflabs.org`, point it to 172.16.1.150 (toolsbeta-proxy-3), it was previously pointing to an old IP address

2021-02-08

  • 11:48 arturo: trying to introduce TLS support in the front proxy T274123

2021-02-05

  • 00:36 bstorm: updated jobutils and miscutils to 1.40 in aptly for toolsbeta testing

2021-01-21

  • 15:29 bstorm: pushed the maintain-kubeusers:beta tag with the new code to the docker repo T271847

2021-01-13

  • 14:10 dcaro: dcaro doing puppet tests, puppet runs might break
  • 10:07 arturo: allocate floating IP 185.15.56.84, and use it for docker-registry.toolsbeta.wmflabs.org (instance toolsbeta-docker-registry-01) (T271867)
  • 10:05 arturo: release and delete floating IP 185.15.56.242 (docker-registry.toolsbeta.wmflabs.org) (T271867)

2020-12-22

  • 10:48 arturo: rebase & resolve ugly git merge conflict in labs/private.git

2020-12-18

2020-12-14

  • 19:27 bstorm: create temporary instance toolsbeta-test-io-unthrottled T267966
  • 19:25 bstorm: created temporary instance toolsbeta-io-test-local T267966

2020-12-11

  • 23:31 bstorm: increasing the output throttle for toolsbeta-test-k8s-haproxy-* nodes in order to figure out what's up with the timeouts

2020-12-10

  • 08:58 dcaro: starting a new etcd instance completely from ansible playbook (etcd-8) (T267412)

2020-12-09

  • 15:30 dcaro: Playing around adding a new etcd node (k8s-etcd-7) (T267412)

2020-12-04

  • 11:17 dcaro: Created a new 'standardized' security group for k8s from ansible toolsbeta-k8s-full-connectivity (T267412)
  • 10:12 dcaro: Trying to create a whole new etcd member from ansible (T267412)

2020-11-23

  • 14:17 dcaro: All control nodes re-imaged (T267140)
  • 14:08 dcaro: Taking control-3 node out as control-6 is up and running (T267140)
  • 11:12 dcaro: Launching control-6, to replace control-3 (T267140)
  • 10:45 dcaro: Taking out control-2 node, replaced by control-5 (I saw one 503 reply on the proxy when creating control-5, fyi) (T267140)
  • 10:32 dcaro: Creating new control-5 node (will replace control-2) (T267140)
  • 09:58 dcaro: Remove control-1 node from the pool (was replaced by control-4) (T267140)
  • 09:57 dcaro: Remove control-1 node from the pool (was replaced by control-4) (T267195)

2020-11-18

  • 11:46 dcaro_: Modifying the security groups to mirror tools (T267140)
  • 10:50 dcaro_: Adding new control-4 node to the control cluster (T267140)

2020-11-17

  • 15:32 dcaro: Creating new toolsbeta-test-k8s-control-4 node and adding it to the cluster (T267140)
  • 12:09 Lucas_WMDE: <dcaro> 11:59:36 UTC – toolsbeta up and running again, documented on the live doc for now, apiserver had the wrong config (T267140)
  • 10:40 arturo: hand-edited /etc/kubernetes/manifests/kube-apiserver.yaml in all 3 k8s control nodes to account for new etcd servers (T267140)
  • 08:58 dcaro: etcd hosts reimaged (T267140)
  • 08:54 dcaro: etcd-4,5 and 6 are up and running, removing 1,2 and 3 (T267140)

2020-11-16

  • 11:44 dcaro: etcd5 member added, creating instance toolsbeta-test-k8s-etcd6 and adding to the etcd cluster (T267140)
  • 11:27 dcaro: Creating instance toolsbeta-test-k8s-etcd5 and adding to the etcd cluster (T267140)

2020-11-10

  • 19:42 bstorm: safelisted "argocd" namespace with namespaceSelector for registry-admission controller
  • 18:49 legoktm: associated floating IP to toolsbeta-docker-registry-01 and pointed DNS docker-registry.toolsbeta.wmflabs.org. at it
  • 18:27 legoktm: creating toolsbeta-docker-imagebuilder-01 (T267616)
  • 17:18 dcaro: launching instance toolsbeta-test-k8s-etcd-4 (T267140)
  • 17:15 dcaro: removing unused toolsbeta-k8s-etcd prefix (we use toolsbeta-test-k8s-etcd) (T267140)
  • 14:44 dcaro: taking down one of the test-k8s etcd nodes to reimage (T267140)

2020-11-06

  • 23:44 bstorm: toolsbeta k8s cluster fully upgraded to 1.17.13 T263284
  • 21:23 bstorm: upgrading toolsbeta-test-k8s-control-1 to k8s 1.17.13 T263284
  • 15:56 dcaro: Deleting instances proxy-1 and proxy-2, that will finish the proxy rebuild (T267140)
  • 15:53 dcaro: Removing proxy-1 and proxy-2 from hiera, proxy-3 stays as active and proxy-4 as backup (T267140)
  • 13:18 dcaro: bringing up a new proxy-4 instance as slave (T267140)
  • 13:18 dcaro: bringing up a new proxy-4 instance as slave

2020-11-05

  • 16:40 dcaro: Moving active proxy from proxy-1 to proxy-3 (T267140)
  • 15:54 dcaro: Adding toolsbeta-proxy-3 to the list of slave proxies in hiera (T267140)

2020-11-04

  • 15:42 dcaro: re-creating the toolsbeta-proxy-03, used wrong image on the first try (T267140)
  • 15:21 dcaro: creating new proxy instance toolsbeta-proxy-03
  • 15:18 arturo: dropping project hiera config for `toollabs::checker_hosts`, `toollabs::proxy::ssl_certificate_name`, `toollabs::proxy::ssl_install_certificate` and `toollabs::proxy::web_domain`, no longer in use
  • 15:16 arturo: dropping project hiera config for `toollabs::proxy::proxies`, no longer in use
  • 11:46 dcaro: The k8s scheduler-01 fails to connect to etcd (not sure it ever did), trying to fix

2020-11-03

  • 16:04 arturo: add dcaro to the toolsbeta.admin LDAP group (T266068)
  • 15:30 dcaro: T267121: Puppetmaster replaced, also removed old puppetdb master from hiera, testing
  • 15:07 dcaro: Replacing old puppetmaster 02 and 03 from hiera with 04
  • 10:55 dcaro: dcaro investigating puppet errors on toolsbeta-puppetdb-02

2020-11-02

  • 13:35 arturo: added dcaro as projectadmin & user (T266068)

2020-10-29

  • 22:20 legoktm: switched test tool over to use buildpack image (T265681)

2020-10-28

  • 18:58 andrewbogott: deleting toolsbeta-puppetmaster-03 — seems broken and unused

2020-10-22

  • 16:22 bstorm: created buildpack psp for T265557

2020-09-10

2020-09-09

  • 11:50 arturo: after force-rebooting everything, the k8s cluster seems to have recovered itself. magic.
  • 11:45 arturo: force-rebooting the 3 k8s etcd nodes. They seem down
  • 11:42 arturo: actually, the whole k8s cluster seems down? the API seems down at least
  • 11:39 arturo: all 3 k8s control nodes seem in bad shape. Won't let me ssh in, or use the console access. Try force-rebooting them
  • 11:27 arturo: created 2 VMs: toolsbeta-test-k8s-ingress-1 and toolsbeta-test-k8s-ingress-2 (T250172)
  • 11:25 arturo: created new server group toolsbeta-k8s-ingress (T250172)
  • 11:24 arturo: created new puppet prefix `toolsbeta-test-k8s-ingress` (T250172)

2020-07-15

  • 21:35 bstorm: set all of toolsbeta to mount NFS 4.2 except the bastion T257945

2020-07-14

  • 22:28 bstorm: rebooting toolsbeta-sgebastion-04 during NFS testing

2020-07-08

2020-06-26

2020-06-24

2020-06-23

  • 13:10 arturo: added herron to the test tool for email testing
  • 11:36 arturo: removing `benapetr` and adding myself to the test tool
  • 11:02 arturo: setting `profile::toolforge::mail_domain: toolsbeta.wmflabs.org` in toolsbeta-mail puppet prefix
  • 10:55 arturo: allow ingress smtp/smtps traffic in the MTA security group
  • 10:52 arturo: created MX record pointing to mail.toolsbeta.wmflabs.org
  • 09:43 arturo: restarted nginx in toolsbeta-acme-chief-01 to pickup new certificate, otherwise clients won't accept its TLS cert
  • 09:38 arturo: live-hacking toolsbeta-puppetmaster-04 with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/607251

2020-06-16

  • 22:54 bd808: Building webservice 0.72

2020-06-15

  • 21:54 bstorm_: removed killgridjobs.sh from toolsbeta bastion T157792
  • 17:52 bd808: Building webservice 0.71

2020-06-12

  • 19:41 bstorm_: set `profile::wmcs::nfsclient::mode: soft` on toolsbeta-workflow-test T127559

2020-06-11

  • 12:42 arturo: introduce puppet profile 'toolsbeta-docker-registry' and relocate some hiera config there
  • 12:39 arturo: for the record, k8s etcd servers certificate changed (puppet based) and k8s just kept working
  • 12:35 arturo: according to `aborrero@cloud-cumin-01:~$ sudo cumin --force -x 'O{project:toolsbeta}' 'run-puppet-agent'` we are mostly back in business
  • 12:14 arturo: try switching all VMs to toolsbeta-puppetmaster-04
  • 12:14 arturo: poweroff toolsbeta-puppetmaster-03
  • 12:12 arturo: copy over labs/private from toolsbeta-puppetmaster-03 to toolsbeta-puppetmaster-04
  • 11:53 arturo: create VM toolsbeta-puppetmaster-04
  • 11:35 arturo: try reinstalling the python3 stack in toolsbeta-puppetmaster-03, because everything python-related segfaults
  • 11:33 arturo: reboot toolsbeta-puppetmaster-03 to try cleaning up potential kernel/filesystem problems
  • 11:32 arturo: apparently every python script segfaults in toolsbeta-puppetmaster-03
  • 11:27 arturo: puppetdb wasn't the problem. The problem is puppet-enc segfaulting in toolsbeta-puppetmaster-03
  • 11:21 arturo: puppet not working bc puppetdb, run `aborrero@toolsbeta-puppetdb-02:~ $ sudo systemctl restart puppetdb`

2020-06-04

  • 21:06 andrewbogott: added krenair to toolsbeta.admin group in ldap

2020-05-28

2020-05-27

  • 12:02 arturo: the k8s cluster is now running v1.16.10 (T246122)
  • 11:05 arturo: trying `modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control toolsbeta-test-k8s-control-1 --project toolsbeta --domain eqiad.wmflabs --src-version 1.15 --dst-version 1.16.10 -n toolsbeta-test-k8s-worker-1 -n toolsbeta-test-k8s-worker-2 -n toolsbeta-test-k8s-worker-3` (T246122)
  • 11:02 arturo: upgraded the rest of the k8s control plane nodes to 1.16.10 (T246122)
  • 10:58 arturo: running `aborrero@toolsbeta-test-k8s-control-1:~ $ sudo apt-get install kubelet -y` in the 1.16 version from the component repo (T246122)
  • 10:58 arturo: running `aborrero@toolsbeta-test-k8s-control-1:~ $ sudo -i kubeadm upgrade apply v1.16.10` and this time it works! (T246122)

2020-05-26

  • 16:17 bstorm_: fix incorrect volume name in kubeadm-config T246122
  • 15:02 arturo: first k8s upgrade failed for yet-to-be-known reasons (T246122)
  • 14:54 arturo: `aborrero@toolsbeta-test-k8s-control-1:~ $ sudo -i kubeadm upgrade apply v1.16.10` (T246122)
  • 14:54 arturo: bump installed version of kubeadm and kubectl to 1.16.10 (T246122)
  • 09:57 arturo: installing kubectl/kubeadm 1.16.9 on k8s worker nodes (T246122)
  • 09:56 arturo: installing kubectl/kubeadm 1.16.9 on k8s control nodes (T246122)
  • 09:30 arturo: set `profile::wmcs::kubeadm::component: 'thirdparty/kubeadm-k8s-1-16'` at project level for trying T246122
  • 09:25 arturo: `aborrero@toolsbeta-puppetdb-02:~ $ sudo systemctl restart puppetdb` broken puppet in this project because puppetdb is down again

2020-05-21

  • 22:14 bd808: Building tools-webservice 0.70 via wmcs-package-build.py

2020-05-19

  • 12:20 arturo: trying to install tesseract 4.1.0 in toolsbeta-sgebastion-04 (T247422)
  • 10:18 arturo: `aborrero@toolsbeta-puppetdb-02:~$ sudo systemctl restart puppetdb`

2020-05-15

  • 20:48 bstorm_: found an error in the new version of maintain-kubeusers, removing the deployment for now T246059
  • 20:35 bstorm_: updating the maintain-kubeusers image to be able to control admin accounts

2020-05-14

  • 12:09 arturo: created puppet prefix toolsbeta-acme-chief in horizon (T252762)
  • 12:08 arturo: created toolsbeta-acme-chief-01 VM (T252762)

2020-05-12

  • 18:35 bstorm_: upgraded to using typha and rolled back to not doing so -- no effect on existing network T250863
  • 17:44 bstorm_: set the calico version to v3.14.0 because the new liveness probe isn't compatible with the old version. T250863
  • 17:36 bstorm_: deployed an updated bit of yaml for calico without upgrading the version first T250863

2020-05-08

  • 12:48 arturo: allocated floating IP `185.15.56.12` for the VM `toolsbeta-email-01` and FQDN `mail.toolsbeta.wmflabs.org` (T120225)
  • 12:24 arturo: added puppet prefix `toolsbeta-email` (T120225)

2020-05-07

2020-05-06

2020-05-05

  • 10:04 arturo: add herron as user and projectadmin, we will work on the email setup (T120225)
  • 09:59 arturo: created VM toolsbeta-mail-01 (T120225)

2020-05-04

  • 13:02 arturo: `aborrero@toolsbeta-puppetdb-02:~ $ sudo systemctl restart puppetdb.service` trying to bring back puppetdb, which is preventing puppet agent runs in the whole project

2020-04-29

  • 19:48 bstorm_: ran the scary rewrite-psp-preset.sh script across toolsbeta T247455

2020-04-20

  • 14:47 arturo: added joakino to toolsbeta.admin LDAP group
  • 12:06 arturo: installing tools-webservice v0.68 for testing
  • 11:05 arturo: poweroff `toolsbeta-services-01`. I suspect this VM is not in use because no puppet role is in use there
  • 10:58 arturo: run `aborrero@toolsbeta-puppetdb-02:~ $ sudo systemctl restart puppetdb` the service was in failed state, causing puppet failures across the whole project

2020-04-10

  • 19:32 bstorm_: deployed webservice 0.67 T249843
  • 18:59 bstorm_: delete toolsbeta-gitlab-01 and build toolsbeta-workflow-test T249946
  • 00:40 bd808: Rebooting toolsbeta-sgebastion-04. NFS seemed messed up

2020-04-08

  • 01:10 bstorm_: upgrade toollabs-webservice to 0.66 for qa T249390

2020-03-31

  • 23:39 bstorm_: deployed toollabs-webservice-0.65 to toolsbeta

2020-03-30

  • 10:35 arturo: remove local changes in the puppet tree in toolsbeta-puppetmaster-03 (docker mount point)
  • 10:30 arturo: remove puppet prefixes `toolsbeta-test-proxy`, `toolsbeta-k8s-master`, `toolsbeta-flannel-etcd`, no longer in use

2020-03-24

  • 18:45 jeh: cleanup and remove toolsbeta-elastic7-[1,2,3] VMs (re-configuring hypervisor for local storage) T243327

2020-03-19

2020-03-16

2020-03-11

  • 21:32 bstorm_: deployed jobutils_1.39 and miscutils_1.39 to toolsbeta

2020-03-09

  • 13:11 arturo: created VM `toolsbeta-legacy-redirector` (T247236)
  • 13:08 arturo: instance quota was full, bump it from 35 to 40

2020-03-06

  • 16:22 bstorm_: updating maintain-kubeusers image to filter invalid tool names

2020-03-05

  • 21:22 bstorm_: updated maintain-kubeusers to the latest version for toolsbeta only to live test

2020-02-27

  • 19:19 bstorm_: upgraded toollabs-webservice to 0.64 on stretch-toolsbeta for testing
  • 16:03 jeh: create 3 new VMs toolsbeta-elastic7-0[1,2,3]
  • 16:00 jeh: increase CloudVPS quota instance count for new elasticsearch servers

2020-02-26

  • 20:35 bstorm_: hard rebooting the grid master for toolsbeta
  • 20:20 jeh: restart toolsbeta-sgegrid-shadow

2020-02-18

  • 23:20 bstorm_: added toolsbeta-sgegrid-master.toolsbeta.eqiad1.wikimedia.cloud and toolsbeta-sgegrid-shadow.toolsbeta.eqiad1.wikimedia.cloud to gridengine admin host lists

2020-02-10

2020-02-07

  • 23:07 bstorm_: upgraded toollabs-webservice for stretch toolsbeta to 0.60 T244611
  • 21:09 bstorm_: upgraded toollabs-webservice package for stretch toolsbeta to 0.59 T244293 T244289 T234617 T156626

2020-01-23

  • 03:14 bd808: Demoted projectadmins not listed in the "roots" sudoer policy to project members just to avoid random confusion
  • 03:06 bd808: Added legoktm to "roots" sudoer policy
  • 02:53 bd808: Added legoktm as project admin

2020-01-22

  • 11:59 arturo: remove toolviews scripts from toolsbeta-proxy-{1,2}, source of cronspam

2020-01-21

2020-01-17

2020-01-16

2020-01-14

  • 02:15 andrewbogott: rebooting toolsbeta-sgecron-01 and toolsbeta-test-k8s-etcd-3 to get nfs unstuck

2020-01-13

  • 16:41 bstorm_: There was an unclean filesystem and other problems on the "old cluster" worker node 1001. Rebooting it in case that helps.

2020-01-10

  • 21:05 bstorm_: updated toollabs-webservice package to 0.55 for testing

2020-01-07

  • 15:51 bstorm_: changed kubeadm-config to use a list instead of a hash for extravols on the apiserver in the new k8s cluster T242067

2020-01-06

  • 21:42 bstorm_: disabled rpcbind on toolsbeta-sgebastion-04 to test some things

2020-01-03

  • 17:46 bstorm_: stashed uncommitted changes on the puppetmaster because they seem to be things that are already merged
  • 11:27 arturo: [new k8s] cadvisor is running in the metrics namespace now (T237643)

2020-01-02

  • 22:37 bstorm_: Deleting the massive number of test ingresses for tool-fourohfour so the ingress controllers aren't moving so slowly.
  • 22:19 bstorm_: Changed the ingress-admission ValidatingWebhookConfiguration to check extensions as well as networking API groups

2019-12-17

  • 00:14 bstorm_: Fully enabled encryption at rest for toolsbeta kubernetes

2019-12-16

  • 23:03 bstorm_: updated the kubeadm-config configmap to match the new init file

2019-12-04

  • 13:02 arturo: drop puppet prefix `toolsbeta-grid-master`, deprecated and no longer in use
  • 12:50 arturo: drop puppet prefix `toolsbeta-bastion`, deprecated and no longer in use

2019-12-02

  • 10:38 arturo: create wildcard DNS record for `*.toolsbeta.wmflabs.org` for use by the new k8s cluster
  • 10:34 arturo: manually scale nginx-ingress deployment to 5 replicas (T239405)

2019-11-25

  • 10:30 arturo: add puppet cert SANs via hiera to toolsbeta-test-k8s-etcd nodes (T238655)

2019-11-21

  • 14:15 arturo: upgrade new k8s cluster to 1.15.6 using kubeadm (plus kubelet)

2019-11-15

  • 14:46 arturo: stop live-hacks on toolsbeta-test-k8s-haproxy-1 T237643

2019-11-14

  • 10:32 arturo: live-hacking toolsbeta-test-k8s-haproxy-1 to point to just the k8s apiserver in control-1. Turn on --v=10 in control-1 for extended debug

2019-11-08

  • 19:36 bstorm_: rebooted the proxy server just in case that fixes something.
  • 11:58 arturo: adding `profile::toolforge::bastion::nproc: 100` to puppet prefix `toolsbeta-sgebastion` (T236202)
  • 11:38 arturo: new k8s: refresh deployment for nginx-ingress with latest changes from puppet

2019-11-07

  • 21:55 bstorm_: killed pods for ingress admission controller to upgrade to new image T215531

2019-11-06

  • 22:39 bstorm_: upgraded repo version of toollabs-webservice in toolsbeta-stretch to 0.49 -- changes for the new k8s cluster T215531
  • 19:09 bstorm_: added profile::toolforge::proxies in global hiera to try and figure out why it won't let anything use redis T237443
  • 18:53 bstorm_: launching toolsbeta-proxy-2 on a hunch that the config doesn't work well as a standalone T237443
  • 18:46 bstorm_: rebooting toolsbeta-proxy-1 trying to convince redis it is not a read replica T237443
  • 18:29 bstorm_: stopped broken kube-proxy service on toolsbeta-proxy-1 (should probably be puppetized)
  • 17:35 bstorm_: changing some hiera to work with new proxy host
  • 12:44 arturo: created VM toolsbeta-proxy-1 (T237443)

2019-11-05

  • 22:50 bstorm_: deployed the new maintain-kubeusers to toolsbeta T215531 T228499

2019-10-25

  • 23:41 bstorm_: Deployed custom webhook controllers for registry and ingress checking to toolsbeta-test kubernetes cluster T215531 T215678 T234231
  • 16:15 bstorm_: rebooting toolsbeta-test-k8s-worker-1 and -2

2019-10-23

  • 12:04 arturo: created 2 new VMs `toolsbeta-test-k8s-worker-[1,2]` T236074
  • 11:56 arturo: point FQDN `k8s.toolsbeta.eqiad1.wikimedia.cloud` to `toolsbeta-test-k8s-haproxy-1` (T236074)
  • 11:20 arturo: re-create VM `toolsbeta-test-k8s-haproxy-1` to use new puppet profile (T236074)
  • 11:10 arturo: re-create VM `toolsbeta-test-k8s-haproxy-2` to test https://gerrit.wikimedia.org/r/545532 (T236074)

2019-10-22

  • 17:43 arturo: re-create VM `toolsbeta-test-k8s-control-1` T236074
  • 15:48 arturo: point DNS record `k8s.toolsbeta.eqiad1.wikimedia.cloud` to the first controller node for the bootstrap T236074
  • 15:30 arturo: created puppet prefix `toolsbeta-test-k8s-control` and delete `toolsbeta-test-k8s-master` T236074
  • 12:27 arturo: refreshed puppet prefix `toolsbeta-test-k8s-control` with latest info T236074
  • 12:26 arturo: created 3 VMs `toolsbeta-test-k8s-control-{1,2,3}` T236074
  • 12:15 arturo: refresh IP addr of FQDN `k8s.toolsbeta.eqiad1.wikimedia.cloud` T236074
  • 12:14 arturo: delete FQDN `toolsbeta-k8s-master.toolsbeta.wmflabs.org` T236074
  • 11:57 arturo: created 2 new VMs `toolsbeta-test-k8s-haproxy-{1,2}` T236074
  • 11:54 arturo: created puppet prefix `toolsbeta-test-k8s-haproxy` and delete `toolsbeta-test-k8s-lb` T236074

2019-10-21

  • 15:13 arturo: refresh config in puppet prefix `toolsbeta-test-k8s-etcd` to account for new servers T236074
  • 15:07 arturo: create 3 VMs toolsbeta-test-k8s-etcd-{1,2,3} T236074
  • 14:58 arturo: deleting all toolsbeta-test-* VMs (master, worker, etcd, lb) T236074

2019-10-18

  • 16:33 arturo: created DNS zone `toolsbeta.eqiad1.wikimedia.cloud`
  • 09:06 arturo: remove puppet prefix toolsbeta-valhallasw-puppet-compiler (unused)
  • 09:00 arturo: remove puppet prefix toolsbeta-arturo-k8s-{etcd,master,worker} (unused)
  • 08:59 arturo: refresh role for servers in toolsbeta-test-k8s-{master,worker}
  • 08:58 arturo: remove puppet prefix etcd-k8s-ctest (unused)

2019-10-14

  • 12:26 arturo: delete VM `toolsbeta-test-proxy-01` no longer required
  • 12:26 arturo: created security group arturo-test-dynamicproxy-backend to test stuff related to T234037

2019-10-09

  • 11:59 arturo: re-create toolsbeta-test-proxy-01 as Debian Buster (T235059)

2019-10-08

  • 14:14 arturo: created puppet prefix `toolsbeta-test-proxy` for testing stuff related to T234037
  • 12:27 arturo: created VM toolsbeta-test-proxy-01 for testing stuff related to T234037

2019-10-07

  • 19:12 Krenair: reboot toolsbeta-sgecron-01 toolsbeta-sgewebgrid-generic-0901 toolsbeta-sgewebgrid-lighttpd-0901 due to nfs stale issue

2019-09-25

  • 23:31 bd808: Updated user list for "roots" sudoer policy
  • 23:30 bd808: Granted Krenair projectadmin

2019-09-05

  • 15:08 zhuyifei1999_: `sudo truncate -s 0 /var/log/exim4/paniclog` on toolsbeta-{sgewebgrid-{lighttpd,generic}-0901,sgecron-01}.toolsbeta.eqiad.wmflabs because of email spam

2019-08-12

  • 20:40 phamhi: toolsbeta-test-puppet-sandbox instance created for T230147

2019-08-09

  • 10:51 arturo: rebalance load: reallocating toolsbeta-sgewebgrid-lighttpd-0901 from cloudvirt1018 to cloudvirt1003

2019-07-24

  • 20:48 bstorm_: rebuilt toolsbeta-test cluster with the internal version of the pause container T228887 T215531
  • 19:02 bstorm_: doing a clean rebuild of the toolsbeta-test-k8s cluster

2019-07-18

  • 16:04 arturo: re-create VMs toolsbeta-test-k8s-{master,worker}-*
  • 12:47 arturo: create toolsbeta-test-k8s-etcd-2 as buster to check status of latest puppet code (T226098)
  • 12:00 arturo: create toolsbeta-test-k8s-worker-2 as buster to check status of latest puppet code
  • 09:28 arturo: re-create toolsbeta-test-k8s-master-{1,2,3} as buster to test T228267

2019-07-17

  • 09:51 arturo: re-create VM toolsbeta-test-k8s-worker-1 as Debian Buster T215531
  • 09:13 arturo: create VM toolsbeta-test-k8s-master-4 (Debian Buster) T215531

2019-07-15

  • 12:29 arturo: create `toolsbeta-test-k8s-etcd` puppet prefix
  • 12:27 arturo: create `toolsbeta-test-k8s-etcd-1` VM T215531

2019-07-03

  • 10:49 arturo: recreate `toolsbeta-test-k8s-master-1` VM (T215531)
  • 09:32 arturo: create `toolsbeta-test-k8s-worker-1` VM and a puppet prefix for it (T215531)
  • 09:22 arturo: delete all `toolsbeta-arturo-k8s-*` instances. We no longer require them per new approach at T215531

2019-07-02

  • 17:24 arturo: `aborrero@toolsbeta-test-k8s-lb-01:~ $ sudo generate_haproxy_default.sh` (T215531)
  • 10:32 arturo: re-creating toolsbeta-test-k8s-master-1 (T215531) for it to be created without swap

2019-07-01

  • 17:13 arturo: re-creating instance `toolsbeta-test-k8s-master-1` with more CPU for T215531
  • 17:03 arturo: updated FQDN `toolsbeta-k8s-master.toolsbeta.wmflabs.org` with 172.16.6.9 (the new LB VM) for T215531
  • 17:02 arturo: re-creating instance `toolsbeta-test-k8s-lb-01` with more CPU for T215531
  • 16:58 arturo: add puppet prefix `toolsbeta-test-k8s-lb` for T215531
  • 11:50 arturo: add sssd hiera config for `toolsbeta-test-k8s-master` prefix

2019-06-28

  • 19:10 bstorm_: T215531 removed toolsbeta-arturo-k8s-master-2/3 and added toolsbeta-test-k8s-master-1 for testing kubeadm

2019-06-25

  • 10:35 arturo: create puppet prefix `toolsbeta-arturo-k8s-worker` for T215531
  • 10:35 arturo: create 2 VMs toolsbeta-arturo-k8s-worker-[1,2] for T215531

2019-06-21

  • 11:42 arturo: re-create 3 VMs toolsbeta-arturo-k8s-etcd-[1-3] to test latest puppet code in T226098

2019-06-19

  • 10:39 arturo: add myself to the `toolsbeta.admin` LDAP group (T225303)

2019-06-14

  • 16:24 bstorm_: Manually failed "back" to the toolsbeta-sgegrid-master to get the grid functioning again in toolsbeta
  • 16:03 bstorm_: T221721 hard rebooted toolsbeta-sgegrid-master because it had oomkilled basically everything
  • 15:55 bstorm_: T221721 deleted toolsbeta-proxy-01 until it can be actively worked on.
  • 15:51 bstorm_: deleted toolsbeta-k8s-lb-01 since it isn't being actively worked on just now

2019-06-06

  • 12:14 arturo: T215531 create 3 VMs `toolsbeta-arturo-k8s-etcd-[1-3]`
  • 12:13 arturo: T215531 add `toolsbeta-arturo-k8s-etcd`* puppet prefix
  • 12:12 arturo: T215531 add `toolsbeta-arturo-k8s-test` puppet prefix

2019-06-05

  • 12:40 arturo: rebase git repos in toolsbeta-puppetmaster-02. There were some rebase problems in labs/private that required re-creating by hand one of the [local] patches (puppetdb secrets)
  • 12:33 arturo: drop VM instances toolsbeta-k8s-master-arturo-[1-3] and create toolsbeta-arturo-k8s-master-[1-3] T215531
  • 12:32 arturo: drop puppet prefix `toolsbeta-k8s-master-arturo` and create `toolsbeta-arturo-k8s-master` since there is also `toolsbeta-k8s-master` which gets applied to my VMs T215531
  • 11:42 arturo: create VM `toolsbeta-k8s-master-arturo-3` for T215531 (so I have 3 master nodes in this k8s deployment)
  • 11:38 arturo: delete instances arturo-sgeexec-sssd-test-2, arturo-sgeexec-sssd-test-1, arturo-bastion-sssd-test, unused

2019-05-24

  • 11:49 arturo: T224273 create `toolsbeta-k8s-master-arturo` puppet prefix in horizon
  • 11:45 arturo: T224273 create toolsbeta-k8s-master-arturo-[12] stretch VMs
  • 11:17 arturo: install by hand some openstack client packages that puppet would refuse to install in toolsbeta-k8s-master-01
  • 11:12 arturo: mangle sources.list to handle some apt warnings related to missing repos, etc in toolsbeta-k8s-master-01:
  • 11:12 arturo: mangle sources.list to handle some apt warnings related to missing repos, etc

2019-05-07

  • 10:22 arturo: T219362 drop the `toolsbeta-exec` puppet prefix
  • 10:20 arturo: T219362 drop the `toolsbeta-webgrid-generic` puppet prefix
  • 10:19 arturo: T219362 drop the `toolsbeta-webgrid-lighttpd` puppet prefix

2019-04-25

  • 04:17 andrewbogott: edited resolv.conf on unpuppetized instances to use the new nameserver: toolsbeta-docker-registry-01, toolsbeta-k8s-lb-01, toolsbeta-proxy-01, toolsbeta-puppetdb-01, toolsbeta-sgegrid-master

2019-04-12

  • 23:34 mutante: - toolsbeta-k8s-master-01 - was out of disk space on /, puppet failed to run because out of disk, rename existing syslog.1.gz, gzip syslog.1, rename existing daemon.log.1.gz, gzip daemon.log.1
  • 00:05 andrewbogott: migrating remaining VMs to eqiad1-r

2019-03-25

  • 18:00 bd808: All Trusty instances shutdown and now in process of deleting
  • 17:42 bd808: Preparing to shutdown beta Trusty job grid

2019-03-22

  • 13:59 arturo: create VMs arturo-sgeexec-sssd-test-[12] for testing T218126

2019-03-15

  • 10:23 arturo: create VM `arturo-bastion-sssd-test` (T218126)

2019-02-20

  • 14:58 andrewbogott: moving toolsbeta-grid-master and toolsbeta-puppetmaster-02 to labvirt1003

2019-02-14

  • 18:30 andrewbogott: moving toolsbeta-puppetdb-01 to labvirt1002

2018-12-04

2018-11-26

  • 13:26 arturo: T210098 VM=toolsbeta-sgebastion-03
  • 13:25 arturo: T210098 install systemd239 from stretch-backports and restart VM

2018-11-08

  • 10:01 arturo: make myself projectadmin to test toolforge stuff on stretch (specifically T207970)

2018-10-22

  • 21:20 bstorm_: launched a stretch/sonofgridengine master server

2018-09-19

  • 20:11 bstorm_: toolsbeta-puppetmaster-02 is now the puppetmaster and puppetdb works for toolsbeta -- T200557
  • 17:24 bstorm_: new puppetmaster is toolsbeta-puppetmaster-02, however, manual changes are required on each client, so it will be broken for a bit (enabling puppetdb for T200557)
  • 17:06 bstorm_: working on replacing puppetmaster with one running stretch, as part of adding puppetdb

2018-07-22

  • 14:28 zhuyifei1999_: backed up Neha16's changes to toolsbeta-bastion-01:/usr/lib/python2.7/dist-packages/toollabs to toollabs.bak in the same dir via cp -a, and re-install webservice code on the bastion to debug T156626

2018-07-18

  • 10:46 harej: Deleted toolsbeta-flynn-01

2018-07-12

  • 23:06 bstorm_: Got the grid master running

2018-06-28

  • 16:34 chicocvenancio: adding harej as root for flynn testing

2018-06-27

  • 22:35 chicocvenancio: add harej as project admin to test Flynn stuff

2018-06-22

  • 22:26 chicocvenancio: reconfigured toolsbeta-paws-master-01 kubelet to test image pruning
  • 09:39 zhuyifei1999_: fixed that by running `sudo mv /var/lib/puppet/ssl /var/lib/puppet/ssl.bak` then following the red instructions
  • 09:33 zhuyifei1999_: puppet is broken on toolsbeta-bastion-01, investigating
  • 09:03 zhuyifei1999_: killing and rebuilding toolsbeta-bastion-01
  • 08:31 zhuyifei1999_: on toolsbeta-bastion-01, killed /etc/apt/sources.list.d/jonathonf-python-2_7-trusty.list ppa, downgraded python from 2.7.14 to 2.7.5, and reinstalled toollabs-webservice
  • 07:56 andrewbogott: someone removed /usr/bin/webservice

2018-05-15

  • 07:26 zhuyifei1999_: applied 5324236 via toolsbeta-puppetmaster-01 T190893
  • 05:28 zhuyifei1999_: Making project puppetmaster at toolsbeta-puppetmaster-01

2018-05-08

  • 02:18 zhuyifei1999_: manually created flannel etcd key T190893

2018-05-07

  • 19:01 zhuyifei1999_: install kubernetes-client on toolsbeta-worker-1001 to debug stuffs
  • 18:41 zhuyifei1999_: rebuilding toolsbeta-k8s-etcd-01
  • 17:58 zhuyifei1999_: cleanup from maintain-kubeusers using the wrong project to create tool home dirs: `find /data/project/ -mindepth 1 -maxdepth 1 -type d \! -user 0 | (while read dir; do id toolsbeta.`basename $dir` 2> /dev/null || sudo rm -rfv $dir; done)`
  • 16:41 zhuyifei1999_: rebuild toolsbeta-k8s-master-01 because I can't figure out why puppet can't update maintain-kubeusers.systemd
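
The 17:58 cleanup above selects the direct subdirectories of /data/project that are not owned by uid 0 and whose `toolsbeta.<name>` account does not resolve, then deletes them. A minimal dry-run sketch of just the selection step follows, assuming Python is available on the host. The function names and the injectable `account_exists` and `owner_uid` parameters are hypothetical and not part of the original command; nothing is deleted here.

```python
import pwd
from pathlib import Path
from typing import Callable, List


def _account_exists(name: str) -> bool:
    """Return True if the account resolves, similar to `id NAME` succeeding."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def find_orphan_dirs(
    root: Path,
    prefix: str = "toolsbeta.",
    account_exists: Callable[[str], bool] = _account_exists,
    owner_uid: Callable[[Path], int] = lambda p: p.stat().st_uid,
) -> List[Path]:
    """List direct subdirectories of root that are not owned by uid 0 and
    have no matching '<prefix><dirname>' account. Dry run: nothing is removed."""
    orphans = []
    for entry in sorted(root.iterdir()):
        # `find -type d` does not follow symlinks, so skip them here too.
        if entry.is_symlink() or not entry.is_dir():
            continue
        # Mirrors `\! -user 0`: ignore directories owned by root.
        if owner_uid(entry) == 0:
            continue
        if not account_exists(prefix + entry.name):
            orphans.append(entry)
    return orphans
```

Calling `find_orphan_dirs(Path("/data/project"))` and reviewing the returned list before removing anything gives a safety check that the one-liner does not have.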

2018-05-06

  • 04:06 zhuyifei1999_: locally patched `/usr/lib/python2.7/dist-packages/toollabs/common/tool.py` on bastion and webgrid-lighttpd

2018-05-05

  • 19:51 zhuyifei1999_: `systemctl mask maintain-kubeusers` because it's causing a mess, tries to get the tool list from toolforge T190893
  • 18:40 zhuyifei1999_: to unblock k8s testing while waiting on https://gerrit.wikimedia.org/r/430539, installed the package directly on `toolsbeta-k8s-master-01` with `$ sudo apt install python3-yaml`

2018-05-02

  • 21:02 zhuyifei1999_: copy over labs/private:/hieradata/labs/tools/common.yaml to project puppet hiera
  • 20:37 bd808: Added Neha16 as a project admin for work on T175768
  • 20:31 zhuyifei1999_: nuke webservice instances and rebuild
  • 20:31 zhuyifei1999_: Added k8s_infrastructure_users to project hiera on horizon T192618

2018-04-20

  • 00:20 zhuyifei1999_: deleted all instances I just created except k8s master because of chicken-and-egg problem

2018-04-19

  • 22:10 zhuyifei1999_: the trusty instances ask me for my password. the jessie instances don't like my ssh key. :(
  • 21:59 zhuyifei1999_: got 'Error: RecordSet belongs in a child zone: toolsbeta.wmflabs.org', using tools-beta.wmflabs.org instead
  • 21:57 zhuyifei1999_: Add proxy toolsbeta.wmflabs.org => toolsbeta-proxy-01.toolsbeta.eqiad.wmflabs
  • 21:43 zhuyifei1999_: Start creating instances for webservice setup T190893

2018-03-30

  • 22:40 zhuyifei1999_: copied over many prefix puppet configuration in horizon from toolforge T190893

2018-03-14

  • 18:07 chicocvenancio: updated paws-beta k8s cluster and nodes to v1.9.4 for T189680

2018-03-05

  • 19:33 chicocvenancio: added Zhuyifei1999 as project admin

2018-02-09

  • 01:11 bd808: Removed Yuvipanda at user request (T186289)

2017-08-07

  • 14:09 andrewbogott: deleted etcd-k8s-CTEST and k8s-master-CTEST

2017-04-26

  • 15:38 madhuvishy: add Madhuvishy as projectadmin

2016-10-07

  • 19:30 valhallasw`cloud: (puppet certs, to be precise)
  • 19:30 valhallasw`cloud: fixed certs on toolsbeta-vagrant3-scfc.toolsbeta.eqiad.wmflabs

2016-10-04

  • 19:31 valhallasw`cloud: puppet is broken due to incorrect certificates. Cleaning up ('puppet cert clean toolsbeta-webgrid-lighttpd-1406.toolsbeta.eqiad.wmflabs' on puppetmaster3, 'rm -f /var/lib/puppet/client/ssl/certs/toolsbeta-webgrid-lighttpd-1406.toolsbeta.eqiad.wmflabs.pem' on host, for all hosts that I got emails for)
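
The certificate cleanup above pairs two commands per affected host: one run on the puppetmaster and one on the host itself. A small sketch that only builds those command strings for a list of FQDNs follows; the function name and the dictionary keys are hypothetical, and the two command templates are copied from the entry above. It does not execute anything.

```python
import shlex
from typing import Dict, List


def cert_cleanup_commands(fqdns: List[str]) -> Dict[str, Dict[str, str]]:
    """Build the two cleanup commands from the log entry for each FQDN."""
    commands = {}
    for fqdn in fqdns:
        cert_path = f"/var/lib/puppet/client/ssl/certs/{fqdn}.pem"
        commands[fqdn] = {
            # Run on the puppetmaster to revoke and remove the stored cert.
            "on_puppetmaster": "puppet cert clean " + shlex.quote(fqdn),
            # Run on the affected host to drop its stale client cert.
            "on_host": "rm -f " + shlex.quote(cert_path),
        }
    return commands
```

Printing the result for the hosts named in the alert emails gives a checklist that can be reviewed before running each command by hand.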

2016-09-08

  • 17:11 bd808: Added BryanDavis (self) to project as admin

2016-08-29

  • 19:20 yuvipanda: reboot toolsbeta-master, seems, uh, stuck
  • 19:18 yuvipanda: reboot toolsbeta-mail, seems, uh, stuck
  • 18:48 yuvipanda: reboot toolsbeta-puppetmaster3, puppet run process became Zommmmbiiiieeee, ate all my brains

2016-07-03

  • 15:02 yuvipanda: migrating toolsbeta-valhallasw-puppet-compiler to labvirt1011 to ease pressure on labvirt1010

2016-05-27

  • 18:57 valhallasw`cloud: sudo qconf -Ae /var/lib/gridengine/etc/exechosts/toolsbeta-exec-1209.toolsbeta.eqiad.wmflabs

2016-05-26

  • 15:08 valhallasw`cloud: toolsbeta-mail has high load (1.0) without clear origin, so rebooting the host

2015-10-13

  • 19:21 valhallasw`cloud: started building toolsbeta-bastion.

2015-09-07

  • 18:50 valhallasw`cloud: role::bastion is now applied on -exec-101. Now for the package_builder manifest...
  • 18:30 valhallasw`cloud: applied role::toollabs::bastion on toolsbeta-exec-101 (spinning up a whole new instance will take ages)

July 4

  • 12:57 valhallasw`cloud: restarting toolsbeta-webproxy, no response on port 22

July 2

  • 14:55 valhallasw`cloud: toolsbeta-webproxy does not respond at all to SSH; rebooting

July 1

  • 19:47 valhallasw`cloud: still can't login :/ not sure if this is a remainder of the NFS failure or something else; maybe a puppet run will solve it?
  • 19:44 valhallasw`cloud: restarting toolsbeta-exec-01 and toolsbeta-mail as I can't login

June 7

  • 14:44 valhallasw: updated /var/lib/git/operations/puppet to make sure the other hosts get the memo
  • 14:42 YuviPanda: run sudo sed -i 's/GlobalSign_CA.pem/ca-certificates.crt/' /etc/ldap/ldap.conf on toolsbeta-puppetmaster3 to fix broken LDAP TLS config
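
The sed expression in the 14:42 entry replaces the first match of `GlobalSign_CA.pem` on each line with `ca-certificates.crt`. A rough Python equivalent operating on a string is sketched below so the substitution can be previewed without touching /etc/ldap/ldap.conf. The function name is hypothetical, and the actual contents of that file are not shown in this log.

```python
import re

# The '.' is left unescaped on purpose, as in the sed expression,
# so it matches any single character.
_PATTERN = re.compile(r"GlobalSign_CA.pem")


def fix_ldap_ca(conf_text: str) -> str:
    """Replace the first match on each line, like `sed 's/A/B/'` without /g."""
    return "".join(
        _PATTERN.sub("ca-certificates.crt", line, count=1)
        for line in conf_text.splitlines(keepends=True)
    )
```

Lines without a match pass through unchanged, so applying the function twice gives the same result as applying it once.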

May 11

  • 18:14 valhallasw: building toolsbeta-pbuilder to experiment with pbuilder for building packages

May 2

  • 11:11 valhallasw`cloud: commenting out include ::elasticsearch::ganglia in role::logstash seems to work. I think we have to write our own tools logstash roles anyway in the end, as the role::logstash code contains e.g. mediawiki specific code
  • 10:37 valhallasw`cloud: that doesn't seem to be applied... setting has_ganglia: false manually in wikitech hiera
  • 10:30 valhallasw`cloud: pulled new changes into puppetmaster to get https://github.com/wikimedia/operations-puppet/commit/4afd23d8e2905a84ef211ad92e8314173eb743ba in
  • 10:25 valhallasw`cloud: set Hiera variable "elasticsearch::cluster_name": toolsbeta-logstash-eqiad
  • 10:09 valhallasw`cloud: created toolsbeta-logstash to play around with logstash and figure out what we need for tools (phab:T97861)

April 26

March 31

  • 00:27 andrewbogott: shut down toolsbeta-webgrid-03 to conserve resources. It can be restarted when needed.

September 20

  • 20:09 andrewbogott_afk: moved toolsbeta-exec-01 and toolsbeta-scfc-icinga-test off of virt1006

July 22

  • 11:36 scfc_de: Removed andrewbogott_afk, Coren, petan, YuviPanda from service group admin to prevent further spamming :-)

August 19

  • 12:44 petan: rebooting apache it seems to be frozen

August 4

  • 23:50 scfc_de: Added scfc_de to local-admin so I don't log myself out again :-)

July 6

  • 19:42 petan: rebooting login

June 26

  • 08:03 wm-bot: petrb: updating logsplitter

June 24

  • 14:47 wm-bot: petrb: rebooting exec-01 to fix the grid weird info
  • 13:43 scfc_de: Made scfc root.
  • 13:42 scfc_de: Created toolsbeta-puppetmaster.
  • 11:09 YuviPanda: Granted yuvipanda root on toolsbeta

June 21

  • 13:46 wm-bot: petrb: rebooting all servers

June 17

  • 08:31 petan: switching all instances to nfs

June 16

  • 15:37 petan: importing sudo policies of tools
  • 15:36 petan: importing security groups of tools
  • 15:36 petan: blah