Nova Resource:Tools/SAL: Difference between revisions
=== 2023-06-01 ===
* 10:07 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api ({{Gerrit|7e57832}}) ([[phab:T337218|T337218]]) - cookbook ran by dcaro@vulcanus
* 09:21 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api ({{Gerrit|0f4076a}}) ([[phab:T336130|T336130]]) - cookbook ran by dcaro@vulcanus
* 09:18 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpack-admission-controller ({{Gerrit|ef7f103}}) ([[phab:T336130|T336130]]) - cookbook ran by dcaro@vulcanus
* 07:52 dcaro: rebooted tools-package-builder-04 (stuck not letting me log in with my user)
=== 2023-05-31 ===
* 02:38 andrewbogott: rebooted tools-sgeweblight-10-16, [[phab:T337806|T337806]]
=== 2023-05-30 ===
* 00:22 andrewbogott: rebooted tools-sgeweblight-10-30, oom
* 00:16 andrewbogott: rebooted tools-sgeweblight-10-24, seems to be oom
=== 2023-05-26 ===
* 13:13 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpack-admission-controller ({{Gerrit|ef7f103}}) ([[phab:T337218|T337218]]) - cookbook ran by dcaro@vulcanus
* 12:59 dcaro: rebooting tools-sgeexec-10-16.tools.eqiad1.wikimedia.cloud for stale NFS handles (D processes)
=== 2023-05-24 ===
* 12:28 dcaro: deploy latest buildservice ([[phab:T335865|T335865]])
* 12:28 dcaro: deploy latest buildservice ([[phab:T336050|T336050]])
=== 2023-05-23 ===
* 14:40 wm-bot2: deployed kubernetes component https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|0c7b25b}}) - cookbook ran by fran@wmf3169
=== 2023-05-22 ===
* 10:06 arturo: hard-reboot tools-sgeexec-10-18 (monitoring reporting it as down)
=== 2023-05-19 ===
* 13:38 arturo: uncordon tools-k8s-worker-47/48/64/75
* 08:46 bd808: Building new perl532-sssd/<nowiki>{</nowiki>base,web<nowiki>}</nowiki> images ([[phab:T323522|T323522]], [[phab:T320904|T320904]])
=== 2023-05-17 ===
* 16:05 dcaro: release toolforge-cli 0.3.0 ([[phab:T336225|T336225]])
* 12:48 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway ({{Gerrit|fa8ed2c}}) ([[phab:T336225|T336225]]) - cookbook ran by dcaro@vulcanus
* 12:48 wm-bot2: rebooted k8s node tools-k8s-worker-71 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 12:45 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway ({{Gerrit|d1bb238}}) ([[phab:T336225|T336225]]) - cookbook ran by dcaro@vulcanus
* 12:43 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api ({{Gerrit|8d21314}}) - cookbook ran by dcaro@vulcanus
* 10:54 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-buildpack-admission-controller:7199a9e from https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|7199a9e}}) - cookbook ran by fran@wmf3169
* 08:49 wm-bot2: rebooted k8s node tools-k8s-worker-55 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 08:33 wm-bot2: rebooted k8s node tools-k8s-worker-64 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 08:32 wm-bot2: rebooted k8s node tools-k8s-worker-75 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 08:25 wm-bot2: rebooted k8s node tools-k8s-worker-74 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 08:17 wm-bot2: rebooted k8s node tools-k8s-worker-61 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 08:10 wm-bot2: rebooted k8s node tools-k8s-worker-70 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 08:03 wm-bot2: rebooted k8s node tools-k8s-worker-66 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 07:54 wm-bot2: rebooted k8s node tools-k8s-worker-72 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 07:46 wm-bot2: rebooted k8s node tools-k8s-worker-47 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 07:45 wm-bot2: rebooted k8s node tools-k8s-worker-48 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 07:42 wm-bot2: rebooted k8s node tools-k8s-worker-69 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 07:29 wm-bot2: rebooted k8s node tools-k8s-worker-76 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
=== 2023-05-16 ===
* 23:24 bd808: kubectl uncordon tools-k8s-worker-69
* 23:22 bd808: Force reboot tools-k8s-worker-69 via Horizon
* 23:18 bd808: kubectl drain --ignore-daemonsets --delete-emptydir-data --force tools-k8s-worker-69
* 23:17 bd808: kubectl cordon tools-k8s-worker-69
* 14:37 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/builds-api:35b57c6 from https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api.git ({{Gerrit|35b57c6}}) - cookbook ran by dcaro@vulcanus
* 13:05 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|df52a39}}) ([[phab:T334081|T334081]]) - cookbook ran by dcaro@vulcanus
* 12:54 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|ad5b2b5}}) ([[phab:T334081|T334081]]) - cookbook ran by dcaro@vulcanus
* 11:52 dcaro: release toolforge-weld 0.2.0 and toolforge-webservice 0.98
* 08:08 dcaro: reboot tools-mail-03 ([[phab:T316544|T316544]])
* 08:07 dcaro: reboot tools-sgebastion-10 ([[phab:T316544|T316544]])
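The 2023-05-16 entries for tools-k8s-worker-69 record, oldest first, a cordon, a drain, a forced reboot via Horizon, and an uncordon. As a minimal sketch of that order, the function below only prints the commands and executes nothing. The `kubectl` invocations and flags are copied from the log entries; the helper name `plan_node_reboot` is hypothetical and is not part of any Toolforge cookbook.

```shell
#!/bin/sh
# Hypothetical helper: print the manual reboot plan for one k8s worker.
# Nothing is executed; the output is meant to be reviewed by an operator.
plan_node_reboot() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: plan_node_reboot NODE" >&2
        return 1
    fi
    # 1. Stop new pods from being scheduled on the node.
    echo "kubectl cordon $node"
    # 2. Evict running pods, with the flags used in the log entry.
    echo "kubectl drain --ignore-daemonsets --delete-emptydir-data --force $node"
    # 3. The reboot itself happens outside kubectl (the log used Horizon).
    echo "# reboot $node out of band"
    # 4. Allow scheduling again once the node is back.
    echo "kubectl uncordon $node"
}

plan_node_reboot tools-k8s-worker-69
```

Printing rather than running keeps the sketch safe to try anywhere, and the reboot step stays a comment because the log shows it was done through Horizon, not through a command recorded here.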
=== 2023-05-15 ===
* 22:50 bd808: Rebuilding bullseye and buster docker containers to pick up make package addition ([[phab:T320343|T320343]])
* 22:09 wm-bot2: rebooted k8s node tools-k8s-worker-66 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 22:07 wm-bot2: rebooted k8s node tools-k8s-worker-65 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 22:06 wm-bot2: rebooted k8s node tools-k8s-worker-64 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 22:04 wm-bot2: rebooted k8s node tools-k8s-worker-62 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 22:02 wm-bot2: rebooted k8s node tools-k8s-worker-61 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:58 wm-bot2: rebooted k8s node tools-k8s-worker-60 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:56 wm-bot2: rebooted k8s node tools-k8s-worker-59 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:54 wm-bot2: rebooted k8s node tools-k8s-worker-58 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:52 wm-bot2: rebooted k8s node tools-k8s-worker-57 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:51 wm-bot2: rebooted k8s node tools-k8s-worker-56 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:50 wm-bot2: rebooted k8s node tools-k8s-worker-55 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:49 wm-bot2: rebooted k8s node tools-k8s-worker-54 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:47 wm-bot2: rebooted k8s node tools-k8s-worker-53 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:44 wm-bot2: rebooted k8s node tools-k8s-worker-52 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:42 wm-bot2: rebooted k8s node tools-k8s-worker-51 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:41 wm-bot2: rebooted k8s node tools-k8s-worker-50 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:40 wm-bot2: rebooted k8s node tools-k8s-worker-49 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:38 wm-bot2: rebooted k8s node tools-k8s-worker-48 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:37 wm-bot2: rebooted k8s node tools-k8s-worker-47 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye
* 21:33 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by andrew@bullseye
* 21:16 wm-bot2: rebooted k8s node tools-k8s-worker-45 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 21:15 wm-bot2: rebooted k8s node tools-k8s-worker-44 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 21:13 wm-bot2: rebooted k8s node tools-k8s-worker-43 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 21:12 wm-bot2: rebooted k8s node tools-k8s-worker-42 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 21:09 wm-bot2: rebooted k8s node tools-k8s-worker-41 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 21:03 wm-bot2: rebooted k8s node tools-k8s-worker-40 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 20:58 wm-bot2: rebooted k8s node tools-k8s-worker-39 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 20:52 wm-bot2: rebooted k8s node tools-k8s-worker-38 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 20:50 wm-bot2: rebooted k8s node tools-k8s-worker-37 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 20:49 wm-bot2: rebooted k8s node tools-k8s-worker-36 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 20:48 wm-bot2: rebooted k8s node tools-k8s-worker-35 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 20:47 wm-bot2: rebooted k8s node tools-k8s-worker-34 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 20:42 wm-bot2: rebooted k8s node tools-k8s-worker-33 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 20:41 andrewbogott: rebooting frozen VMs: tools-k8s-worker-65, tools-sgeweblight-10-27, tools-k8s-worker-45, tools-k8s-worker-36, tools-sgewebgen-10-3 (fallout from earlier nfs outage)
* 20:36 wm-bot2: rebooted k8s node tools-k8s-worker-32 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 20:32 wm-bot2: rebooted k8s node tools-k8s-worker-31 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 20:24 wm-bot2: rebooted k8s node tools-k8s-worker-30 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 19:04 wm-bot2: rebooted k8s node tools-k8s-worker-67 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:56 wm-bot2: rebooted k8s node tools-k8s-worker-68 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:49 wm-bot2: rebooted k8s node tools-k8s-worker-69 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:46 bd808: Hard reboot tools-static-14 via Horizon per IRC report of unresponsive requests
* 18:44 wm-bot2: rebooted k8s node tools-k8s-worker-70 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:42 wm-bot2: rebooted k8s node tools-k8s-worker-71 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:39 wm-bot2: rebooted k8s node tools-k8s-worker-72 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:34 wm-bot2: rebooted k8s node tools-k8s-worker-73 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:28 wm-bot2: rebooted k8s node tools-k8s-worker-74 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:22 wm-bot2: rebooted k8s node tools-k8s-worker-75 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:22 taavi: clear mail queue
* 18:21 wm-bot2: rebooted k8s node tools-k8s-worker-76 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:15 wm-bot2: rebooted k8s node tools-k8s-worker-77 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:08 wm-bot2: rebooted k8s node tools-k8s-worker-80 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:06 wm-bot2: rebooted k8s node tools-k8s-worker-81 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 18:05 wm-bot2: rebooted k8s node tools-k8s-worker-82 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:57 wm-bot2: rebooted k8s node tools-k8s-worker-83 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:48 wm-bot2: rebooted k8s node tools-k8s-worker-84 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:47 wm-bot2: rebooted k8s node tools-k8s-worker-85 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:38 wm-bot2: rebooted k8s node tools-k8s-worker-86 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:37 wm-bot2: rebooted k8s node tools-k8s-worker-87 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:35 wm-bot2: rebooted k8s node tools-k8s-worker-88 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:34 wm-bot2: rebooting all the workers of tools k8s cluster (64 nodes) ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:20 wm-bot2: rebooted k8s node tools-k8s-worker-87 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:19 wm-bot2: rebooted k8s node tools-k8s-worker-88 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:17 bd808: Rebuilding bullseye and buster docker containers to pick up openssh-client package addition ([[phab:T258841|T258841]])
* 17:12 wm-bot2: rebooting the whole tools k8s cluster (64 nodes) ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus
* 17:06 dcaro: rebooting tools-sgegrid-shadow ([[phab:T316544|T316544]])
* 17:00 dcaro: rebooting tools-sgegrid-master ([[phab:T316544|T316544]])
* 16:55 dcaro: rebooting tools-sgeexec-10-20 ([[phab:T316544|T316544]])
* 16:53 dcaro: rebooting tools-sgeweblight-10-18 ([[phab:T316544|T316544]])
* 16:53 dcaro: rebooting tools-sgeweblight-10-25 ([[phab:T316544|T316544]])
* 16:53 dcaro: rebooting tools-sgeweblight-10-20 ([[phab:T316544|T316544]])
* 16:52 dcaro: rebooting tools-sgeweblight-10-21 ([[phab:T316544|T316544]])
* 16:52 dcaro: rebooting tools-sgeexec-10-22 ([[phab:T316544|T316544]])
* 16:51 dcaro: rebooting tools-sgeweblight-10-28 ([[phab:T316544|T316544]])
* 16:50 dcaro: rebooting tools-sgeexec-10-17 ([[phab:T316544|T316544]])
* 16:48 dcaro: rebooting tools-sgeexec-10-21 ([[phab:T316544|T316544]])
* 16:47 dcaro: rebooting tools-sgeexec-10-19 ([[phab:T316544|T316544]])
* 16:45 dcaro: rebooting tools-sgeexec-10-8 ([[phab:T316544|T316544]])
* 16:45 dcaro: rebooting tools-sgeweblight-10-24 ([[phab:T316544|T316544]])
* 16:44 dcaro: rebooting tools-sgewebgen-10-2 ([[phab:T316544|T316544]])
* 16:44 dcaro: rebooting tools-sgeweblight-10-16 ([[phab:T316544|T316544]])
* 16:43 dcaro: rebooting tools-sgeweblight-10-30 ([[phab:T316544|T316544]])
* 16:43 dcaro: rebooting tools-sgeexec-10-18 ([[phab:T316544|T316544]])
* 16:42 dcaro: rebooting tools-sgeexec-10-16 ([[phab:T316544|T316544]])
* 16:42 dcaro: rebooting tools-sgeexec-10-14 ([[phab:T316544|T316544]])
* 16:41 dcaro: rebooting tools-sgeweblight-10-32 ([[phab:T316544|T316544]])
* 16:40 dcaro: rebooting tools-sgeweblight-10-22 ([[phab:T316544|T316544]])
* 16:39 dcaro: rebooting tools-sgeweblight-10-17 ([[phab:T316544|T316544]])
* 16:32 dcaro: rebooting tools-sgeexec-10-13.tools.eqiad1.wikimedia.cloud ([[phab:T316544|T316544]])
* 16:23 dcaro: rebooting tools-sgeweblight-10-26 ([[phab:T316544|T316544]])
* 16:15 bd808: Hard reboot of tools-sgebastion-11 via Horizon (done circa 16:11Z)
* 16:14 arturo: rebooted a bunch of nodes to cleanup D procs and high load avg because NFS outage (result of [[phab:T316544|T316544]])
* 12:36 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/builds-api:09f3b49-dev from https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api.git ({{Gerrit|32a8ae9}}) - cookbook ran by dcaro@vulcanus
* 09:12 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/volume-admission:c64da5a from https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|c64da5a}}) - cookbook ran by dcaro@vulcanus
=== 2023-05-13 ===
* 09:13 taavi: reboot tools-sgeexec-10-15,17,18,21
=== 2023-05-11 ===
* 15:48 bd808: Rebooted tools-sgebastion-10 for [[phab:T336510|T336510]]
* 15:31 bd808: Sent `wall` for reboot of tools-sgebastion-10 circa 15:40Z
=== 2023-05-09 ===
* 16:36 taavi: delegated beta.toolforge.org domain to toolsbeta per [[phab:T257386|T257386]]
* 09:35 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|ad4fa2a}}) - cookbook ran by taavi@runko
=== 2023-05-08 ===
* 09:12 arturo: force-reboot tools-sgeexec-10-13 (reported as down by the monitoring, no SSH)
=== 2023-05-07 ===
* 16:06 taavi: remove inbound 25/tcp rule from the toolserver legacy server [[phab:T136225|T136225]]
=== 2023-05-05 ===
* 22:21 bd808: Added "RepoLookoutBot" to hiera key "dynamicproxy::blocked_user_agent_regex" to stop unnecessary scans by https://www.repo-lookout.org/
* 22:20 bd808: Added
* 11:30 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:811164e from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|811164e}}) - cookbook ran by taavi@runko
* 09:13 dcaro: rebooted tools-sgeexec-10-16 as it was stuck ([[phab:T335009|T335009]])
=== 2023-05-04 ===
* 15:15 wm-bot2: removed instance tools-k8s-etcd-15 - cookbook ran by andrew@bullseye
* 14:13 wm-bot2: removed instance tools-k8s-etcd-14 - cookbook ran by andrew@bullseye
=== 2023-05-03 ===
* 12:41 wm-bot2: removed instance tools-k8s-etcd-13 - cookbook ran by andrew@bullseye
=== 2023-05-02 ===
* 00:29 wm-bot2: deployed kubernetes component https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|7199a9e}}) - cookbook ran by raymond@ubuntu
=== 2023-05-01 ===
* 23:17 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-buildpack-admission-controller:3b3803f from https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|3b3803f}}) - cookbook ran by raymond@ubuntu
=== 2023-04-28 ===
* 15:01 arturo: force reboot tools-k8s-worker-79, unresponsive
* 08:27 dcaro: rebooting tools-sgeweblight-10-28 ([[phab:T335336|T335336]])
* 07:20 dcaro: rebooting tools-sgegrid-shadow due to stale nfs mount
* 00:09 bd808: `kubectl uncordon tools-k8s-worker-67` ([[phab:T335543|T335543]])
* 00:07 bd808: Hard reboot tools-k8s-worker-67.tools.eqiad1.wikimedia.cloud via horizon ([[phab:T335543|T335543]])
* 00:04 bd808: Rebooting tools-k8s-worker-67.tools.eqiad1.wikimedia.cloud ([[phab:T335543|T335543]])
=== 2023-04-27 ===
* 23:59 bd808: `kubectl drain --ignore-daemonsets --delete-emptydir-data --force tools-k8s-worker-67` ([[phab:T335543|T335543]])
* 20:50 bd808: Started process to rebuild all buster and bullseye based container images again. Prior problem seems to have been stale images in local cache on the build server.
* 20:42 bd808: Container image rebuild failed with GPG errors in buster-sssd base image. Will investigate and attempt to restart once resolved in a local dev environment.
* 20:33 bd808: Started process to rebuild all buster and bullseye based container images per https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Building_toolforge_specific_images
=== 2023-04-18 ===
* 16:46 dcaro: force-rebooting tools-sgeweblight-10-25/26/27 as they got stuck stopping the grid_exec process
* 16:35 dcaro: rebooting root@tools-sgeweblight-10-27 due to stuck exec daemon not releasing port 6445
* 16:35 dcaro: rebooting root@tools-sgeweblight-10-25 due to stuck exec daemon not releasing port 6445
* 16:32 dcaro: rebooting root@tools-sgeweblight-10-26 due to stuck exec daemon not releasing port 6445
* 16:26 dcaro: rebooting root@tools-sgeexec-10-14 due to stuck exec daemon not releasing port 6445
=== 2023-04-17 ===
* 13:10 dcaro: rebooting tools-sgegrid-master node ([[phab:T334847|T334847]])
* 02:43 legoktm: manual restart of apache2 on toolserver-proxy-1 to completely pick up renewed TLS cert (alert was flapping)
=== 2023-04-11 ===
* 16:11 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|b65439b}}) - cookbook ran by arturo@nostromo
* 15:46 arturo: upload toolforge-jobs-framework-cli v11 to aptly
* 14:17 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller.git ({{Gerrit|d878e49}}) ([[phab:T324834|T324834]]) - cookbook ran by dcaro@vulcanus
* 13:19 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:c6c693c from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|c6c693c}}) - cookbook ran by arturo@nostromo
* 12:09 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/volume-admission:40bd3b3 from https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|40bd3b3}}) - cookbook ran by dcaro@vulcanus
* 10:34 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-nginx ({{Gerrit|9aed7e5}}) - cookbook ran by taavi@runko
* 09:15 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/calico ({{Gerrit|c6a3e29}}) ([[phab:T329677|T329677]]) - cookbook ran by taavi@runko
* 08:45 wm-bot2: Adding a new k8s worker node - cookbook ran by taavi@runko
=== 2023-04-10 ===
* 10:46 taavi: patch existing PSP roles to use policy/v1beta1 [[phab:T331619|T331619]]
* 09:16 arturo: upgrading k8s cluster to 1.22 ([[phab:T286856|T286856]])
=== 2023-04-07 ===
* 14:34 wm-bot2: drained, depooled and removed k8s control node tools-k8s-control-3 ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko
* 14:30 wm-bot2: removed instance tools-k8s-control-2 - cookbook ran by taavi@runko
=== 2023-04-05 ===
* 15:16 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|5ea5992}}) - cookbook ran by taavi@runko
* 15:10 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:3569803 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|3569803}}) - cookbook ran by taavi@runko
* 14:56 wm-bot2: Added a new k8s worker tools-k8s-worker-88.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 14:42 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 14:42 wm-bot2: Added a new k8s worker tools-k8s-worker-87.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 14:28 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 14:28 wm-bot2: Added a new k8s worker tools-k8s-worker-86.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 14:15 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 14:15 wm-bot2: Added a new k8s worker tools-k8s-worker-85.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 14:01 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 14:01 wm-bot2: Added a new k8s worker tools-k8s-worker-84.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 13:47 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 13:47 wm-bot2: Added a new k8s worker tools-k8s-worker-83.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 13:34 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 13:33 wm-bot2: removed instance tools-k8s-worker-83 - cookbook ran by taavi@runko
* 13:15 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 13:06 wm-bot2: removing grid node tools-sgeweblight-10-31.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 13:02 wm-bot2: removing grid node tools-sgeweblight-10-29.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 13:00 wm-bot2: removing grid node tools-sgeexec-10-9.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 12:58 wm-bot2: removing grid node tools-sgeweblight-10-15.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 12:54 wm-bot2: removing grid node tools-sgeexec-10-7.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 12:52 wm-bot2: removing grid node tools-sgeweblight-10-13.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko
* 12:34 wm-bot2: drained, depooled and removed k8s control node tools-k8s-control-1 - cookbook ran by taavi@runko
* 12:07 wm-bot2: Added a new k8s control tools-k8s-control-6.tools.eqiad1.wikimedia.cloud to the cluster - cookbook ran by taavi@runko
* 11:53 wm-bot2: Adding a new k8s control node - cookbook ran by taavi@runko
* 11:51 wm-bot2: removed instance tools-k8s-control-6 - cookbook ran by taavi@runko
* 11:39 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko
* 11:38 wm-bot2: removed instance tools-k8s-control-6 - cookbook ran by taavi@runko
* 11:21 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko
* 11:21 wm-bot2: removed instance tools-k8s-control-6 - cookbook ran by taavi@runko
* 11:09 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko
* 10:53 wm-bot2: removed instance tools-k8s-control-6 - cookbook ran by taavi@runko
* 10:41 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko
* 10:41 wm-bot2: removed instance tools-k8s-control-6 - cookbook ran by taavi@runko
* 10:16 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko
=== 2023-04-04 ===
* 19:00 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko
* 18:59 wm-bot2: removed instance tools-k8s-control-5 - cookbook ran by taavi@runko
* 18:46 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko
* 18:45 wm-bot2: Adding a new k8s CONTROL node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko
* 10:15 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo
* 09:28 arturo: hard-reboot the 3 k8s control nodes
=== 2023-04-03 ===
* 17:13 wm-bot2: rebooted k8s node tools-k8s-worker-31 - cookbook ran by taavi@runko
* 17:11 wm-bot2: rebooted k8s node tools-k8s-worker-32 - cookbook ran by taavi@runko
* 17:09 wm-bot2: rebooted k8s node tools-k8s-worker-33 - cookbook ran by taavi@runko
* 17:07 wm-bot2: rebooted k8s node tools-k8s-worker-34 - cookbook ran by taavi@runko
* 17:05 wm-bot2: rebooted k8s node tools-k8s-worker-35 - cookbook ran by taavi@runko
* 17:04 wm-bot2: rebooted k8s node tools-k8s-worker-36 - cookbook ran by taavi@runko
* 17:02 wm-bot2: rebooted k8s node tools-k8s-worker-37 - cookbook ran by taavi@runko
* 17:00 wm-bot2: rebooted k8s node tools-k8s-worker-38 - cookbook ran by taavi@runko
* 16:58 wm-bot2: rebooted k8s node tools-k8s-worker-39 - cookbook ran by taavi@runko
* 16:56 wm-bot2: rebooted k8s node tools-k8s-worker-40 - cookbook ran by taavi@runko
* 16:55 wm-bot2: rebooted k8s node tools-k8s-worker-41 - cookbook ran by taavi@runko
* 16:53 wm-bot2: rebooted k8s node tools-k8s-worker-42 - cookbook ran by taavi@runko
* 16:51 wm-bot2: rebooted k8s node tools-k8s-worker-43 - cookbook ran by taavi@runko
* 16:49 wm-bot2: rebooted k8s node tools-k8s-worker-44 - cookbook ran by taavi@runko
* 16:45 wm-bot2: rebooted k8s node tools-k8s-worker-45 - cookbook ran by taavi@runko
* 16:43 wm-bot2: rebooted k8s node tools-k8s-worker-46 - cookbook ran by taavi@runko
* 16:41 wm-bot2: rebooted k8s node tools-k8s-worker-47 - cookbook ran by taavi@runko
* 16:40 wm-bot2: rebooted k8s node tools-k8s-worker-48 - cookbook ran by taavi@runko
* 16:38 wm-bot2: rebooted k8s node tools-k8s-worker-49 - cookbook ran by taavi@runko
* 16:36 wm-bot2: rebooted k8s node tools-k8s-worker-50 - cookbook ran by taavi@runko
* 16:35 wm-bot2: rebooted k8s node tools-k8s-worker-51 - cookbook ran by taavi@runko
* 16:33 wm-bot2: rebooted k8s node tools-k8s-worker-52 - cookbook ran by taavi@runko
* 16:31 wm-bot2: rebooted k8s node tools-k8s-worker-53 - cookbook ran by taavi@runko
* 16:28 wm-bot2: rebooted k8s node tools-k8s-worker-54 - cookbook ran by taavi@runko
* 16:27 wm-bot2: rebooted k8s node tools-k8s-worker-55 - cookbook ran by taavi@runko
* 16:25 wm-bot2: rebooted k8s node tools-k8s-worker-56 - cookbook ran by taavi@runko
* 16:23 wm-bot2: rebooted k8s node tools-k8s-worker-57 - cookbook ran by taavi@runko
* 16:21 wm-bot2: rebooted k8s node tools-k8s-worker-58 - cookbook ran by taavi@runko
* 16:20 wm-bot2: rebooted k8s node tools-k8s-worker-59 - cookbook ran by taavi@runko
* 16:18 wm-bot2: rebooted k8s node tools-k8s-worker-60 - cookbook ran by taavi@runko
* 16:09 wm-bot2: rebooted k8s node tools-k8s-worker-61 - cookbook ran by taavi@runko
* 16:07 wm-bot2: rebooted k8s node tools-k8s-worker-62 - cookbook ran by taavi@runko
* 16:01 wm-bot2: rebooted k8s node tools-k8s-worker-64 - cookbook ran by taavi@runko
* 16:00 wm-bot2: rebooting the whole tools k8s cluster (58 nodes) - cookbook ran by taavi@runko
* 15:58 wm-bot2: rebooted k8s node tools-k8s-worker-65 - cookbook ran by taavi@runko
* 15:56 wm-bot2: rebooted k8s node tools-k8s-worker-66 - cookbook ran by taavi@runko
* 15:48 wm-bot2: rebooted k8s node tools-k8s-worker-67 - cookbook ran by taavi@runko
* 15:38 wm-bot2: rebooted k8s node tools-k8s-worker-68 - cookbook ran by taavi@runko
* 15:36 wm-bot2: rebooted k8s node tools-k8s-worker-69 - cookbook ran by taavi@runko
* 15:34 wm-bot2: rebooted k8s node tools-k8s-worker-70 - cookbook ran by taavi@runko
* 15:32 wm-bot2: rebooted k8s node tools-k8s-worker-71 - cookbook ran by taavi@runko
* 15:30 wm-bot2: rebooted k8s node tools-k8s-worker-72 - cookbook ran by taavi@runko
* 15:28 wm-bot2: rebooted k8s node tools-k8s-worker-73 - cookbook ran by taavi@runko
* 15:26 wm-bot2: rebooted k8s node tools-k8s-worker-74 - cookbook ran by taavi@runko
* 15:24 wm-bot2: rebooted k8s node tools-k8s-worker-75 - cookbook ran by taavi@runko
* 15:22 wm-bot2: rebooting the whole tools k8s cluster (58 nodes) - cookbook ran by taavi@runko
* 15:17 wm-bot2: rebooted k8s node tools-k8s-worker-75 - cookbook ran by taavi@runko
* 15:14 wm-bot2: rebooted k8s node tools-k8s-worker-76 - cookbook ran by taavi@runko
* 15:12 wm-bot2: rebooted k8s node tools-k8s-worker-77 - cookbook ran by taavi@runko
* 15:10 wm-bot2: rebooted k8s node tools-k8s-worker-78 - cookbook ran by taavi@runko
* 15:08 wm-bot2: rebooted k8s node tools-k8s-worker-79 - cookbook ran by taavi@runko
* 15:06 wm-bot2: rebooted k8s node tools-k8s-worker-80 - cookbook ran by taavi@runko
* 14:59 wm-bot2: rebooted k8s node tools-k8s-worker-81 - cookbook ran by taavi@runko
* 14:41 wm-bot2: rebooted k8s node tools-k8s-worker-82 - cookbook ran by taavi@runko
* 14:38 wm-bot2: rebooting the whole tools k8s cluster (58 nodes) - cookbook ran by taavi@runko
* 14:13 andrewbogott: test log to see if stashbot is back working
* 13:19 andrewbogott: forcing puppet run on all toolforge VMs
* 08:28 taavi: stop exim4.service on tools-sgecron-2 [[phab:T333477|T333477]]
* 06:52 taavi: stop jobs-framework-emailer to prevent spam due to NFS being read-only [[phab:T333477|T333477]]
=== | === 2023-03-29 === | ||
* | * 16:07 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/registry-admission-webhook ({{Gerrit|dc26f52}}) - cookbook ran by raymond@ubuntu | ||
* | * 15:21 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/registry-admission:24115c7 from https://gerrit.wikimedia.org/r/labs/tools/registry-admission-webhook ({{Gerrit|24115c7}}) - cookbook ran by raymond@ubuntu | ||
=== | === 2023-03-28 === | ||
* | * 19:43 wm-bot2: deployed kubernetes component https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|e1b9815}}) - cookbook ran by raymond@ubuntu | ||
=== | === 2023-03-27 === | ||
* | * 22:51 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-buildpack-admission-controller:70d550a from https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|70d550a}}) - cookbook ran by raymond@ubuntu | ||
=== | === 2023-03-26 === | ||
* | * 20:28 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko | ||
=== | === 2023-03-24 === | ||
* | * 14:13 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@endurance | ||
=== | === 2023-03-21 === | ||
* | * 08:11 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko | ||
=== | === 2023-03-20 === | ||
* | * 13:39 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko | ||
* | * 10:57 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@endurance | ||
=== | === 2023-03-19 === | ||
* | * 09:32 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko | ||
=== | === 2023-03-17 === | ||
* | * 15:56 andrewbogott: truncating .out, .err, and .log files to 10MB in anticipation of moving the NFS volumes | ||
=== | === 2023-03-13 === | ||
* | * 09:50 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-buildpack-admission-controller:f90bd8f from https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|f90bd8f}}) - cookbook ran by dcaro@vulcanus | ||
=== | === 2023-03-12 === | ||
* | * 13:40 taavi: restart haproxy on tools-k8s-haproxy-3 | ||
=== | === 2023-03-11 === | ||
* | * 18:38 wm-bot2: removing grid node tools-sgeexec-10-11.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* | * 18:36 wm-bot2: removing grid node tools-sgeexec-10-11.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* | * 18:34 wm-bot2: removing grid node tools-sgeexec-10-11.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* 18:31 taavi: reboot misbehaving tools-sgeexec-10-11 | |||
* | |||
=== | === 2023-03-10 === | ||
* | * 16:36 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|8b42b15}}) - cookbook ran by taavi@runko | ||
=== | === 2023-03-09 === | ||
* | * 10:13 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|53e7f81}}) - cookbook ran by taavi@runko | ||
* 10:04 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/maintain-kubeusers:834807c from https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|834807c}}) - cookbook ran by taavi@runko | |||
=== | === 2023-03-08 === | ||
* | * 22:31 bd808: Live hacked user-maintainer clusterrole to work around breakage in [[phab:T331572|T331572]] | ||
=== | === 2023-03-07 === | ||
* | * 11:34 wm-bot2: Increased quotas by 2 volumes - cookbook ran by fran@wmf3169 | ||
* 11:09 wm-bot2: Increased quotas by 6 snapshots - cookbook ran by fran@wmf3169 | |||
* 11:07 wm-bot2: Increased quotas by 4000 gigabytes - cookbook ran by fran@wmf3169 | |||
=== | === 2023-03-06 === | ||
* | * 12:51 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/registry-admission-webhook ({{Gerrit|6688477}}) - cookbook ran by taavi@runko | ||
* 12:33 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/registry-admission:e916fee from https://gerrit.wikimedia.org/r/labs/tools/registry-admission-webhook ({{Gerrit|e916fee}}) - cookbook ran by taavi@runko | |||
* 12:16 arturo: delete calico deployment, redeploy from https://gitlab.wikimedia.org/repos/cloud/toolforge/calico ([[phab:T328539|T328539]]) | |||
=== | === 2023-03-05 === | ||
* | * 15:43 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|3e04025}}) - cookbook ran by taavi@runko | ||
=== | === 2023-03-02 === | ||
* | * 11:32 arturo: aborrero@tools-k8s-control-2:~$ sudo -i kubectl apply -f /etc/kubernetes/toolforge-tool-roles.yaml (https://gerrit.wikimedia.org/r/c/operations/puppet/+/889836) | ||
=== | === 2023-03-01 === | ||
* | * 13:18 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|13eda9d}}) - cookbook ran by taavi@runko | ||
=== | === 2023-02-28 === | ||
* | * 17:19 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway ({{Gerrit|9252af7}}) - cookbook ran by taavi@runko | ||
* 17:04 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|e46da83}}) - cookbook ran by taavi@runko | |||
=== | === 2023-02-23 === | ||
* 18:07 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway ({{Gerrit|efb60b3}}) - cookbook ran by taavi@runko
* 09:33 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/buildpack-admission:b34e2f8 from https://github.com/toolforge/buildpack-admission-controller.git ({{Gerrit|b34e2f8}}) - cookbook ran by taavi@runko | |||
=== | === 2023-02-21 === | ||
* | * 09:37 arturo: hard-reboot tools-sgeexec-10-11 (unresponsive to ssh) | ||
=== | === 2023-02-20 === | ||
* | * 11:24 taavi: redeploy volume-admission with helm and cert-manager certificates [[phab:T329530|T329530]] [[phab:T292238|T292238]] | ||
* 11:15 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/volume-admission:7fd13ac from https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|ede8bd0}}) - cookbook ran by taavi@runko | |||
* 11:05 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-volume-admission-controller:7fd13ac from https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|7fd13ac}}) - cookbook ran by taavi@runko | |||
* 10:39 wm-bot2: Increased quotas by 4000 gigabytes - cookbook ran by fran@wmf3169 | |||
* 09:20 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo | |||
=== | === 2023-02-19 === | ||
* | * 09:16 taavi: uncordon tools-k8s-worker-[80-82] after fixing security groups [[phab:T329378|T329378]] | ||
=== | === 2023-02-17 === | ||
* | * 11:32 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|eeeea4c}}) - cookbook ran by arturo@endurance | ||
* 11:31 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config ({{Gerrit|7729b18}}) ([[phab:T254636|T254636]]) - cookbook ran by arturo@endurance | |||
* 11:26 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:8a9b97e from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|eeeea4c}}) - cookbook ran by arturo@endurance | |||
* 11:24 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:8a9b97e from https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-framework-api ({{Gerrit|618ab29}}) - cookbook ran by arturo@endurance | |||
* 10:25 arturo: build and push mariadb-sssd/base docker image for Toolforge ([[phab:T320178|T320178]], [[phab:T254636|T254636]]) | |||
=== | === 2023-02-16 === | ||
* | * 15:58 wm-bot2: Increased quotas by 4000 gigabytes - cookbook ran by fran@wmf3169 | ||
* 15:30 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/cert-manager ({{Gerrit|d71994e}}) - cookbook ran by arturo@nostromo | |||
* 13:52 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/ingress-admission-controller ({{Gerrit|7191997}}) - cookbook ran by taavi@runko | |||
* 13:44 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/ingress-admission:1fe8ec4 from https://gerrit.wikimedia.org/r/cloud/toolforge/ingress-admission-controller ({{Gerrit|1fe8ec4}}) - cookbook ran by taavi@runko | |||
* 12:47 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/ingress-admission:e9b9920 from https://gerrit.wikimedia.org/r/cloud/toolforge/ingress-admission-controller ({{Gerrit|e9b9920}}) - cookbook ran by taavi@runko | |||
* | * 10:35 arturo: aborrero@tools-k8s-control-1:~$ sudo -i kubectl apply -f /etc/kubernetes/psp/base-pod-security-policies.yaml | ||
* 09:48 arturo: grid engine was failed over to shadow server, manually put it back into normal https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Grid#GridEngine_Master | |||
* 09:39 arturo: aborrero@tools-sgegrid-shadow:~$ sudo truncate -s 1G /var/log/syslog (was 17G, full root disk) | |||
=== | === 2023-02-15 === | ||
* 18:03 taavi: deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/889585/ to increase amount of haproxy max connections
* 15:19 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo | |||
* 09:50 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/cert-manager.git ({{Gerrit|e3f3ce1}}) ([[phab:T329453|T329453]]) - cookbook ran by taavi@runko | |||
* 09:30 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo | |||
* | |||
* | |||
* | |||
=== | === 2023-02-14 === | ||
* | * 15:07 taavi: import cert-manager components to local docker registry [[phab:T329453|T329453]] | ||
* | * 12:12 arturo: the fixed webservicemonitor is starting a bunch of grid webservices ([[phab:T329611|T329611]]) | ||
* | * 12:10 arturo: included tools-manifests 0.25 in tools-buster aptly repo, deploying it now! ([[phab:T329611|T329611]], [[phab:T329467|T329467]], [[phab:T244809|T244809]]) | ||
=== | === 2023-02-13 === | ||
* | * 16:05 wm-bot2: Increased quotas by 4000 gigabytes - cookbook ran by fran@wmf3169 | ||
* 16:03 taavi: update maintain-kubeusers deployment to use helm | |||
* 15:05 taavi: deploy jobs-api updates, improving some status messages | |||
* 15:04 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|13d87c4}}) - cookbook ran by taavi@runko | |||
* 15:00 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:390ed64 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|390ed64}}) - cookbook ran by taavi@runko | |||
* 13:14 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/maintain-kubeusers:aac195b from https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|aac195b}}) - cookbook ran by taavi@runko | |||
=== | === 2023-02-10 === | ||
* | * 15:45 taavi: reboot tools-k8s-worker-82 to troubleshoot network issues | ||
* 12:44 wm-bot2: Added a new k8s worker tools-k8s-worker-82.tools.eqiad1.wikimedia.cloud to the worker pool ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko | |||
* 12:31 wm-bot2: Adding a new k8s worker node ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko | |||
* 12:29 wm-bot2: Added a new k8s worker tools-k8s-worker-81.tools.eqiad1.wikimedia.cloud to the worker pool ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko | |||
* 12:15 wm-bot2: Adding a new k8s worker node ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko | |||
* 11:53 wm-bot2: Adding a new k8s worker node ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko | |||
* 11:44 wm-bot2: removing grid node tools-sgeweblight-10-23.tools.eqiad1.wikimedia.cloud ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko | |||
* 11:42 wm-bot2: removing grid node tools-sgeexec-10-5.tools.eqiad1.wikimedia.cloud ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko | |||
* 11:39 wm-bot2: removing grid node tools-sgeweblight-10-19.tools.eqiad1.wikimedia.cloud ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko | |||
* 11:26 wm-bot2: removing grid node tools-sgeweblight-10-12.tools.eqiad1.wikimedia.cloud ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko | |||
* 11:24 wm-bot2: removing grid node tools-sgeexec-10-1.tools.eqiad1.wikimedia.cloud ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko | |||
=== | === 2023-02-01 === | ||
* | * 16:03 taavi: deployed tools-webservice 0.89 | ||
* | * 15:43 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config ({{Gerrit|372037f}}) - cookbook ran by taavi@runko | ||
=== | === 2023-01-26 === | ||
* | * 15:05 taavi: drain and reboot tools-k8s-worker-74 which seems to have some issues with nfs | ||
* | * 14:37 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|307f302}}) - cookbook ran by taavi@runko | ||
* 14:30 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:05966c6 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|05966c6}}) - cookbook ran by taavi@runko | |||
=== | === 2023-01-24 === | ||
* | * 12:04 taavi: deploying toolforge-jobs-framework-cli v10 [[phab:T327775|T327775]] | ||
* 10:07 taavi: publish toolforge-jobs-framework-cli v9 | |||
* | |||
=== | === 2023-01-23 === | ||
* | * 11:31 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|d5ae229}}) - cookbook ran by taavi@runko | ||
* 11:23 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:d085c50 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|d085c50}}) - cookbook ran by taavi@runko | |||
* 11:17 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config ({{Gerrit|864171a}}) - cookbook ran by taavi@runko | |||
=== | === 2023-01-20 === | ||
* 23:24 andrewbogott: truncating logfiles with find . -name '*.err' -size +1G -exec truncate --size=100M <nowiki>{}</nowiki> \;
* 21:24 andrewbogott: truncating logfiles with find . -name '*.out' -size +1G -exec truncate --size=100M <nowiki>{}</nowiki> \;
* 01:06 andrewbogott: truncating logfiles with find . -name '*.log' -size +1G -exec truncate --size=100M <nowiki>{}</nowiki> \;
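The size-based cleanups logged above follow one pattern: find files matching a name pattern above a size threshold, then cap them with <code>truncate</code>. A minimal sketch of that pattern follows, with a listing pass before the destructive pass. The helper name <code>cap_logs</code> and the scratch directory are hypothetical and not part of any Toolforge tooling. The demo uses small sizes so it can be run safely anywhere, and assumes GNU findutils and coreutils. Note that <code>find -size</code> rounds file sizes up to whole units, so <code>+1G</code> matches only files strictly larger than 1 GiB.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): cap files matching PATTERN
# under DIR that are larger than MIN_SIZE down to NEW_SIZE.
# usage: cap_logs DIR PATTERN MIN_SIZE NEW_SIZE
cap_logs() {
    dir=$1 pattern=$2 min_size=$3 new_size=$4
    # Listing pass: print what is about to be truncated.
    find "$dir" -type f -name "$pattern" -size "+$min_size" -print
    # Destructive pass: '{} +' batches many files per truncate call.
    find "$dir" -type f -name "$pattern" -size "+$min_size" \
        -exec truncate --size="$new_size" {} +
}

# Demo in a scratch directory, with thresholds scaled down from 1G/100M.
demo_dir=$(mktemp -d)
truncate --size=3M "$demo_dir/big.err"     # sparse 3 MiB file, above the threshold
truncate --size=10K "$demo_dir/small.err"  # below the threshold, left untouched
cap_logs "$demo_dir" '*.err' 1M 100K
```

Running the listing pass on its own first gives a chance to review the affected files before anything is modified.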
=== | === 2023-01-19 === | ||
* | * 11:46 arturo: `aborrero@tools-k8s-control-1:~$ sudo -i kubectl delete clusterrolebinding jobs-api-psp` (cleanup unused stuff) | ||
=== | === 2023-01-18 === | ||
* | * 15:42 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|0ad4c66}}) - cookbook ran by arturo@nostromo | ||
* | * 15:29 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:54cc15e from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|54cc15e}}) - cookbook ran by arturo@nostromo | ||
=== | === 2023-01-17 === | ||
* | * 13:55 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|8cf38a1}}) - cookbook ran by arturo@endurance | ||
* 13:51 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|0d0a882}}) - cookbook ran by arturo@endurance | |||
* 13:34 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:3a58c1d from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|3a58c1d}}) - cookbook ran by arturo@endurance | |||
=== | === 2023-01-10 === | ||
* | * 11:55 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|8e0a2f9}}) - cookbook ran by arturo@endurance | ||
* 11:52 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:9514b00 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|8e0a2f9}}) - cookbook ran by arturo@endurance | |||
* 11:36 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|0243967}}) - cookbook ran by arturo@endurance | |||
=== | === 2023-01-03 === | ||
* 17:17 andrewbogott: find -name '*.log' -size +1G -exec truncate --size=1G <nowiki>{}</nowiki> \;
* | |||
=== | === 2022-12-20 === | ||
* | * 09:07 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo | ||
=== | === 2022-12-12 === | ||
* | * 14:36 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus | ||
=== | === 2022-12-09 === | ||
* | * 07:20 taavi: change the canonical tools-mail external hostname to use mail.tools.wmcloud.org and add valid spf to toolforge.org [[phab:T324809|T324809]] | ||
=== | === 2022-12-05 === | ||
* | * 11:06 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus | ||
=== | === 2022-11-30 === | ||
* | * 10:39 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|bc3529d}}) - cookbook ran by arturo@nostromo | ||
* 10:17 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:c360d54 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|c360d54}}) - cookbook ran by arturo@nostromo
=== | === 2022-11-29 === | ||
* | * 19:52 taavi: clear puppet failure emails from exim queues | ||
=== | === 2022-11-09 === | ||
* | * 08:58 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo | ||
=== | === 2022-11-05 === | ||
* 19:28 andrewbogott: cleaning up nfs share with root@labstore1004:/srv/tools/shared/tools# find -name '*.err' -size +1G -exec truncate --size=1G <nowiki>{}</nowiki> \;
* 13:26 andrewbogott: cleaning up nfs share with root@labstore1004:/srv/tools/shared/tools# find -name '*.log' -size +1G -exec truncate --size=1G <nowiki>{}</nowiki> \;
=== | === 2022-11-04 === | ||
* 20:41 andrewbogott: cleaning up nfs share with root@labstore1004:/srv/tools/shared/tools# find -name '*.err' -not -newermt "Nov 1, 2021" -exec rm <nowiki>{}</nowiki> \;
* 14:02 andrewbogott: cleaning up nfs share with root@labstore1004:/srv/tools/shared/tools# find -name '*.log' -not -newermt "Nov 1, 2021" -exec rm <nowiki>{}</nowiki> \;
* 12:20 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|d464be4}}) ([[phab:T304900|T304900]]) - cookbook ran by arturo@nostromo | |||
* | * 12:12 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:2b800f5 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|2b800f5}}) ([[phab:T304900|T304900]]) - cookbook ran by arturo@nostromo | ||
* | |||
* | |||
* | |||
* | |||
=== | === 2022-11-01 === | ||
* | * 09:37 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master ([[phab:T322110|T322110]]) - cookbook ran by dcaro@vulcanus | ||
=== | === 2022-10-26 === | ||
* | * 08:45 dcaro: depooling and rebooting tools-sgeexec-10-22 to get nfs scratch working again | ||
=== | === 2022-10-25 === | ||
* | * 16:14 wm-bot2: Increased quotas by 5120 gigabytes - cookbook ran by fran@wmf3169 | ||
* 15:26 dcaro: pushed a newer docker-registry.tools.wmflabs.org/python:3.9-slim-bullseye (from upstream python:3.9-slim-bullseye)
=== | === 2022-10-20 === | ||
* | * 16:54 andrewbogott: rebooting tools-package-builder-04 | ||
* 16:49 andrewbogott: rebooting redis nodes (one at a time) | |||
* | * 10:54 taavi: rebuild mono68-sssd image with the expired DST Root CA X3 removed [[phab:T311466|T311466]] | ||
=== | === 2022-10-18 === | ||
* | * 11:52 taavi: deploy toolforge-jobs-framework-cli deb v8 | ||
* | * 10:30 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-emailer ({{Gerrit|64385e9}}) ([[phab:T320405|T320405]]) - cookbook ran by arturo@nostromo | ||
* 10:27 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:9be2272 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|9be2272}}) - cookbook ran by taavi@runko | |||
* 10:18 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-emailer:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-emailer ({{Gerrit|64385e9}}) ([[phab:T320405|T320405]]) - cookbook ran by arturo@nostromo | |||
=== | === 2022-10-17 === | ||
* | * 07:25 taavi: push updated perl532 images [[phab:T320824|T320824]] | ||
=== | === 2022-10-14 === | ||
* | * 07:54 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|0cc020e}}) ([[phab:T311466|T311466]]) - cookbook ran by taavi@runko | ||
=== | === 2022-10-13 === | ||
* | * 15:10 arturo: restart jobs-emailer pod | ||
=== | === 2022-10-12 === | ||
* | * 23:25 bd808: Rebuilding all Toolforge docker images ([[phab:T278436|T278436]], [[phab:T311466|T311466]], [[phab:T293552|T293552]]) | ||
* 20:43 bd808: Rebuilding all Toolforge docker images to pick up bug and security fix packages. Third try seems to be working. ([[phab:T316554|T316554]])
* 20:31 bd808: Rebuilding all Toolforge docker images to pick up bug and security fix packages after fixing bug in building the bullseye base image. ([[phab:T316554|T316554]]) | |||
* 16:26 dcaro: deploy the latest registry admission webhook, now for real (image tag {{Gerrit|07bc7db}}) | |||
* 12:48 dcaro: deploy the latest registry admission webhook (image tag {{Gerrit|07bc7db}}) | |||
* 09:26 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus | |||
* 09:19 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus | |||
=== | === 2022-10-11 === | ||
* | * 13:52 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:8574c36 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|8574c36}}) - cookbook ran by taavi@runko | ||
=== | === 2022-10-10 === | ||
* 19:30 taavi: rebooting all k8s worker nodes to clean up labstore1006/7 remains
* 16:51 taavi: clean up labstore1006/7 mounts from k8s control nodes [[phab:T320425|T320425]] | |||
* 11:35 arturo: aborrero@tools-k8s-control-1:~$ sudo -i kubectl -n jobs-emailer rollout restart deployment/jobs-emailer ([[phab:T317998|T317998]]) | |||
* 08:44 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|afa90ed}}) ([[phab:T320284|T320284]]) - cookbook ran by taavi@runko | |||
* 08:39 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|afa90ed}}) - cookbook ran by taavi@runko | |||
=== | === 2022-10-09 === | ||
* | * 17:29 taavi: kill 10 idle tmux sessions of user 'hoi' on tools-sgebastion-10 [[phab:T320352|T320352]] | ||
=== | === 2022-10-07 === | ||
* | * 13:02 taavi: taavi@cloudcontrol1005 ~ $ sudo mark_tool --disable oncall # [[phab:T320240|T320240]] | ||
=== | === 2022-10-06 === | ||
* | * 00:39 bd808: Image rebuild failing with debian apt repo signature issue. Will investigate tomorrow. ([[phab:T316554|T316554]]) | ||
* 00:36 bd808: Rebuilding all Toolforge docker images to pick up bug and security fix packages. ([[phab:T316554|T316554]]) | |||
* 00:04 bd808: Building new php74-sssd-base & web images ([[phab:T310435|T310435]]) | |||
=== | === 2022-10-03 === | ||
* | * 14:36 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/volume-admission:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|8da432b}}) - cookbook ran by taavi@runko | ||
=== | === 2022-09-28 === | ||
* | * 21:23 lucaswerkmeister: on tools-sgebastion-10: run-puppet-agent # [[phab:T318858|T318858]] | ||
* 21:22 lucaswerkmeister: on tools-sgebastion-10: apt remove emacs-common emacs-bin-common # fix package conflict, [[phab:T318858|T318858]]
* | * 21:15 lucaswerkmeister: added root SSH key for myself, manually ran puppet on tools-sgebastion-10 to apply it (seemingly successfully) | ||
=== | === 2022-09-22 === | ||
* | * 12:30 taavi: add TheresNoTime to the 'toollabs-trusted' gerrit group [[phab:T317438|T317438]] | ||
* | * 12:27 taavi: add TheresNoTime as a project admin and to the roots sudo policy [[phab:T317438|T317438]] | ||
=== | === 2022-09-10 === | ||
* | * 07:39 wm-bot2: removing instance tools-prometheus-03 - cookbook ran by taavi@runko | ||
=== | === 2022-09-07 === | ||
* | * 10:22 dcaro: Pushing the new toolforge builder image based on the new 0.8 buildpacks ([[phab:T316854|T316854]]) | ||
=== | === 2022-09-06 === | ||
* | * 08:06 dcaro_away: Published new toolforge-bullseye0-run and toolforge-bullseye0-build images for the toolforge buildpack builder ([[phab:T316854|T316854]]) | ||
=== | === 2022-08-25 === | ||
* | * 10:40 taavi: tagged new version of the python39-web container with a shell implementation of webservice-runner [[phab:T293552|T293552]] | ||
=== | === 2022-08-24 === | ||
* | * 12:20 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-nginx ({{Gerrit|eba66bc}}) - cookbook ran by taavi@runko | ||
* | * 12:20 taavi: upgrading ingress-nginx to v1.3 | ||
=== | === 2022-08-20 === | ||
* | * 07:44 dcaro_away: all k8s nodes ready now \o/ ([[phab:T315718|T315718]]) | ||
* | * 07:43 dcaro_away: rebooted tools-k8s-control-2, seemed stuck trying to wait for tools home (nfs?), after reboot came back up ([[phab:T315718|T315718]]) | ||
* | * 07:41 dcaro_away: cloudvirt1023 down took out 3 workers, 1 control, and a grid exec and a weblight, they are taking long to restart, looking ([[phab:T315718|T315718]]) | ||
=== | === 2022-08-18 === | ||
* | * 14:45 andrewbogott: adding lucaswerkmeister as projectadmin ([[phab:T314527|T314527]]) | ||
* | * 14:43 andrewbogott: removing some inactive projectadmins: rush, petrb, mdipietro, jeh, krenair | ||
=== | === 2022-08-17 === | ||
* | * 16:34 taavi: kubectl sudo delete cm -n tool-wdml maintain-kubeusers # [[phab:T315459|T315459]] | ||
* 08:30 taavi: failing the grid from the shadow back to the master, some disruption expected | |||
=== | === 2022-08-16 === | ||
* | * 17:28 taavi: fail over docker-registry, tools-docker-registry-06->docker-registry-05 | ||
=== | === 2022-08-11 === | ||
* | * 16:57 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko | ||
* | * 16:55 taavi: restart puppetdb on tools-puppetdb-1, crashed during the ceph issues | ||
=== | === 2022-08-05 === | ||
* | * 15:08 wm-bot2: removing grid node tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* | * 15:05 wm-bot2: removing grid node tools-sgeexec-10-12.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* | * 15:00 wm-bot2: created node tools-sgewebgen-10-3.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko | ||
=== | === 2022-08-03 === | ||
* | * 15:51 dhinus: recreated jobs-api pods to pick up new ConfigMap | ||
* | * 15:02 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|c47ac41}}) - cookbook ran by fran@MacBook-Pro.station | ||
=== | === 2022-07-20 === | ||
* | * 19:31 taavi: reboot toolserver-proxy-01 to free up disk space probably held by stale file handles | ||
* | * 08:06 wm-bot2: removing grid node tools-sgeexec-10-6.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
=== | === 2022-07-19 === | ||
* 17:53 wm-bot2: created node tools-sgeexec-10-21.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
* 17:00 wm-bot2: removing grid node tools-sgeexec-10-3.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 16:58 wm-bot2: removing grid node tools-sgeexec-10-4.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 16:24 wm-bot2: created node tools-sgeexec-10-20.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko | |||
* 15:59 taavi: tag current maintain-kubernetes :beta image as: :latest | |||
=== | === 2022-07-17 === | ||
* | * 15:52 wm-bot2: removing grid node tools-sgeexec-10-10.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* | * 15:43 wm-bot2: removing grid node tools-sgeexec-10-2.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* | * 13:26 wm-bot2: created node tools-sgeexec-10-16.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko | ||
=== | === 2022-07-14 === | ||
* | * 13:48 taavi: rebooting tools-sgeexec-10-2 | ||
=== | === 2022-07-13 === | ||
* | * 12:09 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus | ||
=== | === 2022-07-11 === | ||
* | * 16:06 wm-bot2: Increased quotas by <nowiki>{</nowiki>self.increases<nowiki>}</nowiki> ([[phab:T312692|T312692]]) - cookbook ran by nskaggs@x1carbon | ||
=== | === 2022-07-07 === | ||
* | * 07:34 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus | ||
=== | === 2022-06-28 === | ||
* | * 17:34 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master ([[phab:T311538|T311538]]) - cookbook ran by dcaro@vulcanus | ||
* | * 15:51 taavi: add 4096G cinder quota [[phab:T311509|T311509]] | ||
=== | === 2022-06-27 === | ||
* | * 18:14 taavi: restart calico, appears to have got stuck after the ca replacement operation | ||
* | * 18:02 taavi: switchover active cron server to tools-sgecron-2 [[phab:T284767|T284767]] | ||
* | * 17:54 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0915.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* 17:52 wm-bot2: removing grid node tools-sgewebgrid-generic-0902.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:49 wm-bot2: removing grid node tools-sgeexec-0942.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:15 taavi: [[phab:T311412|T311412]] updating ca used by k8s-apiserver->etcd communication, breakage may happen | |||
* 14:58 taavi: renew puppet ca cert and certificate for tools-puppetmaster-02 [[phab:T311412|T311412]] | |||
* 14:50 taavi: backup /var/lib/puppet/server to /root/puppet-ca-backup-2022-06-27.tar.gz on tools-puppetmaster-02 before we do anything else to it [[phab:T311412|T311412]] | |||
=== | === 2022-06-23 === | ||
* 19: | * 17:51 wm-bot2: removing grid node tools-sgeexec-0941.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* 17:49 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0916.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:46 wm-bot2: removing grid node tools-sgewebgrid-generic-0901.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:32 wm-bot2: removing grid node tools-sgeexec-0939.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:30 wm-bot2: removing grid node tools-sgeexec-0938.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:27 wm-bot2: removing grid node tools-sgeexec-0937.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:22 wm-bot2: removing grid node tools-sgeexec-0936.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:19 wm-bot2: removing grid node tools-sgeexec-0935.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:17 wm-bot2: removing grid node tools-sgeexec-0934.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:14 wm-bot2: removing grid node tools-sgeexec-0933.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:11 wm-bot2: removing grid node tools-sgeexec-0932.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 17:09 wm-bot2: removing grid node tools-sgeexec-0920.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 15:30 wm-bot2: removing grid node tools-sgeexec-0947.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 13:59 taavi: removing remaining continuous jobs from the stretch grid [[phab:T277653|T277653]] | |||
=== | === 2022-06-22 === | ||
* | * 15:54 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0917.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* 15:51 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0918.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 15:47 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0919.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 15:45 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0920.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
=== | === 2022-06-21 === | ||
* | * 15:23 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0914.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | ||
* 15:20 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0914.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 15:18 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0913.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
* 15:07 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0912.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko | |||
=== | === 2022-06-03 === | ||
* 18: | * 20:07 wm-bot2: created node tools-sgeweblight-10-26.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster | ||
* | * 19:51 balloons: Scaling webservice nodes to 20, using new 8G swap flavor [[phab:T309821|T309821]] | ||
* 19:35 wm-bot2: created node tools-sgeweblight-10-25.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster | |||
* 19:03 wm-bot2: created node tools-sgeweblight-10-20.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster | |||
* 19:01 wm-bot2: created node tools-sgeweblight-10-19.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster | |||
* 19:00 balloons: depooled old nodes, bringing entirely new grid of nodes online [[phab:T309821|T309821]] | |||
* 18:22 wm-bot2: created node tools-sgeweblight-10-17.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster | |||
* 17:54 wm-bot2: created node tools-sgeweblight-10-16.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster | |||
* 17:52 wm-bot2: created node tools-sgeweblight-10-15.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster | |||
* 16:59 andrewbogott: building a bunch of new lighttpd nodes (beginning with tools-sgeweblight-10-12) using a flavor with more swap space | |||
* 16:56 wm-bot2: created node tools-sgeweblight-10-12.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster | |||
* 15:50 balloons: fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor [[phab:T309821|T309821]] | |||
* 15:50 balloons: temp add 1.0G swap to sgeweblight hosts [[phab:T309821|T309821]] | |||
* 15:50 balloons: fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor t309821 | |||
* 15:49 balloons: temp add 1.0G swap to sgeweblight hosts t309821 | |||
* 13:25 bd808: Upgrading fleet to tools-webservice 0.86 ([[phab:T309821|T309821]]) | |||
* 13:20 bd808: publish tools-webservice 0.86 ([[phab:T309821|T309821]]) | |||
* 12:46 taavi: start webservicemonitor on tools-sgecron-01 [[phab:T309821|T309821]] | |||
* 10:36 taavi: draining each sgeweblight node one by one, and removing the jobs stuck in 'deleting' too | |||
* 05:05 taavi: removing duplicate (there should be only one per tool) web service jobs from the grid [[phab:T309821|T309821]] | |||
* 04:52 taavi: revert bd808's changes to profile::toolforge::active_proxy_host | |||
* 03:21 bd808: Cleared queue error states after deploying new toolforge-webservice package ([[phab:T309821|T309821]]) | |||
* 03:10 bd808: publish tools-webservice 0.85 with hack for [[phab:T309821|T309821]] | |||
=== | === 2022-06-02 === | ||
* | * 22:26 bd808: Rebooting tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud. Node is full of jobs that are not tracked by grid master and failing to spawn new jobs sent by the scheduler | ||
* | * 21:56 bd808: Removed legacy "active_proxy_host" hiera setting | ||
* 21:55 bd808: Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for profile::toolforge::active_proxy_host key | |||
* 21:41 bd808: Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for active_redis key | |||
* 21:23 wm-bot2: created node tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko | |||
* 12:42 wm-bot2: rebooting stretch exec grid workers - cookbook ran by taavi@runko | |||
* 12:13 wm-bot2: created node tools-sgeweblight-10-7.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko | |||
* 12:03 dcaro: refresh prometheus certs ([[phab:T308402|T308402]]) | |||
* 11:47 dcaro: refresh registry-admission-controller certs ([[phab:T308402|T308402]]) | |||
* 11:42 dcaro: refresh ingress-admission-controller certs ([[phab:T308402|T308402]]) | |||
* 11:36 dcaro: refresh volume-admission-controller certs ([[phab:T308402|T308402]]) | |||
* 11:24 wm-bot2: created node tools-sgeweblight-10-6.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko | |||
* 11:17 taavi: publish jobutils 1.44 that updates the grid default from stretch to buster [[phab:T277653|T277653]] | |||
* 10:16 taavi: publish tools-webservice 0.84 that updates the grid default from stretch to buster [[phab:T277653|T277653]] | |||
* 09:54 wm-bot2: created node tools-sgeexec-10-14.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko | |||
=== | === 2022-06-01 === | ||
* | * 11:18 taavi: depool and remove tools-sgeexec-09[07-14] | ||
=== | === 2022-05-31 === | ||
* | * 16:51 taavi: delete tools-sgeexec-0904 for [[phab:T309525|T309525]] experimentation | ||
=== | === 2022-05-30 === | ||
* | * 08:24 taavi: depool tools-sgeexec-[0901-0909] (7 nodes total) [[phab:T277653|T277653]] | ||
=== | === 2022-05-26 === | ||
* | * 15:39 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|e6fa299}}) ([[phab:T309146|T309146]]) - cookbook ran by taavi@runko | ||
=== | === 2022-05-22 === | ||
* | * 17:04 taavi: failover tools-redis to the updated cluster [[phab:T278541|T278541]] | ||
* 16:42 wm-bot2: removing grid node tools-sgeexec-0940.tools.eqiad1.wikimedia.cloud ([[phab:T308982|T308982]]) - cookbook ran by taavi@runko | |||
=== | === 2022-05-16 === | ||
* | * 14:02 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-nginx ({{Gerrit|7037eca}}) - cookbook ran by taavi@runko | ||
=== | === 2022-05-14 === | ||
* | * 10:47 taavi: hard reboot unresponsive tools-sgeexec-0940 | ||
=== | === 2022-05-12 === | ||
* | * 12:36 taavi: re-enable CronJobControllerV2 [[phab:T308205|T308205]] | ||
* | * 09:28 taavi: deploy jobs-api update [[phab:T308204|T308204]] | ||
* | * 09:15 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|e6fa299}}) ([[phab:T308204|T308204]]) - cookbook ran by taavi@runko | ||
=== | === 2022-05-10 === | ||
* | * 15:18 taavi: depool tools-k8s-worker-42 for experiments | ||
* 13:54 taavi: enable distro-wikimedia unattended upgrades [[phab:T290494|T290494]] | |||
=== | === 2022-05-06 === | ||
* | * 19:46 bd808: Rebuilt toolforge-perl532-sssd-base & toolforge-perl532-sssd-web to add liblocale-codes-perl ([[phab:T307812|T307812]]) | ||
=== | === 2022-05-05 === | ||
* | * 17:28 taavi: deploy tools-webservice 0.83 [[phab:T307693|T307693]] | ||
=== | === 2022-05-03 === | ||
* | * 08:20 taavi: redis: start replication from the old cluster to the new one ([[phab:T278541|T278541]]) | ||
=== | === 2022-05-02 === | ||
* | * 08:54 taavi: restart acme-chief.service [[phab:T307333|T307333]] | ||
=== | === 2022-04-25 === | ||
* | * 14:56 bd808: Rebuilding all docker images to pick up toolforge-webservice v0.82 ([[phab:T214343|T214343]]) | ||
* | * 14:46 bd808: Building toolforge-webservice v0.82 | ||
=== | === 2022-04-23 === | ||
* | * 16:51 bd808: Built new perl532-sssd/<nowiki>{</nowiki>base,web<nowiki>}</nowiki> images and pushed to registry ([[phab:T214343|T214343]]) | ||
=== | === 2022-04-20 === | ||
* | * 16:58 taavi: reboot toolserver-proxy-01 to free up disk space from stale file handles(?) | ||
* | * 07:51 wm-bot: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|8f37a04}}) - cookbook ran by taavi@runko | ||
=== | === 2022-04-16 === | ||
* 18: | * 18:53 wm-bot: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/kubernetes-metrics ({{Gerrit|2c485e9}}) - cookbook ran by taavi@runko | ||
=== | === 2022-04-12 === | ||
* 21: | * 21:32 bd808: Added komla to Gerrit group 'toollabs-trusted' ([[phab:T305986|T305986]]) | ||
* | * 21:27 bd808: Added komla to 'roots' sudoers policy ([[phab:T305986|T305986]]) | ||
* | * 21:24 bd808: Add komla as projectadmin ([[phab:T305986|T305986]]) | ||
=== | === 2022-04-10 === | ||
* | * 18:43 taavi: deleted `/tmp/dwl02.out-20210915` on tools-sgebastion-07 (not touched since september, taking up 1.3G of disk space) | ||
=== | === 2022-04-09 === | ||
* | * 15:30 taavi: manually prune user.log on tools-prometheus-03 to free up some space on / | ||
=== | === 2022-04-08 === | ||
* | * 10:44 arturo: disabled debug mode on the k8s jobs-emailer component | ||
=== | === 2022-04-05 === | ||
* | * 07:52 wm-bot: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|d7d3463}}) - cookbook ran by arturo@nostromo | ||
* 07:44 wm-bot: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|d7d3463}}) - cookbook ran by arturo@nostromo | |||
* 07:21 arturo: deploying toolforge-jobs-framework-cli v7 | |||
=== | === 2022-04-04 === | ||
* | * 17:05 wm-bot: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|cbcfc47}}) - cookbook ran by arturo@nostromo | ||
* 16:56 wm-bot: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|cbcfc47}}) - cookbook ran by arturo@nostromo | |||
* | * 09:28 arturo: deployed toolforge-jobs-framework-cli v6 into aptly and installed it on buster bastions | ||
=== | === 2022-03-28 === | ||
* | * 09:32 wm-bot: cleaned up grid queue errors on tools-sgegrid-master.tools.eqiad1.wikimedia.cloud ([[phab:T304816|T304816]]) - cookbook ran by arturo@nostromo | ||
=== | === 2022-03-15 === | ||
* | * 16:57 wm-bot: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-emailer:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-emailer ({{Gerrit|084ee51}}) - cookbook ran by arturo@nostromo | ||
* 11:24 arturo: cleared error state on queue continuous@tools-sgeexec-0939.tools.eqiad.wmflabs (a job took a very long time to be scheduled...) | |||
=== | === 2022-03-14 === | ||
* 11:44 arturo: | * 11:44 arturo: deploy jobs-framework-emailer {{Gerrit|9470a5f339fd5a44c97c69ce97239aef30f5ee41}} ([[phab:T286135|T286135]]) | ||
* 10:48 dcaro: pushed v0.33.2 tekton control and webhook images, and bashA5.1.4 to the local repo ([[phab:T297090|T297090]]) | |||
=== | === 2022-03-10 === | ||
* | * 09:42 arturo: cleaned grid queue error state @ tools-sgewebgrid-generic-0902 | ||
=== | === 2022-03-01 === | ||
* 13: | * 13:41 dcaro: rebooting tools-sgeexec-0916 to clear any state ([[phab:T302702|T302702]]) | ||
* | * 12:11 dcaro: Cleared error state queues for sgeexec-0916 ([[phab:T302702|T302702]]) | ||
* 10:23 arturo: tools-sgeexec-0913/0916 are depooled, queue errors. Reboot them and clean errors by hand | |||
=== | === 2022-02-28 === | ||
* | * 08:02 taavi: reboot sgeexec-0916 | ||
* 07:49 taavi: depool tools-sgeexec-0916.tools as it is out of disk space on / | |||
=== | === 2022-02-17 === | ||
* | * 08:23 taavi: deleted tools-clushmaster-02 | ||
* 08:14 taavi: made tools-puppetmaster-02 its own client to fix `puppet node deactivate` puppetdb access | |||
=== | === 2022-02-16 === | ||
* | * 00:12 bd808: Image builds completed. | ||
=== | === 2022-02-15 === | ||
* | * 23:17 bd808: Image builds failed in buster php image with an apt error. The error looks transient, so starting builds over. | ||
* 18: | * 23:06 bd808: Started full rebuild of Toolforge containers to pick up webservice 0.81 and other package updates in tmux session on tools-docker-imagebuilder-01 | ||
* | * 22:58 bd808: `sudo apt-get update && sudo apt-get install toolforge-webservice` on all bastions to pick up 0.81 | ||
* | * 22:50 bd808: Built new toollabs-webservice 0.81 | ||
* | * 18:43 bd808: Enabled puppet on tools-proxy-05 | ||
* 18:38 bd808: Disabled puppet on tools-proxy-05 for manual testing of nginx config changes | |||
* 18:21 taavi: delete tools-package-builder-03 | |||
* 11:49 arturo: invalidate sssd cache in all bastions to debug [[phab:T301736|T301736]] | |||
* 11:16 arturo: purge debian package `unscd` on tools-sgebastion-10/11 for [[phab:T301736|T301736]] | |||
* 11:15 arturo: reboot tools-sgebastion-10 for [[phab:T301736|T301736]] | |||
=== | === 2022-02-10 === | ||
* | * 15:07 taavi: shutdown tools-clushmaster-02 [[phab:T298191|T298191]] | ||
* 13:25 wm-bot: trying to join node tools-sgewebgen-10-2 to the grid cluster in tools. - cookbook ran by arturo@nostromo | |||
* 13:24 wm-bot: trying to join node tools-sgewebgen-10-1 to the grid cluster in tools. - cookbook ran by arturo@nostromo | |||
* 13:07 wm-bot: trying to join node tools-sgeweblight-10-5 to the grid cluster in tools. - cookbook ran by arturo@nostromo | |||
* 13:06 wm-bot: trying to join node tools-sgeweblight-10-4 to the grid cluster in tools. - cookbook ran by arturo@nostromo | |||
* 13:05 wm-bot: trying to join node tools-sgeweblight-10-3 to the grid cluster in tools. - cookbook ran by arturo@nostromo | |||
* 13:03 wm-bot: trying to join node tools-sgeweblight-10-2 to the grid cluster in tools. - cookbook ran by arturo@nostromo | |||
* 12:54 wm-bot: trying to join node tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud to the grid cluster in tools. - cookbook ran by arturo@nostromo | |||
* 08:45 taavi: set `profile::base::manage_ssh_keys: true` globally [[phab:T214427|T214427]] | |||
* 08:16 taavi: enable puppetdb and re-enable puppet with puppetdb ssh key management disabled (profile::base::manage_ssh_keys: false) - [[phab:T214427|T214427]] | |||
* 08:06 taavi: disable puppet globally for enabling puppetdb [[phab:T214427|T214427]] | |||
=== | === 2022-02-09 === | ||
* | * 19:29 taavi: installed tools-puppetdb-1, not configured on puppetmaster side yet [[phab:T214427|T214427]] | ||
* | * 18:56 wm-bot: pooled 10 grid nodes tools-sgeweblight-10-[1-5],tools-sgewebgen-10-[1,2],tools-sgeexec-10-[1-10] ([[phab:T277653|T277653]]) - cookbook ran by arturo@nostromo | ||
* 18:30 wm-bot: pooled 9 grid nodes tools-sgeexec-10-[2-10],tools-sgewebgen-[3,15] - cookbook ran by arturo@nostromo | |||
* 18:25 arturo: ignore last message | |||
* 18:24 wm-bot: pooled 9 grid nodes tools-sgeexec-10-[2-10],tools-sgewebgen-[3,15] - cookbook ran by arturo@nostromo | |||
* 14:04 taavi: created tools-cumin-1/toolsbeta-cumin-1 [[phab:T298191|T298191]] | |||
=== | === 2022-02-07 === | ||
* | * 17:37 taavi: generated authdns_acmechief ssh key and stored password in a text file in local labs/private repository ([[phab:T288406|T288406]]) | ||
* 12:52 taavi: updated maintain-kubeusers for [[phab:T301081|T301081]] | |||
=== | === 2022-02-04 === | ||
* | * 22:33 taavi: `root@tools-sgebastion-10:/data/project/ru_monuments/.kube# mv config old_config` # experimenting with [[phab:T301015|T301015]] | ||
* 21:36 taavi: clear error state from some webgrid nodes | |||
=== | === 2022-02-03 === | ||
* | * 09:06 taavi: run `sudo apt-get clean` on login-buster/dev-buster to clean up disk space | ||
* 08:01 taavi: restart acme-chief to force renewal of toolserver.org certificate | |||
=== | === 2022-01-30 === | ||
* | * 14:41 taavi: created a neutron port with ip 172.16.2.46 for a service ip for toolforge redis automatic failover [[phab:T278541|T278541]] | ||
* 14:22 taavi: creating a cluster of 3 bullseye redis hosts for [[phab:T278541|T278541]] | |||
=== | === 2022-01-26 === | ||
* | * 18:33 wm-bot: depooled grid node tools-sgeexec-10-10 - cookbook ran by arturo@nostromo | ||
* 18:33 wm-bot: depooled grid node tools-sgeexec-10-9 - cookbook ran by arturo@nostromo | |||
* 18:33 wm-bot: depooled grid node tools-sgeexec-10-8 - cookbook ran by arturo@nostromo | |||
* 18:32 wm-bot: depooled grid node tools-sgeexec-10-7 - cookbook ran by arturo@nostromo | |||
* 18:32 wm-bot: depooled grid node tools-sgeexec-10-6 - cookbook ran by arturo@nostromo | |||
* 18:31 wm-bot: depooled grid node tools-sgeexec-10-5 - cookbook ran by arturo@nostromo | |||
* 18:30 wm-bot: depooled grid node tools-sgeexec-10-4 - cookbook ran by arturo@nostromo | |||
* 18:28 wm-bot: depooled grid node tools-sgeexec-10-3 - cookbook ran by arturo@nostromo | |||
* 18:27 wm-bot: depooled grid node tools-sgeexec-10-2 - cookbook ran by arturo@nostromo | |||
* 18:27 wm-bot: depooled grid node tools-sgeexec-10-1 - cookbook ran by arturo@nostromo | |||
* 13:55 arturo: scaling up the buster web grid with 5 lighttpd and 2 generic nodes ([[phab:T277653|T277653]]) | |||
=== | === 2022-01-25 === | ||
* | * 11:50 wm-bot: reconfiguring the grid by using grid-configurator - cookbook ran by arturo@nostromo | ||
* 11:44 arturo: rebooting buster exec nodes | |||
* 08:34 taavi: sign puppet certificate for tools-sgeexec-10-4 | |||
=== | === 2022-01-24 === | ||
* | * 17:44 wm-bot: reconfiguring the grid by using grid-configurator - cookbook ran by arturo@nostromo | ||
* | * 15:23 arturo: scaling up the grid with 10 buster exec nodes ([[phab:T277653|T277653]]) | ||
=== | === 2022-01-20 === | ||
* | * 17:05 arturo: drop 9 of the 10 buster exec nodes created earlier. They didn't get DNS records | ||
* 12:56 arturo: scaling up the grid with 10 buster exec nodes ([[phab:T277653|T277653]]) | |||
=== | === 2022-01-19 === | ||
* | * 17:34 andrewbogott: rebooting tools-sgeexec-0913.tools.eqiad1.wikimedia.cloud to recover from (presumed) fallout from the scratch/nfs move | ||
=== | === 2022-01-14 === | ||
* | * 19:09 taavi: set /var/run/lighttpd as world-writable on all lighttpd webgrid nodes, [[phab:T299243|T299243]] | ||
=== | === 2022-01-12 === | ||
* | * 11:27 arturo: created puppet prefix `tools-sgeweblight`, drop `tools-sgeweblig` | ||
* 11:03 arturo: created puppet prefix 'tools-sgeweblig' | |||
* | * 11:02 arturo: created puppet prefix 'toolsbeta-sgeweblig' | ||
=== | === 2022-01-04 === | ||
* 17:18 bd808: tools-acme-chief-01: sudo service acme-chief restart | |||
* 08:12 taavi: disable puppet & exim4 on [[phab:T298501|T298501]] | |||