You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Nova Resource:Tools/SAL: Revision history

12 June 2021

  • 14:39, 12 June 2021 imported>Stashbot (218,567 bytes) (+267) majavah: remove nonexistent tools-prometheus-04 and add tools-prometheus-05 to hiera key "prometheus_nodes"

10 June 2021

  • 17:38, 10 June 2021 imported>Stashbot (218,300 bytes) (+104) majavah: clear error state from tools-sgeexec-0907, task@tools-sgeexec-0939

9 June 2021

  • 13:57, 9 June 2021 imported>Stashbot (218,196 bytes) (+135) majavah: clear error state from exec nodes tools-sgeexec-0913, tools-sgeexec-0936, task@tools-sgeexec-0940

7 June 2021

  • 18:39, 7 June 2021 imported>Stashbot (218,061 bytes) (+334) bstorm: cleaning up more error conditions on grid queues

4 June 2021

  • 21:30, 4 June 2021 imported>Stashbot (217,727 bytes) (+193) bstorm: deleting "tools-k8s-ingress-3", "tools-k8s-ingress-2", "tools-k8s-ingress-1" T264221

3 June 2021

  • 18:27, 3 June 2021 imported>Stashbot (217,534 bytes) (+181) majavah: renew prometheus kubernetes certificate T280301

1 June 2021

  • 10:10, 1 June 2021 imported>Stashbot (217,353 bytes) (+238) majavah: properly clean up deleted vms tools-k8s-haproxy-[1,2], tools-checker-03 from puppet after using the wrong fqdn first time

30 May 2021

  • 18:58, 30 May 2021 imported>Stashbot (217,115 bytes) (+75) majavah: clear grid error state from 14 queues

27 May 2021

  • 18:03, 27 May 2021 imported>Stashbot (217,040 bytes) (+283) bstorm: adjusted profile::wmcs::kubeadm::etcd_latency_ms from 30 back to the default (10)

24 May 2021

  • 10:36, 24 May 2021 imported>Stashbot (216,757 bytes) (+230) arturo: rebased labs/private.git after merge conflict

22 May 2021

  • 14:47, 22 May 2021 imported>Stashbot (216,527 bytes) (+389) majavah: manually remove jeh admin certificates and from maintain-kubeusers configmap T282725

21 May 2021

20 May 2021

  • 17:05, 20 May 2021 imported>Stashbot (215,512 bytes) (+488) Majavah: pool tools-k8s-ingress-5 as an ingress node, depool ingress-1 T264221

19 May 2021

16 May 2021

  • 16:52, 16 May 2021 imported>Stashbot (214,761 bytes) (+136) Majavah: clear error state from tools-sgeexec-0905 tools-sgeexec-0907 tools-sgeexec-0936 tools-sgeexec-0941

14 May 2021

  • 19:18, 14 May 2021 imported>Stashbot (214,625 bytes) (+379) bstorm: adjusting the rate limits for bastions nfs_write upward a lot to make NFS writes faster now that the cluster is finally using 10Gb on the backend and frontend T218338

12 May 2021

11 May 2021

  • 17:17, 11 May 2021 imported>Stashbot (213,862 bytes) (+593) Majavah: shutdown and delete tools-checker-03 T278540

10 May 2021

9 May 2021

  • 06:55, 9 May 2021 imported>Stashbot (212,514 bytes) (+79) Majavah: clear error state from tools-sgeexec-0916

8 May 2021

  • 10:57, 8 May 2021 imported>Stashbot (212,435 bytes) (+214) Majavah: import docker image k8s.gcr.io/ingress-nginx/controller:v0.46.0 to local registry as docker-registry.tools.wmflabs.org/nginx-ingress-controller:v0.46.0 T264221

7 May 2021

  • 18:07, 7 May 2021 imported>Stashbot (212,221 bytes) (+665) Majavah: generate and add k8s haproxy keepalived password (profile::toolforge::k8s::haproxy::keepalived_password) to private puppet repo

6 May 2021

  • 14:43, 6 May 2021 imported>Stashbot (211,556 bytes) (+296) Majavah: clear error states from all currently erroring exec nodes

5 May 2021

  • 19:27, 5 May 2021 imported>Stashbot (211,260 bytes) (+120) andrewbogott: adding taavi as a sudo root to project toolforge for T278390

4 May 2021

  • 15:23, 4 May 2021 imported>Stashbot (211,140 bytes) (+151) arturo: upgrading exim4-daemon-heavy in tools-mail-03

3 May 2021

  • 16:24, 3 May 2021 imported>Stashbot (210,989 bytes) (+360) dcaro: started tools-sgeexec-0907, was stuck on initramfs due to an unclean fs (/dev/vda3, root), ran fsck manually fixing all the errors and booted up correctly after (T280641)

29 April 2021

  • 18:23, 29 April 2021 imported>Stashbot (210,629 bytes) (+178) bstorm: removing one more etcd node via cookbook T279723

27 April 2021

  • 16:40, 27 April 2021 imported>Stashbot (210,451 bytes) (+170) bstorm: deleted all the errored out grid jobs stuck in queue wait

26 April 2021

  • 12:17, 26 April 2021 imported>Stashbot (210,281 bytes) (+110) arturo: allowing more tools into the legacy redirector (T281003)

22 April 2021

20 April 2021

  • 22:20, 20 April 2021 imported>Stashbot (209,964 bytes) (+818) bd808: `clush -w @all -b "sudo exiqgrep -z -i | xargs sudo exim -Mt"`

19 April 2021

  • 10:53, 19 April 2021 imported>Stashbot (209,146 bytes) (+205) dcaro: reverting setting prometheus data source in grafana to 'server', can't connect,

16 April 2021

  • 23:15, 16 April 2021 imported>Stashbot (208,941 bytes) (+622) bstorm: cleaned up all source files for the grid with the old domain name to enable future node creation T277653

13 April 2021

  • 13:26, 13 April 2021 imported>Stashbot (208,319 bytes) (+513) dcaro: upgrade puppet and python-wmflib on tools-prometheus-03

11 April 2021

  • 16:07, 11 April 2021 imported>Stashbot (207,806 bytes) (+194) bstorm: cleared E state from tools-sgeexec-0917 tools-sgeexec-0933 tools-sgeexec-0934 tools-sgeexec-0937 from failures of jobs 761759, 815031, 815056, 855676, 898936

8 April 2021

  • 18:25, 8 April 2021 imported>Stashbot (207,612 bytes) (+706) bstorm: cleaned up the deprecated entries in /data/project/.system_sge/gridengine/etc/submithosts for tools-sgegrid-master and tools-sgegrid-shadow using the old fqdns T277653

7 April 2021

  • 04:35, 7 April 2021 imported>Stashbot (206,906 bytes) (+182) andrewbogott: replacing the mx record '10 mail.tools.wmcloud.org' with '10 mail.tools.wmcloud.org.'; trying to fix axfr for the tools.wmcloud.org zone

6 April 2021

  • 15:16, 6 April 2021 imported>Stashbot (206,724 bytes) (+1,295) bstorm: cleared queue state since a few had "errored" for failed jobs.

5 April 2021

  • 17:02, 5 April 2021 imported>Stashbot (205,429 bytes) (+205) bstorm: chowned the data volume for the docker registry to docker-registry:docker-registry

1 April 2021

  • 20:43, 1 April 2021 imported>Stashbot (205,224 bytes) (+555) bstorm: cleared error state from the grid queues caused by unspecified job errors

31 March 2021

  • 15:57, 31 March 2021 imported>Stashbot (204,669 bytes) (+891) arturo: rebooting `tools-mail-03` after enabling NFS (T267082, T278538)

30 March 2021

  • 16:15, 30 March 2021 imported>Stashbot (203,778 bytes) (+821) bstorm: added `labstore::traffic_shaping::egress: 800mbps` to tools-static prefix T278539

28 March 2021

  • 19:31, 28 March 2021 imported>Stashbot (202,957 bytes) (+127) legoktm: legoktm@tools-sgebastion-08:~$ sudo qdel -f 9999704 # T278645

27 March 2021

26 March 2021

  • 12:21, 26 March 2021 imported>Stashbot (202,749 bytes) (+136) arturo: shutdown tools-package-builder-02 (stretch), we keep -03 which is buster (T275864)

25 March 2021

  • 19:30, 25 March 2021 imported>Stashbot (202,613 bytes) (+909) bstorm: forced deletion of all jobs stuck in a deleting state T277653

24 March 2021

  • 12:46, 24 March 2021 imported>Stashbot (201,704 bytes) (+1,273) arturo: shutoff the old stretch VMs `tools-docker-registry-03` and `tools-docker-registry-04` (T278303)

23 March 2021

  • 12:46, 23 March 2021 imported>Stashbot (200,431 bytes) (+421) arturo: aborrero@tools-sgegrid-master:~$ sudo systemctl restart gridengine-master.service

18 March 2021

  • 19:24, 18 March 2021 imported>Stashbot (200,010 bytes) (+868) bstorm: set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes
  • 01:46, 18 March 2021 imported>Stashbot (199,142 bytes) (+341) bstorm: killed the toolschecker cron job, which had an LDAP error, and ran it again by hand