You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Revision history of "Nova Resource:Tools/SAL"

  • 14:39, 12 June 2021 imported>Stashbot 218,567 bytes +267 majavah: remove nonexistent tools-prometheus-04 and add tools-prometheus-05 to hiera key "prometheus_nodes"
  • 17:38, 10 June 2021 imported>Stashbot 218,300 bytes +104 majavah: clear error state from tools-sgeexec-0907, task@tools-sgeexec-0939
  • 13:57, 9 June 2021 imported>Stashbot 218,196 bytes +135 majavah: clear error state from exec nodes tools-sgeexec-0913, tools-sgeexec-0936, task@tools-sgeexec-0940
  • 18:39, 7 June 2021 imported>Stashbot 218,061 bytes +334 bstorm: cleaning up more error conditions on grid queues
  • 21:30, 4 June 2021 imported>Stashbot 217,727 bytes +193 bstorm: deleting "tools-k8s-ingress-3", "tools-k8s-ingress-2", "tools-k8s-ingress-1" T264221
  • 18:27, 3 June 2021 imported>Stashbot 217,534 bytes +181 majavah: renew prometheus kubernetes certificate T280301
  • 10:10, 1 June 2021 imported>Stashbot 217,353 bytes +238 majavah: properly clean up deleted vms tools-k8s-haproxy-[1,2], tools-checker-03 from puppet after using the wrong fqdn first time
  • 18:58, 30 May 2021 imported>Stashbot 217,115 bytes +75 majavah: clear grid error state from 14 queues
  • 18:03, 27 May 2021 imported>Stashbot 217,040 bytes +283 bstorm: adjusted profile::wmcs::kubeadm::etcd_latency_ms from 30 back to the default (10)
  • 10:36, 24 May 2021 imported>Stashbot 216,757 bytes +230 arturo: rebased labs/private.git after merge conflict
  • 14:47, 22 May 2021 imported>Stashbot 216,527 bytes +389 majavah: manually remove jeh admin certificates and from maintain-kubeusers configmap T282725
  • 17:06, 21 May 2021 imported>Stashbot 216,138 bytes +626 majavah: unpool tools-k8s-ingress-[4-6]
  • 17:05, 20 May 2021 imported>Stashbot 215,512 bytes +488 Majavah: pool tools-k8s-ingress-5 as an ingress node, depool ingress-1 T264221
  • 12:15, 19 May 2021 imported>Stashbot 215,024 bytes +263 Majavah: rollback ingress-nginx-gen2
  • 16:52, 16 May 2021 imported>Stashbot 214,761 bytes +136 Majavah: clear error state from tools-sgeexec-0905 tools-sgeexec-0907 tools-sgeexec-0936 tools-sgeexec-0941
  • 19:18, 14 May 2021 imported>Stashbot 214,625 bytes +379 bstorm: adjusting the rate limits for bastions nfs_write upward a lot to make NFS writes faster now that the cluster is finally using 10Gb on the backend and frontend T218338
  • 19:45, 12 May 2021 imported>Stashbot 214,246 bytes +384 bstorm: cleared error state from some queues
  • 17:17, 11 May 2021 imported>Stashbot 213,862 bytes +593 Majavah: shutdown and delete tools-checker-03 T278540
  • 22:58, 10 May 2021 imported>Stashbot 213,269 bytes +755 bstorm: cleared error state on a grid queue
  • 06:55, 9 May 2021 imported>Stashbot 212,514 bytes +79 Majavah: clear error state from tools-sgeexec-0916
  • 10:57, 8 May 2021 imported>Stashbot 212,435 bytes +214 Majavah: import docker image k8s.gcr.io/ingress-nginx/controller:v0.46.0 to local registry as docker-registry.tools.wmflabs.org/nginx-ingress-controller:v0.46.0 T264221
  • 18:07, 7 May 2021 imported>Stashbot 212,221 bytes +665 Majavah: generate and add k8s haproxy keepalived password (profile::toolforge::k8s::haproxy::keepalived_password) to private puppet repo
  • 14:43, 6 May 2021 imported>Stashbot 211,556 bytes +296 Majavah: clear error states from all currently erroring exec nodes
  • 19:27, 5 May 2021 imported>Stashbot 211,260 bytes +120 andrewbogott: adding taavi as a sudo root to project toolforge for T278390
  • 15:23, 4 May 2021 imported>Stashbot 211,140 bytes +151 arturo: upgrading exim4-daemon-heavy in tools-mail-03
  • 16:24, 3 May 2021 imported>Stashbot 210,989 bytes +360 dcaro: started tools-sgeexec-0907, was stuck on initramfs due to an unclean fs (/dev/vda3, root), ran fsck manually fixing all the errors and booted up correctly after (T280641)
  • 18:23, 29 April 2021 imported>Stashbot 210,629 bytes +178 bstorm: removing one more etcd node via cookbook T279723
  • 16:40, 27 April 2021 imported>Stashbot 210,451 bytes +170 bstorm: deleted all the errored out grid jobs stuck in queue wait
  • 12:17, 26 April 2021 imported>Stashbot 210,281 bytes +110 arturo: allowing more tools into the legacy redirector (T281003)
  • 08:44, 22 April 2021 imported>Stashbot 210,171 bytes +207 Krenair: Removed yuvipanda from roots sudo policy
  • 22:20, 20 April 2021 imported>Stashbot 209,964 bytes +818 bd808: `clush -w @all -b "sudo exiqgrep -z -i | xargs sudo exim -Mt"`
  • 10:53, 19 April 2021 imported>Stashbot 209,146 bytes +205 dcaro: reverting setting prometheus data source in grafana to 'server', can't connect,
  • 23:15, 16 April 2021 imported>Stashbot 208,941 bytes +622 bstorm: cleaned up all source files for the grid with the old domain name to enable future node creation T277653
  • 13:26, 13 April 2021 imported>Stashbot 208,319 bytes +513 dcaro: upgrade puppet and python-wmflib on tools-prometheus-03
  • 16:07, 11 April 2021 imported>Stashbot 207,806 bytes +194 bstorm: cleared E state from tools-sgeexec-0917 tools-sgeexec-0933 tools-sgeexec-0934 tools-sgeexec-0937 from failures of jobs 761759, 815031, 815056, 855676, 898936
  • 18:25, 8 April 2021 imported>Stashbot 207,612 bytes +706 bstorm: cleaned up the deprecated entries in /data/project/.system_sge/gridengine/etc/submithosts for tools-sgegrid-master and tools-sgegrid-shadow using the old fqdns T277653
  • 04:35, 7 April 2021 imported>Stashbot 206,906 bytes +182 andrewbogott: replacing the mx record '10 mail.tools.wmcloud.org' with '10 mail.tools.wmcloud.org.' — trying to fix axfr for the tools.wmcloud.org zone
  • 15:16, 6 April 2021 imported>Stashbot 206,724 bytes +1,295 bstorm: cleared queue state since a few had "errored" for failed jobs.
  • 17:02, 5 April 2021 imported>Stashbot 205,429 bytes +205 bstorm: chowned the data volume for the docker registry to docker-registry:docker-registry
  • 20:43, 1 April 2021 imported>Stashbot 205,224 bytes +555 bstorm: cleared error state from the grid queues caused by unspecified job errors
  • 15:57, 31 March 2021 imported>Stashbot 204,669 bytes +891 arturo: rebooting `tools-mail-03` after enabling NFS (T267082, T278538)
  • 16:15, 30 March 2021 imported>Stashbot 203,778 bytes +821 bstorm: added `labstore::traffic_shaping::egress: 800mbps` to tools-static prefix T278539
  • 19:31, 28 March 2021 imported>Stashbot 202,957 bytes +127 legoktm: legoktm@tools-sgebastion-08:~$ sudo qdel -f 9999704 # T278645
  • 02:48, 27 March 2021 imported>Stashbot 202,830 bytes +81 Reedy: qdel -f 9999895 9999799
  • 12:21, 26 March 2021 imported>Stashbot 202,749 bytes +136 arturo: shutdown tools-package-builder-02 (stretch), we keep -03 which is buster (T275864)
  • 19:30, 25 March 2021 imported>Stashbot 202,613 bytes +909 bstorm: forced deletion of all jobs stuck in a deleting state T277653
  • 12:46, 24 March 2021 imported>Stashbot 201,704 bytes +1,273 arturo: shutoff the old stretch VMs `tools-docker-registry-03` and `tools-docker-registry-04` (T278303)
  • 12:46, 23 March 2021 imported>Stashbot 200,431 bytes +421 arturo: aborrero@tools-sgegrid-master:~$ sudo systemctl restart gridengine-master.service
  • 19:24, 18 March 2021 imported>Stashbot 200,010 bytes +868 bstorm: set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes
  • 01:46, 18 March 2021 imported>Stashbot 199,142 bytes +341 bstorm: killed the toolschecker cron job, which had an LDAP error, and ran it again by hand
  • 16:31, 16 March 2021 imported>Stashbot 198,801 bytes +361 arturo: installing jobutils and misctools 1.41
  • 23:13, 12 March 2021 imported>Stashbot 198,440 bytes +76 bstorm: cleared error state for all grid queues
  • 17:40, 11 March 2021 imported>Stashbot 198,364 bytes +345 bstorm: deployed metrics-server:0.4.1 to kubernetes
  • 10:56, 10 March 2021 imported>Stashbot 198,019 bytes +96 arturo: briefly stopped VM tools-k8s-etcd-7 to disable VMX cpu flag
  • 13:31, 9 March 2021 imported>Stashbot 197,923 bytes +261 arturo: hard-reboot tools-docker-registry-04 because issues related to T276922
  • 12:30, 5 March 2021 imported>Stashbot 197,662 bytes +139 arturo: started tools-redis-1004 again
  • 11:25, 4 March 2021 imported>Stashbot 197,523 bytes +219 arturo: rebooted tools-sgewebgrid-generic-0901, repool it again
  • 15:17, 3 March 2021 imported>Stashbot 197,304 bytes +471 arturo: shutting down tools-sgebastion-07 in an attempt to fix nova state and finish hypervisor migration
  • 15:24, 2 March 2021 imported>Stashbot 196,833 bytes +238 bstorm: depooling tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs for reboot. It isn't communicating right
  • 02:23, 27 February 2021 imported>Stashbot 196,595 bytes +252 bstorm: deployed typo fix to maintain-kubeusers in an innocent effort to make the weekend better T275910
  • 22:04, 26 February 2021 imported>Stashbot 196,343 bytes +338 bstorm: cleaned up grid jobs 1230666,1908277,1908299,2441500,2441513
  • 18:30, 24 February 2021 imported>Stashbot 196,005 bytes +212 bd808: `sudo wmcs-openstack role remove --user zfilipin --project tools user` T267313
  • 23:11, 23 February 2021 imported>Stashbot 195,793 bytes +227 bstorm: draining a bunch of k8s workers to clean up after dumps changes T272397
  • 20:40, 22 February 2021 imported>Stashbot 195,566 bytes +641 bstorm: repooled tools-sgeexec-0918.tools.eqiad.wmflabs
  • 12:31, 19 February 2021 imported>Stashbot 194,925 bytes +100 arturo: deploying new version of toolforge ingress admission controller
  • 21:26, 17 February 2021 imported>Stashbot 194,825 bytes +118 bstorm: deleted tools-puppetdb-01 since it is unused at this time (and undersized anyway)
  • 16:27, 4 February 2021 imported>Stashbot 194,707 bytes +71 bstorm: rebooting tools-package-builder-02
  • 16:27, 26 January 2021 imported>Stashbot 194,636 bytes +110 bd808: Hard reboot of tools-sgeexec-0906 via Horizon for T272978
  • 09:59, 22 January 2021 imported>Stashbot 194,526 bytes +146 dcaro: added the record redis.svc.tools.eqiad1.wikimedia.cloud pointing to tools-redis1003 (T272679)
  • 23:58, 21 January 2021 imported>Stashbot 194,380 bytes +102 bstorm: deployed new maintain-kubeusers to tools T271847
  • 22:57, 19 January 2021 imported>Stashbot 194,278 bytes +503 bstorm: truncated 75GB error log /data/project/robokobot/virgule.err T272247
  • 20:56, 14 January 2021 imported>Stashbot 193,775 bytes +367 bstorm: setting bastions to have mostly-uncapped egress network and 40MBps nfs_read for better shared use
  • 10:02, 13 January 2021 imported>Stashbot 193,408 bytes +107 arturo: delete floating IP allocation 185.15.56.245 (T271867)
  • 18:16, 12 January 2021 imported>Stashbot 193,301 bytes +134 bstorm: deleted wedged CSR tool-adhs-wde to get maintain-kubeusers working again T271842
  • 18:49, 5 January 2021 imported>Stashbot 193,167 bytes +134 bstorm: changing the limits on k8s etcd nodes again, so disabling puppet on them T267966
  • 18:21, 4 January 2021 imported>Stashbot 193,033 bytes +191 bstorm: ran 'sudo systemctl stop getty@ttyS1.service && sudo systemctl disable getty@ttyS1.service' on tools-k8s-etcd-5 I have no idea why that keeps coming back.
  • 18:22, 22 December 2020 imported>Stashbot 192,842 bytes +190 bstorm: rebooting the grid master because it is misbehaving following the NFS outage
  • 18:37, 18 December 2020 imported>Stashbot 192,652 bytes +109 bstorm: set profile::wmcs::kubeadm::etcd_latency_ms: 15 T267966
  • 21:42, 17 December 2020 imported>Stashbot 192,543 bytes +2,476 bstorm: doing the same procedure to increase the timeouts more T267966
  • 18:29, 11 December 2020 imported>Stashbot 190,067 bytes +1,158 bstorm: certificatesigningrequest.certificates.k8s.io "tool-production-error-tasks-metrics" deleted to stop maintain-kubeusers issues
  • 17:35, 10 December 2020 imported>Stashbot 188,909 bytes +1,179 bstorm: k8s-control nodes upgraded to 1.17.13 T263284
  • 19:01, 8 December 2020 imported>Stashbot 187,730 bytes +140 bstorm: pushed updated calico node image (v3.14.0) to internal docker registry as well T269016
  • 22:56, 7 December 2020 imported>Stashbot 187,590 bytes +182 bstorm: pushed updated local copies of the typha, calico-cni and calico-pod2daemon-flexvol images to the tools internal registry T269016
  • 09:18, 3 December 2020 imported>Stashbot 187,408 bytes +312 arturo: restarted kubelet systemd service on tools-k8s-worker-38. Node was NotReady, complaining about 'use of closed network connection'
  • 23:35, 28 November 2020 imported>Stashbot 187,096 bytes +326 Krenair: Re-scheduled 4 continuous jobs from tools-sgeexec-0908 as it appears to be broken, at about 23:20 UTC
  • 17:44, 24 November 2020 imported>Stashbot 186,770 bytes +259 arturo: rebased labs/private.git. 2 patches had merge conflicts
  • 19:45, 10 November 2020 imported>Stashbot 186,511 bytes +77 andrewbogott: rebooting tools-sgeexec-0950; OOM
  • 13:35, 2 November 2020 imported>Stashbot 186,434 bytes +127 arturo: (typo: dcaro)
  • 21:33, 29 October 2020 imported>Stashbot 186,307 bytes +489 legoktm: published docker-registry.tools.wmflabs.org/toolbeta-test image (T265681)
  • 23:42, 28 October 2020 imported>Stashbot 185,818 bytes +363 bstorm: dramatically elevated the egress cap on tools-k8s-ingress nodes that were affected by the NFS settings T266506
  • 22:22, 23 October 2020 imported>Stashbot 185,455 bytes +115 legoktm: imported pack_0.14.2-1_amd64.deb into buster-tools (T266270)
  • 17:58, 21 October 2020 imported>Stashbot 185,340 bytes +141 legoktm: pushed toolforge-buster0-{build,run}:latest images to docker registry
  • 22:00, 15 October 2020 imported>Stashbot 185,199 bytes +355 bstorm: manually removing nscd from tools-sgebastion-08 and running puppet
  • 21:00, 14 October 2020 imported>Stashbot 184,844 bytes +753 andrewbogott: repooling tools-sgewebgrid-generic-0901 and tools-sgewebgrid-lighttpd-0915
  • 17:07, 10 October 2020 imported>Stashbot 184,091 bytes +123 bstorm: cleared errors on tools-sgeexec-0912.tools.eqiad.wmflabs to get the queue moving again
  • 17:07, 8 October 2020 imported>Stashbot 183,968 bytes +103 bstorm: rebuilding docker images with locales-all T263339
  • 19:04, 6 October 2020 imported>Stashbot 183,865 bytes +234 andrewbogott: uncordoned tools-k8s-worker-38
  • 21:09, 2 October 2020 imported>Stashbot 183,631 bytes +281 bstorm: rebooting tools-k8s-worker-70 because it seems to be unable to recover from an old NFS disconnect
  • 21:39, 1 October 2020 imported>Stashbot 183,350 bytes +284 andrewbogott: migrating tools-proxy-06 to ceph
  • 18:34, 30 September 2020 imported>Stashbot 183,066 bytes +152 andrewbogott: repooling tools-sgeexec-0918