You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Revision history of "Nova Resource:Tools/SAL"

Legend: (cur) = difference with latest revision, (prev) = difference with preceding revision, m = minor edit.

  • curprev 12:18, 4 December 2021imported>Stashbot 232,301 bytes +109 majavah: deploying delete-crashing-pods in dry run mode T292925
  • curprev 17:46, 28 November 2021imported>Stashbot 232,192 bytes +165 andrewbogott: moving tools-k8s-etcd-13 to cloudvirt1020; cloudvirt1018 (its old host) has a degraded raid which is affecting performance
  • curprev 13:16, 19 November 2021imported>Stashbot 232,027 bytes +97 majavah: manually add 3 project members after ldap issues were fixed
  • curprev 12:31, 16 November 2021imported>Stashbot 231,930 bytes +197 majavah: uploading calico 3.21.0 to the internal docker registry T292698
  • curprev 10:50, 11 November 2021imported>Stashbot 231,733 bytes +107 arturo: add user `srv-networktests` as project user (T294955)
  • curprev 19:18, 5 November 2021imported>Stashbot 231,626 bytes +74 majavah: deploying registry-admission changes
  • curprev 23:58, 29 October 2021imported>Stashbot 231,552 bytes +129 andrewbogott: deleting all files older than 14 days in /srv/tools/shared/tools/project/.shared/cache
  • curprev 12:42, 28 October 2021imported>Stashbot 231,423 bytes +122 arturo: set `allow-snippet-annotations: "false"` for ingress-nginx (T294330)
  • curprev 18:00, 26 October 2021imported>Stashbot 231,301 bytes +238 majavah: deleting legacy ingresses for tools.wmflabs.org urls
  • curprev 14:33, 25 October 2021imported>Stashbot 231,063 bytes +262 majavah: copy nginx-ingress controller v1.0.4 to internal registry T292771
  • curprev 15:35, 22 October 2021imported>Stashbot 230,801 bytes +240 majavah: remove "^tools-k8s-master-[0-9]+\.tools\.eqiad\.wmflabs$" from authorized_regexes for the main certificate
  • curprev 09:48, 21 October 2021imported>Stashbot 230,561 bytes +73 majavah: deploying toolforge-webservice 0.79
  • curprev 15:41, 20 October 2021imported>Stashbot 230,488 bytes +276 majavah: removing toollabs-webservice from grid exec and master nodes where it's not needed and not managed by puppet
  • curprev 15:01, 15 October 2021imported>Stashbot 230,212 bytes +129 arturo: add updated ingress-nginx docker image in the registry (v1.0.1) for T293472
  • curprev 09:13, 7 October 2021imported>Stashbot 230,083 bytes +247 majavah: disabling settings api, now that all pod presets are gone T279106
  • curprev 06:46, 6 October 2021imported>Stashbot 229,836 bytes +154 majavah: taavi@toolserver-proxy-01:~$ sudo systemctl restart apache2.service # see if it helps with toolserver.org ssl alerts
  • curprev 21:31, 3 October 2021imported>Stashbot 229,682 bytes +254 bstorm: rebuilding buster containers since they are also affected T291387 T292355
  • curprev 21:59, 1 October 2021imported>Stashbot 229,428 bytes +347 bd808: clush -w @all -b 'sudo sed -i "s#mozilla/DST_Root_CA_X3.crt#!mozilla/DST_Root_CA_X3.crt#" /etc/ca-certificates.conf && sudo update-ca-certificates' for T292289
  • curprev 22:39, 29 September 2021imported>Stashbot 229,081 bytes +265 bstorm: finished deploy of the toollabs-webservice 0.77 and updating labels across the k8s cluster to match
  • curprev 16:19, 27 September 2021imported>Stashbot 228,816 bytes +257 majavah: deploy volume-admission fix for containers with some volumes mounted
  • curprev 17:20, 23 September 2021imported>Stashbot 228,559 bytes +118 majavah: deploying new maintain-kubeusers for lack of podpresets T279106
  • curprev 18:06, 22 September 2021imported>Stashbot 228,441 bytes +257 bstorm: launching tools-nfs-test-client-01 to run a "fair" test battery against T291406
  • curprev 12:44, 20 September 2021imported>Stashbot 228,184 bytes +130 majavah: deploying volume-admission to tools, should not affect anything yet T279106
  • curprev 08:08, 15 September 2021imported>Stashbot 228,054 bytes +67 majavah: update tools-manifest to 0.24
  • curprev 10:36, 14 September 2021imported>Stashbot 227,987 bytes +104 arturo: add toolforge-jobs-framework-cli v5 to aptly buster-tools/toolsbeta
  • curprev 08:57, 13 September 2021imported>Stashbot 227,883 bytes +291 arturo: cleared grid queues error states (T290844)
  • curprev 08:51, 11 September 2021imported>Stashbot 227,592 bytes +63 majavah: depool tools-sgeexec-0907
  • curprev 23:26, 10 September 2021imported>Stashbot 227,529 bytes +359 bstorm: cleared error state for tools-sgeexec-0907.tools.eqiad.wmflabs
  • curprev 16:20, 9 September 2021imported>Stashbot 227,170 bytes +155 arturo: 70017ec0ac root@tools-k8s-control-3:~# kubectl apply -f /etc/kubernetes/psp/base-pod-security-policies.yaml
  • curprev 15:27, 7 September 2021imported>Stashbot 227,015 bytes +178 majavah: rolling out python3-prometheus-client updates
  • curprev 16:31, 6 September 2021imported>Stashbot 226,837 bytes +132 arturo: deploying jobs-framework-cli v4
  • curprev 22:36, 3 September 2021imported>Stashbot 226,705 bytes +148 bstorm: backfilling quotas in screen for T286784
  • curprev 01:02, 2 September 2021imported>Stashbot 226,557 bytes +140 bstorm: deployed new version of maintain-kubeusers with new count quotas for new tools T286784
  • curprev 19:10, 20 August 2021imported>Stashbot 226,417 bytes +236 majavah: rebuilding node12-sssd/{base,web} to use debian packaged npm 7
  • curprev 21:32, 18 August 2021imported>Stashbot 226,181 bytes +203 bstorm: rebooted tools-sgecron-01 due to RAM filling up and killing everything
  • curprev 17:00, 16 August 2021imported>Stashbot 225,978 bytes +316 majavah: remove and re-add toollabs-webservice 0.75 on stretch-toolsbeta repository
  • curprev 17:30, 15 August 2021imported>Stashbot 225,662 bytes +546 majavah: deploying update jobs-framework-api container list to include bullseye images
  • curprev 16:59, 12 August 2021imported>Stashbot 225,116 bytes +377 bstorm: deployed updated manifest for ingress-admission
  • curprev 05:59, 7 August 2021imported>Stashbot 224,739 bytes +134 majavah: restart nginx on toolserver-proxy-01 if that helps with flapping icinga certificate expiry check
  • curprev 16:17, 6 August 2021imported>Stashbot 224,605 bytes +104 bstorm: failed over to tools-docker-registry-06 (which has more space) T288229
  • curprev 00:43, 6 August 2021imported>Stashbot 224,501 bytes +430 bstorm: set up sync between the new registry host and the existing one T288229
  • curprev 18:04, 29 July 2021imported>Stashbot 224,071 bytes +133 majavah: reset sul account mapping on striker for developer account "Derek Zax" T287369
  • curprev 21:33, 28 July 2021imported>Stashbot 223,938 bytes +111 majavah: add mdipietro as projectadmin and to sudo policy T287287
  • curprev 16:20, 27 July 2021imported>Stashbot 223,827 bytes +84 bstorm: built new php images with python2 on board T287421
  • curprev 00:04, 27 July 2021imported>Stashbot 223,743 bytes +381 bstorm: deploy a version of the php3.7 web image that includes the python2 package with tag :testing T287421
  • curprev 07:15, 23 July 2021imported>Stashbot 223,362 bytes +109 majavah: restart nginx on tools-static-14 to see if it helps with fontcdn issues
  • curprev 23:35, 22 July 2021imported>Stashbot 223,253 bytes +336 bstorm: deleted tools-sgebastion-09 since it has been shut off since March anyway
  • curprev 20:01, 21 July 2021imported>Stashbot 222,917 bytes +817 bstorm: deployed new maintain-kubeusers to toolforge T285011
  • curprev 18:42, 20 July 2021imported>Stashbot 222,100 bytes +451 majavah: deploying systemd security tools on toolforge public stretch machines T287004
  • curprev 23:24, 19 July 2021imported>Stashbot 221,649 bytes +248 bstorm: applied matchPolicy: equivalent to tools ingress validation controller T280360
  • curprev 14:04, 16 July 2021imported>Stashbot 221,401 bytes +352 arturo: deployed jobs-framework-api 42b7a885a5bc1bf00c300e8d77bd92e1430a8327 (T286132)
  • curprev 16:12, 15 July 2021imported>Stashbot 221,049 bytes +417 arturo: deploy toolforge-jobs-framework-api git version d85d93ee1c5d4be6a526cf83e806b2679dde3875 (T285944, T286107, T285979, T286485, T286107)
  • curprev 23:29, 14 July 2021imported>Stashbot 220,632 bytes +250 bstorm: mounted nfs on tools-services-05 and backing up aptly to NFS dir T286003
  • curprev 16:56, 12 July 2021imported>Stashbot 220,382 bytes +143 bstorm: deleted job 4720371 due to LDAP failure
  • curprev 18:46, 2 July 2021imported>Stashbot 220,239 bytes +99 bstorm: cleared error state for tools-sgeexec-0940.tools.eqiad.wmflabs
  • curprev 22:08, 1 July 2021imported>Stashbot 220,140 bytes +445 bstorm: releasing webservice 0.75
  • curprev 21:58, 29 June 2021imported>Stashbot 219,695 bytes +594 bstorm: clearing one errored queue and a stack of discarded jobs
  • curprev 19:02, 15 June 2021imported>Stashbot 219,101 bytes +181 bstorm: cleared error status from a few queues
  • curprev 22:21, 14 June 2021imported>Stashbot 218,920 bytes +229 bstorm: push docker-registry.tools.wmflabs.org/toolforge-python37-sssd-web:testing to test staged os.execv (and other patches) using toolsbeta toollabs-webservice version 0.75 T282975
  • curprev 08:15, 13 June 2021imported>Stashbot 218,691 bytes +124 majavah: clear grid error state from tools-sgeexec-0907, tools-sgeexec-0916, tools-sgeexec-0940
  • curprev 14:39, 12 June 2021imported>Stashbot 218,567 bytes +267 majavah: remove nonexistent tools-prometheus-04 and add tools-prometheus-05 to hiera key "prometheus_nodes"
  • curprev 17:38, 10 June 2021imported>Stashbot 218,300 bytes +104 majavah: clear error state from tools-sgeexec-0907, task@tools-sgeexec-0939
  • curprev 13:57, 9 June 2021imported>Stashbot 218,196 bytes +135 majavah: clear error state from exec nodes tools-sgeexec-0913, tools-sgeexec-0936, task@tools-sgeexec-0940
  • curprev 18:39, 7 June 2021imported>Stashbot 218,061 bytes +334 bstorm: cleaning up more error conditions on grid queues
  • curprev 21:30, 4 June 2021imported>Stashbot 217,727 bytes +193 bstorm: deleting "tools-k8s-ingress-3", "tools-k8s-ingress-2", "tools-k8s-ingress-1" T264221
  • curprev 18:27, 3 June 2021imported>Stashbot 217,534 bytes +181 majavah: renew prometheus kubernetes certificate T280301
  • curprev 10:10, 1 June 2021imported>Stashbot 217,353 bytes +238 majavah: properly clean up deleted vms tools-k8s-haproxy-[1,2], tools-checker-03 from puppet after using the wrong fqdn first time
  • curprev 18:58, 30 May 2021imported>Stashbot 217,115 bytes +75 majavah: clear grid error state from 14 queues
  • curprev 18:03, 27 May 2021imported>Stashbot 217,040 bytes +283 bstorm: adjusted profile::wmcs::kubeadm::etcd_latency_ms from 30 back to the default (10)
  • curprev 10:36, 24 May 2021imported>Stashbot 216,757 bytes +230 arturo: rebased labs/private.git after merge conflict
  • curprev 14:47, 22 May 2021imported>Stashbot 216,527 bytes +389 majavah: manually remove jeh admin certificates and from maintain-kubeusers configmap T282725
  • curprev 17:06, 21 May 2021imported>Stashbot 216,138 bytes +626 majavah: unpool tools-k8s-ingress-[4-6]
  • curprev 17:05, 20 May 2021imported>Stashbot 215,512 bytes +488 Majavah: pool tools-k8s-ingress-5 as an ingress node, depool ingress-1 T264221
  • curprev 12:15, 19 May 2021imported>Stashbot 215,024 bytes +263 Majavah: rollback ingress-nginx-gen2
  • curprev 16:52, 16 May 2021imported>Stashbot 214,761 bytes +136 Majavah: clear error state from tools-sgeexec-0905 tools-sgeexec-0907 tools-sgeexec-0936 tools-sgeexec-0941
  • curprev 19:18, 14 May 2021imported>Stashbot 214,625 bytes +379 bstorm: adjusting the rate limits for bastions nfs_write upward a lot to make NFS writes faster now that the cluster is finally using 10Gb on the backend and frontend T218338
  • curprev 19:45, 12 May 2021imported>Stashbot 214,246 bytes +384 bstorm: cleared error state from some queues
  • curprev 17:17, 11 May 2021imported>Stashbot 213,862 bytes +593 Majavah: shutdown and delete tools-checker-03 T278540
  • curprev 22:58, 10 May 2021imported>Stashbot 213,269 bytes +755 bstorm: cleared error state on a grid queue
  • curprev 06:55, 9 May 2021imported>Stashbot 212,514 bytes +79 Majavah: clear error state from tools-sgeexec-0916
  • curprev 10:57, 8 May 2021imported>Stashbot 212,435 bytes +214 Majavah: import docker image k8s.gcr.io/ingress-nginx/controller:v0.46.0 to local registry as docker-registry.tools.wmflabs.org/nginx-ingress-controller:v0.46.0 T264221
  • curprev 18:07, 7 May 2021imported>Stashbot 212,221 bytes +665 Majavah: generate and add k8s haproxy keepalived password (profile::toolforge::k8s::haproxy::keepalived_password) to private puppet repo
  • curprev 14:43, 6 May 2021imported>Stashbot 211,556 bytes +296 Majavah: clear error states from all currently erroring exec nodes
  • curprev 19:27, 5 May 2021imported>Stashbot 211,260 bytes +120 andrewbogott: adding taavi as a sudo root to project toolforge for T278390
  • curprev 15:23, 4 May 2021imported>Stashbot 211,140 bytes +151 arturo: upgrading exim4-daemon-heavy in tools-mail-03
  • curprev 16:24, 3 May 2021imported>Stashbot 210,989 bytes +360 dcaro: started tools-sgeexec-0907, was stuck on initramfs due to an unclean fs (/dev/vda3, root), ran fsck manually fixing all the errors and booted up correctly after (T280641)
  • curprev 18:23, 29 April 2021imported>Stashbot 210,629 bytes +178 bstorm: removing one more etcd node via cookbook T279723
  • curprev 16:40, 27 April 2021imported>Stashbot 210,451 bytes +170 bstorm: deleted all the errored out grid jobs stuck in queue wait
  • curprev 12:17, 26 April 2021imported>Stashbot 210,281 bytes +110 arturo: allowing more tools into the legacy redirector (T281003)
  • curprev 08:44, 22 April 2021imported>Stashbot 210,171 bytes +207 Krenair: Removed yuvipanda from roots sudo policy
  • curprev 22:20, 20 April 2021imported>Stashbot 209,964 bytes +818 bd808: `clush -w @all -b "sudo exiqgrep -z -i | xargs sudo exim -Mt"`
  • curprev 10:53, 19 April 2021imported>Stashbot 209,146 bytes +205 dcaro: reverting setting prometheus data source in grafana to 'server', can't connect,
  • curprev 23:15, 16 April 2021imported>Stashbot 208,941 bytes +622 bstorm: cleaned up all source files for the grid with the old domain name to enable future node creation T277653
  • curprev 13:26, 13 April 2021imported>Stashbot 208,319 bytes +513 dcaro: upgrade puppet and python-wmflib on tools-prometheus-03
  • curprev 16:07, 11 April 2021imported>Stashbot 207,806 bytes +194 bstorm: cleared E state from tools-sgeexec-0917 tools-sgeexec-0933 tools-sgeexec-0934 tools-sgeexec-0937 from failures of jobs 761759, 815031, 815056, 855676, 898936
  • curprev 18:25, 8 April 2021imported>Stashbot 207,612 bytes +706 bstorm: cleaned up the deprecated entries in /data/project/.system_sge/gridengine/etc/submithosts for tools-sgegrid-master and tools-sgegrid-shadow using the old fqdns T277653
  • curprev 04:35, 7 April 2021imported>Stashbot 206,906 bytes +182 andrewbogott: replacing the mx record '10 mail.tools.wmcloud.org' with '10 mail.tools.wmcloud.org.' — trying to fix axfr for the tools.wmcloud.org zone
  • curprev 15:16, 6 April 2021imported>Stashbot 206,724 bytes +1,295 bstorm: cleared queue state since a few had "errored" for failed jobs.
  • curprev 17:02, 5 April 2021imported>Stashbot 205,429 bytes +205 bstorm: chowned the data volume for the docker registry to docker-registry:docker-registry
  • curprev 20:43, 1 April 2021imported>Stashbot 205,224 bytes +555 bstorm: cleared error state from the grid queues caused by unspecified job errors
  • curprev 15:57, 31 March 2021imported>Stashbot 204,669 bytes +891 arturo: rebooting `tools-mail-03` after enabling NFS (T267082, T278538)
  • curprev 16:15, 30 March 2021imported>Stashbot 203,778 bytes +821 bstorm: added `labstore::traffic_shaping::egress: 800mbps` to tools-static prefix T278539
  • curprev 19:31, 28 March 2021imported>Stashbot 202,957 bytes +127 legoktm: legoktm@tools-sgebastion-08:~$ sudo qdel -f 9999704 # T278645
  • curprev 02:48, 27 March 2021imported>Stashbot 202,830 bytes +81 Reedy: qdel -f 9999895 9999799
  • curprev 12:21, 26 March 2021imported>Stashbot 202,749 bytes +136 arturo: shutdown tools-package-builder-02 (stretch), we keep -03 which is buster (T275864)
  • curprev 19:30, 25 March 2021imported>Stashbot 202,613 bytes +909 bstorm: forced deletion of all jobs stuck in a deleting state T277653
  • curprev 12:46, 24 March 2021imported>Stashbot 201,704 bytes +1,273 arturo: shutoff the old stretch VMs `tools-docker-registry-03` and `tools-docker-registry-04` (T278303)
  • curprev 12:46, 23 March 2021imported>Stashbot 200,431 bytes +421 arturo: aborrero@tools-sgegrid-master:~$ sudo systemctl restart gridengine-master.service
  • curprev 19:24, 18 March 2021imported>Stashbot 200,010 bytes +868 bstorm: set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes
  • curprev 01:46, 18 March 2021imported>Stashbot 199,142 bytes +341 bstorm: killed the toolschecker cron job, which had an LDAP error, and ran it again by hand
  • curprev 16:31, 16 March 2021imported>Stashbot 198,801 bytes +361 arturo: installing jobutils and misctools 1.41
  • curprev 23:13, 12 March 2021imported>Stashbot 198,440 bytes +76 bstorm: cleared error state for all grid queues
  • curprev 17:40, 11 March 2021imported>Stashbot 198,364 bytes +345 bstorm: deployed metrics-server:0.4.1 to kubernetes
  • curprev 10:56, 10 March 2021imported>Stashbot 198,019 bytes +96 arturo: briefly stopped VM tools-k8s-etcd-7 to disable VMX cpu flag
  • curprev 13:31, 9 March 2021imported>Stashbot 197,923 bytes +261 arturo: hard-reboot tools-docker-registry-04 because of issues related to T276922
  • curprev 12:30, 5 March 2021imported>Stashbot 197,662 bytes +139 arturo: started tools-redis-1004 again
  • curprev 11:25, 4 March 2021imported>Stashbot 197,523 bytes +219 arturo: rebooted tools-sgewebgrid-generic-0901, repool it again
  • curprev 15:17, 3 March 2021imported>Stashbot 197,304 bytes +471 arturo: shutting down tools-sgebastion-07 in an attempt to fix nova state and finish hypervisor migration
  • curprev 15:24, 2 March 2021imported>Stashbot 196,833 bytes +238 bstorm: depooling tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs for reboot. It isn't communicating right
  • curprev 02:23, 27 February 2021imported>Stashbot 196,595 bytes +252 bstorm: deployed typo fix to maintain-kubeusers in an innocent effort to make the weekend better T275910
  • curprev 22:04, 26 February 2021imported>Stashbot 196,343 bytes +338 bstorm: cleaned up grid jobs 1230666,1908277,1908299,2441500,2441513
  • curprev 18:30, 24 February 2021imported>Stashbot 196,005 bytes +212 bd808: `sudo wmcs-openstack role remove --user zfilipin --project tools user` T267313
  • curprev 23:11, 23 February 2021imported>Stashbot 195,793 bytes +227 bstorm: draining a bunch of k8s workers to clean up after dumps changes T272397
  • curprev 20:40, 22 February 2021imported>Stashbot 195,566 bytes +641 bstorm: repooled tools-sgeexec-0918.tools.eqiad.wmflabs
  • curprev 12:31, 19 February 2021imported>Stashbot 194,925 bytes +100 arturo: deploying new version of toolforge ingress admission controller
  • curprev 21:26, 17 February 2021imported>Stashbot 194,825 bytes +118 bstorm: deleted tools-puppetdb-01 since it is unused at this time (and undersized anyway)
  • curprev 16:27, 4 February 2021imported>Stashbot 194,707 bytes +71 bstorm: rebooting tools-package-builder-02
  • curprev 16:27, 26 January 2021imported>Stashbot 194,636 bytes +110 bd808: Hard reboot of tools-sgeexec-0906 via Horizon for T272978
  • curprev 09:59, 22 January 2021imported>Stashbot 194,526 bytes +146 dcaro: added the record redis.svc.tools.eqiad1.wikimedia.cloud pointing to tools-redis1003 (T272679)
  • curprev 23:58, 21 January 2021imported>Stashbot 194,380 bytes +102 bstorm: deployed new maintain-kubeusers to tools T271847
  • curprev 22:57, 19 January 2021imported>Stashbot 194,278 bytes +503 bstorm: truncated 75GB error log /data/project/robokobot/virgule.err T272247
  • curprev 20:56, 14 January 2021imported>Stashbot 193,775 bytes +367 bstorm: setting bastions to have mostly-uncapped egress network and 40MBps nfs_read for better shared use
  • curprev 10:02, 13 January 2021imported>Stashbot 193,408 bytes +107 arturo: delete floating IP allocation 185.15.56.245 (T271867)
  • curprev 18:16, 12 January 2021imported>Stashbot 193,301 bytes +134 bstorm: deleted wedged CSR tool-adhs-wde to get maintain-kubeusers working again T271842
  • curprev 18:49, 5 January 2021imported>Stashbot 193,167 bytes +134 bstorm: changing the limits on k8s etcd nodes again, so disabling puppet on them T267966
  • curprev 18:21, 4 January 2021imported>Stashbot 193,033 bytes +191 bstorm: ran 'sudo systemctl stop getty@ttyS1.service && sudo systemctl disable getty@ttyS1.service' on tools-k8s-etcd-5. I have no idea why that keeps coming back.
  • curprev 18:22, 22 December 2020imported>Stashbot 192,842 bytes +190 bstorm: rebooting the grid master because it is misbehaving following the NFS outage
  • curprev 18:37, 18 December 2020imported>Stashbot 192,652 bytes +109 bstorm: set profile::wmcs::kubeadm::etcd_latency_ms: 15 T267966
  • curprev 21:42, 17 December 2020imported>Stashbot 192,543 bytes +2,476 bstorm: doing the same procedure to increase the timeouts more T267966
  • curprev 18:29, 11 December 2020imported>Stashbot 190,067 bytes +1,158 bstorm: certificatesigningrequest.certificates.k8s.io "tool-production-error-tasks-metrics" deleted to stop maintain-kubeusers issues
  • curprev 17:35, 10 December 2020imported>Stashbot 188,909 bytes +1,179 bstorm: k8s-control nodes upgraded to 1.17.13 T263284
  • curprev 19:01, 8 December 2020imported>Stashbot 187,730 bytes +140 bstorm: pushed updated calico node image (v3.14.0) to internal docker registry as well T269016
  • curprev 22:56, 7 December 2020imported>Stashbot 187,590 bytes +182 bstorm: pushed updated local copies of the typha, calico-cni and calico-pod2daemon-flexvol images to the tools internal registry T269016
  • curprev 09:18, 3 December 2020imported>Stashbot 187,408 bytes +312 arturo: restarted kubelet systemd service on tools-k8s-worker-38. Node was NotReady, complaining about 'use of closed network connection'
  • curprev 23:35, 28 November 2020imported>Stashbot 187,096 bytes +326 Krenair: Re-scheduled 4 continuous jobs from tools-sgeexec-0908 as it appears to be broken, at about 23:20 UTC
  • curprev 17:44, 24 November 2020imported>Stashbot 186,770 bytes +259 arturo: rebased labs/private.git. 2 patches had merge conflicts
  • curprev 19:45, 10 November 2020imported>Stashbot 186,511 bytes +77 andrewbogott: rebooting tools-sgeexec-0950; OOM
  • curprev 13:35, 2 November 2020imported>Stashbot 186,434 bytes +127 arturo: (typo: dcaro)
  • curprev 21:33, 29 October 2020imported>Stashbot 186,307 bytes +489 legoktm: published docker-registry.tools.wmflabs.org/toolbeta-test image (T265681)
  • curprev 23:42, 28 October 2020imported>Stashbot 185,818 bytes +363 bstorm: dramatically elevated the egress cap on tools-k8s-ingress nodes that were affected by the NFS settings T266506
  • curprev 22:22, 23 October 2020imported>Stashbot 185,455 bytes +115 legoktm: imported pack_0.14.2-1_amd64.deb into buster-tools (T266270)
  • curprev 17:58, 21 October 2020imported>Stashbot 185,340 bytes +141 legoktm: pushed toolforge-buster0-{build,run}:latest images to docker registry
  • curprev 22:00, 15 October 2020imported>Stashbot 185,199 bytes +355 bstorm: manually removing nscd from tools-sgebastion-08 and running puppet
  • curprev 21:00, 14 October 2020imported>Stashbot 184,844 bytes +753 andrewbogott: repooling tools-sgewebgrid-generic-0901 and tools-sgewebgrid-lighttpd-0915
  • curprev 17:07, 10 October 2020imported>Stashbot 184,091 bytes +123 bstorm: cleared errors on tools-sgeexec-0912.tools.eqiad.wmflabs to get the queue moving again
  • curprev 17:07, 8 October 2020imported>Stashbot 183,968 bytes +103 bstorm: rebuilding docker images with locales-all T263339
  • curprev 19:04, 6 October 2020imported>Stashbot 183,865 bytes +234 andrewbogott: uncordoned tools-k8s-worker-38
  • curprev 21:09, 2 October 2020imported>Stashbot 183,631 bytes +281 bstorm: rebooting tools-k8s-worker-70 because it seems to be unable to recover from an old NFS disconnect
  • curprev 21:39, 1 October 2020imported>Stashbot 183,350 bytes +284 andrewbogott: migrating tools-proxy-06 to ceph
  • curprev 18:34, 30 September 2020imported>Stashbot 183,066 bytes +152 andrewbogott: repooling tools-sgeexec-0918
  • curprev 21:38, 23 September 2020imported>Stashbot 182,914 bytes +111 bstorm: ran an 'apt clean' across the fleet to get ahead of the new locale install
  • curprev 19:41, 18 September 2020imported>Stashbot 182,803 bytes +1,384 andrewbogott: repooling tools-k8s-worker-30, 33, 34, 57, 60
  • curprev 01:00, 18 September 2020imported>Stashbot 181,419 bytes +1,961 andrewbogott: depooling tools-sgeexec-0917, tools-sgeexec-0918, tools-sgeexec-0919, tools-sgeexec-0920 for flavor update
  • curprev 23:20, 16 September 2020imported>Stashbot 179,458 bytes +512 andrewbogott: repooled tools-sgeexec-0941 and tools-sgeexec-0939 for move to ceph
  • curprev 15:37, 10 September 2020imported>Stashbot 178,946 bytes +359 arturo: hard-rebooting tools-proxy-05
  • curprev 11:12, 9 September 2020imported>Stashbot 178,587 bytes +560 arturo: new ingress nodes added to the cluster, and tainted/labeled per the docs https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Deploying#ingress_nodes (T250172)
  • curprev 23:24, 8 September 2020imported>Stashbot 178,027 bytes +144 bstorm: clearing grid queue error states blocking job runs
  • curprev 18:13, 2 September 2020imported>Stashbot 177,883 bytes +135 andrewbogott: moving tools-sgeexec-0920 to ceph
  • curprev 19:58, 31 August 2020imported>Stashbot 177,748 bytes +494 andrewbogott: migrating tools-sgeexec-091[0-9] to ceph
  • curprev 00:57, 30 August 2020imported>Stashbot 177,254 bytes +680 Krenair: also ran qconf -ds on each
  • curprev 21:08, 26 August 2020imported>Stashbot 176,574 bytes +293 bd808: Disabled puppet on tools-proxy-06 to test fixes for a bug in the new T251628 code
  • curprev 19:38, 25 August 2020imported>Stashbot 176,281 bytes +648 andrewbogott: deleting tools-sgeexec-0943.tools.eqiad.wmflabs, tools-sgeexec-0944.tools.eqiad.wmflabs, tools-sgeexec-0945.tools.eqiad.wmflabs, tools-sgeexec-0946.tools.eqiad.wmflabs, tools-sgeexec-0948.tools.eqiad.wmflabs, tools-sgeexec-0949.tools.eqiad.wmflabs, tools-sgeexec-0953.tools.eqiad.wmflabs — they are broken and we're not very curious why; will retry this exercise when everything is standardized on
  • curprev 21:29, 19 August 2020imported>Stashbot 175,633 bytes +440 andrewbogott: shutting down and removing tools-k8s-worker-20 through tools-k8s-worker-29; this load can now be handled by new nodes on ceph hosts
  • curprev 15:24, 18 August 2020imported>Stashbot 175,193 bytes +117 bd808: Rebuilding all Docker containers to pick up newest versions of installed packages
  • curprev 16:28, 30 July 2020imported>Stashbot 175,076 bytes +152 andrewbogott: added new xlarge ceph-hosted worker nodes: tools-k8s-worker-61, 62, 63, 64, 65, 66. T258663
  • curprev 23:24, 29 July 2020imported>Stashbot 174,924 bytes +216 bd808: Pushed a copy of docker-registry.wikimedia.org/wikimedia-jessie:latest to docker-registry.tools.wmflabs.org/wikimedia-jessie:latest in preparation for the upstream image going away
  • curprev 22:33, 24 July 2020imported>Stashbot 174,708 bytes +426 bd808: Removed a few more ancient docker images: grrrit, jessie-toollabs, and nagf
  • curprev 23:24, 22 July 2020imported>Stashbot 174,282 bytes +1,162 bstorm: created server group 'tools-k8s-worker' to create any new worker nodes in so that they have a low chance of being scheduled together by openstack unless it is necessary T258663
  • curprev 16:09, 21 July 2020imported>Stashbot 173,120 bytes +212 bstorm: rebooting tools-sgegrid-shadow to remount NFS correctly
  • curprev 16:47, 17 July 2020imported>Stashbot 172,908 bytes +235 bd808: Enabled Puppet on tools-proxy-06 following successful test (T102367)
  • curprev 23:11, 15 July 2020imported>Stashbot 172,673 bytes +117 bd808: Removed ssh root key for valhallasw from project hiera (T255697)
  • curprev 18:53, 9 July 2020imported>Stashbot 172,556 bytes +115 bd808: Updating git-review to 1.27 via clush across cluster (T257496)
  • curprev 11:16, 8 July 2020imported>Stashbot 172,441 bytes +299 arturo: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/610029 -- important change to front-proxy (T234617)
  • curprev 23:22, 7 July 2020imported>Stashbot 172,142 bytes +655 bd808: Rebuilding all Docker images to pick up webservice v0.73 (T234617, T257229)
  • curprev 11:54, 6 July 2020imported>Stashbot 171,487 bytes +354 arturo: briefly point DNS tools.wmflabs.org A record to 185.15.56.60 (tools-legacy-redirector) and then switch back to 185.15.56.11 (tools-proxy-05). The legacy redirector does HTTP/307 (T247236)
  • curprev 11:19, 1 July 2020imported>Stashbot 171,133 bytes +215 arturo: cleanup exim email queue (4 frozen messages)
  • curprev 11:18, 30 June 2020imported>Stashbot 170,918 bytes +123 arturo: set some hiera keys for mtail in puppet prefix `tools-mail` (T256737)
  • curprev 22:48, 29 June 2020imported>Stashbot 170,795 bytes +309 legoktm: built html-sssd/web image (T241817)
  • curprev 21:50, 25 June 2020imported>Stashbot 170,486 bytes +283 zhuyifei1999_: re-enabling puppet on tools-sgebastion-09 T256426
  • curprev 12:36, 24 June 2020imported>Stashbot 170,203 bytes +252 arturo: live-hacking puppetmaster with exim prometheus stuff (T175964)
  • curprev 17:55, 23 June 2020imported>Stashbot 169,951 bytes +237 arturo: killed procs for users `hamishz` and `msyn` which apparently were tools that should be running in the grid / kubernetes instead
  • curprev 10:40, 17 June 2020imported>Stashbot 169,714 bytes +162 arturo: created VM tools-legacy-redirector, with the corresponding puppet prefix (T247236, T234617)
  • curprev 23:01, 16 June 2020imported>Stashbot 169,552 bytes +357 bd808: Building new Docker images to pick up webservice 0.72
  • curprev 21:28, 15 June 2020imported>Stashbot 169,195 bytes +347 bstorm_: cleaned up killgridjobs.sh on the tools bastions T157792
  • curprev 13:13, 12 June 2020imported>Stashbot 168,848 bytes +192 arturo: live-hacking session in the puppetmaster ended
  • curprev 00:16, 12 June 2020imported>Stashbot 168,656 bytes +227 bstorm_: remounted NFS for tools-k8s-control-3 and tools-acme-chief-01
  • curprev 13:32, 4 June 2020imported>Stashbot 168,429 bytes +104 bd808: Manually restored /etc/haproxy/conf.d/elastic.cfg on tools-elastic-*
  • curprev 12:23, 2 June 2020imported>Stashbot 168,325 bytes +441 arturo: renewed TLS cert for k8s metrics-server (T250874) following docs: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Certificates#internal_API_access
  • curprev 23:51, 1 June 2020imported>Stashbot 167,884 bytes +112 bstorm_: refreshed certs for the custom webhook controllers on the k8s cluster T250874
  • curprev 00:39, 1 June 2020imported>Stashbot 167,772 bytes +206 bd808: Ugh. Prior SAL message was about tools-sgeexec-0940
  • curprev 19:37, 29 May 2020imported>Stashbot 167,566 bytes +160 bstorm_: adding docker image for paws-public docker-registry.tools.wmflabs.org/paws-public-nginx:openresty T252217
  • curprev 21:19, 28 May 2020imported>Stashbot 167,406 bytes +953 bd808: Killed 7 python processes run by user 'mattho69' on login.toolforge.org
  • curprev 17:23, 27 May 2020imported>Stashbot 166,453 bytes +160 bstorm_: deleting "tools-k8s-worker-20", "tools-k8s-worker-19", "tools-k8s-worker-18", "tools-k8s-worker-17", "tools-k8s-worker-16"
  • curprev 18:45, 26 May 2020imported>Stashbot 166,293 bytes +242 bstorm_: upgrading maintain-kubeusers to match what is in toolsbeta T246059 T211096
  • curprev 20:00, 22 May 2020imported>Stashbot 166,051 bytes +227 bstorm_: rebooted tools-sgebastion-07 to clear up tmp file problems with 10 min warning
  • curprev 22:40, 21 May 2020imported>Stashbot 165,824 bytes +285 bd808: Rebuilding all Docker containers for tools-webservice 0.70 (T252700)
  • curprev 09:59, 20 May 2020imported>Stashbot 165,539 bytes +896 arturo: now running tesseract-ocr v4.1.1-2~bpo9+1 in the Toolforge grid (T247422)
  • curprev 17:00, 19 May 2020imported>Stashbot 164,643 bytes +171 bstorm_: deleting/restarting the paws db-proxy pod because it cannot connect to the replicas...and I'm hoping that's due to depooling and such
  • curprev 18:14, 13 May 2020imported>Stashbot 164,472 bytes +254 bstorm_: upgrading calico to 3.14.0 with typha enabled in Toolforge K8s T250863
  • curprev 00:28, 9 May 2020imported>Stashbot 164,218 bytes +332 bstorm_: added nfs.* to ignored_fs_types for the prometheus::node_exporter params in project hiera T252260
  • curprev 21:51, 7 May 2020imported>Stashbot 163,886 bytes +245 bstorm_: rebuilding the docker images for Toolforge k8s
  • curprev 21:20, 6 May 2020imported>Stashbot 163,641 bytes +509 bd808: Kubectl delete node tools-k8s-worker-[16-20] (T248702)
  • curprev 00:01, 6 May 2020imported>Stashbot 163,132 bytes +444 bd808: Joining tools-k8s-worker-60 to the k8s worker pool
  • curprev 22:08, 4 May 2020imported>Stashbot 162,688 bytes +346 bstorm_: deleting tools-elastic-01/2/3 T236606
  • curprev 22:13, 29 April 2020imported>Stashbot 162,342 bytes +452 bstorm_: running a fixup script after fixing a bug T247455
  • curprev 22:58, 28 April 2020imported>Stashbot 161,890 bytes +131 bstorm_: rebuilding docker-registry.tools.wmflabs.org/maintain-kubeusers:beta T247455
  • curprev 19:22, 23 April 2020imported>Stashbot 161,759 bytes +92 bd808: Increased Kubernetes services quota for bd808-test tool.
  • curprev 23:06, 21 April 2020imported>Stashbot 161,667 bytes +386 bstorm_: repooled tools-k8s-worker-38/52, tools-sgewebgrid-lighttpd-0918/9 and tools-sgeexec-0901 T250869
  • curprev 15:31, 20 April 2020imported>Stashbot 161,281 bytes +607 bd808: Rebuilding Docker containers to pick up tools-webservice v0.68 (T250625)
  • curprev 23:20, 15 April 2020imported>Stashbot 160,674 bytes +253 bd808: Building ruby25-sssd/base and children (T141388, T250118)
  • curprev 18:26, 14 April 2020imported>Stashbot 160,421 bytes +316 bstorm_: Deployed new code and RBAC for maintain-kubeusers T246123
  • curprev 21:33, 10 April 2020imported>Stashbot 160,105 bytes +369 bd808: Rebuilding all Docker images for the Kubernetes cluster (T249843)
  • curprev 15:13, 9 April 2020imported>Stashbot 159,736 bytes +522 bd808: Rebuilding all stretch and buster Docker images. Jessie is broken at the moment due to package version mismatches
  • curprev 00:20, 9 April 2020imported>Stashbot 159,214 bytes +450 bd808: Docker rebuild failed in toolforge-python2-sssd-base: "zlib1g-dev : Depends: zlib1g (= 1:1.2.8.dfsg-2+b1) but 1:1.2.8.dfsg-2+deb8u1 is to be installed"
  • curprev 20:06, 7 April 2020imported>Stashbot 158,764 bytes +161 andrewbogott: sss_cache -E on tools-sgebastion-08 and tools-sgebastion-09
  • curprev 19:16, 6 April 2020imported>Stashbot 158,603 bytes +89 bstorm_: deleted tools-redis-1001/2 T248929
  • curprev 22:40, 3 April 2020imported>Stashbot 158,514 bytes +572 bstorm_: shut down tools-redis-1001/2 T248929
  • curprev 18:28, 30 March 2020imported>Stashbot 157,942 bytes +316 bstorm_: Beginning rolling depool, remount, repool of k8s workers for T248702
  • curprev 21:22, 27 March 2020imported>Stashbot 157,626 bytes +374 bstorm_: removed puppet prefix tools-docker-builder T248703
  • curprev 11:44, 24 March 2020imported>Stashbot 157,252 bytes +427 arturo: trying to solve a rebase/merge conflict in labs/private.git in tools-puppetmaster-02
  • curprev 19:07, 18 March 2020imported>Stashbot 156,825 bytes +730 bstorm_: removed role::toollabs::logging::sender from project puppet (it wouldn't work anyway)
  • curprev 13:29, 17 March 2020imported>Stashbot 156,095 bytes +113 arturo: set `profile::toolforge::bastion::nproc: 200` for tools-sgebastion-08 (T219070)
  • curprev 00:08, 17 March 2020imported>Stashbot 155,982 bytes +357 bstorm_: shut off tools-flannel-etcd-01/02/03 T246689
  • curprev 17:00, 11 March 2020imported>Stashbot 155,625 bytes +75 jeh: clean up apt cache on tools-sgebastion-07
  • curprev 16:25, 6 March 2020imported>Stashbot 155,550 bytes +100 bstorm_: updating maintain-kubeusers image to filter invalid tool names
  • curprev 18:16, 3 March 2020imported>Stashbot 155,450 bytes +597 jeh: create OpenStack DNS record for elasticsearch.svc.tools.eqiad1.wikimedia.cloud (eqiad1 subdomain change) T236606
  • curprev 22:26, 2 March 2020imported>Stashbot 154,853 bytes +125 jeh: starting first pass of elasticsearch data migration to new cluster T236606
  • curprev 01:48, 1 March 2020imported>Stashbot 154,728 bytes +330 bstorm_: old version of kubectl removed. Anyone who needs it can download it with `curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.4.12/bin/linux/amd64/kubectl`
  • curprev 22:14, 28 February 2020imported>Stashbot 154,398 bytes +2,223 bstorm_: shutting down the old maintain-kubeusers and taking the gloves off the new one (removing --gentle-mode)
  • curprev 00:50, 28 February 2020imported>Stashbot 152,175 bytes +1,873 bstorm_: rebuilt all docker images to include webservice 0.64
  • curprev 00:29, 27 February 2020imported>Stashbot 150,302 bytes +948 bd808: Drained tools-worker-1009 for reboot (NFS flakey)
  • curprev 15:31, 25 February 2020imported>Stashbot 149,354 bytes +82 bd808: `wmcs-k8s-enable-cluster-monitor toolschecker`
  • curprev 00:40, 23 February 2020imported>Stashbot 149,272 bytes +62 Krenair: T245932
  • curprev 16:02, 21 February 2020imported>Stashbot 149,210 bytes +83 andrewbogott: moving tools-sgecron-01 to cloudvirt1022
  • curprev 14:49, 20 February 2020imported>Stashbot 149,127 bytes +117 andrewbogott: moving tools-k8s-worker-19 and tools-k8s-worker-18 to cloudvirt1022 (as part of draining 1014)
  • curprev 00:04, 20 February 2020imported>Stashbot 149,010 bytes +526 Krenair: Shut off tools-puppetmaster-01 - to be deleted in one week T245365
  • curprev 00:59, 19 February 2020imported>Stashbot 148,484 bytes +424 bd808: Live hacked the "nginx-configuration" ConfigMap for T245426 (done several hours ago, but I forgot to !log it)
  • curprev 18:53, 17 February 2020imported>Stashbot 148,060 bytes +286 arturo: T168677 created DNS TXT record _psl.toolforge.org. with value `https://github.com/publicsuffix/list/pull/970`
  • curprev 00:38, 14 February 2020imported>Stashbot 147,774 bytes +1,893 bd808: Added tools-k8s-worker-35 to 2020 Kubernetes cluster (T244791)
  • curprev 19:29, 12 February 2020imported>Stashbot 145,881 bytes +199 bd808: Rebuilding all Docker images to pick up toollabs-webservice (0.63) (T244954)
  • curprev 00:20, 12 February 2020imported>Stashbot 145,682 bytes +785 bd808: Depooling tools-sgewebgrid-generic-0903 (T244791)
  • curprev 23:39, 10 February 2020imported>Stashbot 144,897 bytes +537 bstorm_: updated tools-manifest to 0.21 on aptly for stretch
  • curprev 10:55, 7 February 2020imported>Stashbot 144,360 bytes +167 arturo: drop jessie VM instances tools-prometheus-{01,02} which were shutdown (T238096)
  • curprev 10:44, 6 February 2020imported>Stashbot 144,193 bytes +367 arturo: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/565556 which is a behavior change to the Toolforge front proxy (T234617)
  • curprev 11:22, 5 February 2020imported>Stashbot 143,826 bytes +155 arturo: restarting ferm fleet-wide to account for prometheus servers changed IP (but same hostname) (T238096)
  • curprev 11:38, 4 February 2020imported>Stashbot 143,671 bytes +258 arturo: start tools-prometheus-01 again to sync data to the new tools-prometheus-03/04 VMs (T238096)
  • curprev 14:12, 3 February 2020imported>Stashbot 143,413 bytes +471 arturo: move tools-prometheus-04 from cloudvirt1022 to cloudvirt1013
  • curprev 14:06, 31 January 2020imported>Stashbot 142,942 bytes +411 arturo: leave tools-prometheus-01 as the backend for tools-prometheus.wmflabs.org for the weekend so grafana dashboards keep working (T238096)
  • curprev 21:04, 30 January 2020imported>Stashbot 142,531 bytes +1,613 andrewbogott: also apt-get install python3-novaclient on tools-prometheus-03 and tools-prometheus-04 to suppress cronspam. Possible real fix for this is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/569084/
  • curprev 20:07, 29 January 2020imported>Stashbot 140,918 bytes +174 bd808: Created {bastion,login,dev}.toolforge.org service names for Toolforge bastions using Horizon & Designate
  • curprev 13:35, 28 January 2020imported>Stashbot 140,744 bytes +289 arturo: `aborrero@tools-clushmaster-02:~$ clush -w @exec-stretch 'for i in $(ps aux | grep [t]ools.j | awk -F" " "{print \$2}") ; do echo "killing $i" ; sudo kill $i ; done || true'` (T243831)
  • curprev 07:05, 27 January 2020imported>Stashbot 140,455 bytes +329 zhuyifei1999_: wrong package. uninstalled. the correct one is bpfcc-tools and seems only available in buster+. T115231
  • curprev 20:58, 24 January 2020imported>Stashbot 140,126 bytes +457 bd808: Built tools-k8s-worker-21 to test out build script following openstack client upgrade
  • curprev 23:38, 23 January 2020imported>Stashbot 139,669 bytes +421 bd808: Halted tools-k8s-worker build script after first instance (tools-k8s-worker-10) stuck in "scheduling" state for 20 minutes
  • curprev 12:43, 22 January 2020imported>Stashbot 139,248 bytes +200 arturo: for the record, issue with tools-worker-1016 was memory exhaustion apparently
  • curprev 19:25, 21 January 2020imported>Stashbot 139,048 bytes +398 bstorm_: hard rebooting tools-sgeexec-0913/14/35 because they aren't even on the network
  • curprev 23:54, 16 January 2020imported>Stashbot 138,650 bytes +432 bstorm_: rebooting tools-docker-builder-06 because there are a couple running containers that don't want to die cleanly
  • curprev 15:29, 14 January 2020imported>Stashbot 138,218 bytes +216 bstorm_: failed the gridengine master back to the master server from the shadow
  • curprev 17:48, 13 January 2020imported>Stashbot 138,002 bytes +557 bd808: Running `puppet ca destroy` for each unsigned cert on tools-puppetmaster-01 (T242642)
  • curprev 22:31, 12 January 2020imported>Stashbot 137,445 bytes +224 Krenair: same on -13 and -14
  • curprev 01:33, 11 January 2020imported>Stashbot 137,221 bytes +157 bstorm_: updated toollabs-webservice package to 0.57, which should allow persisting mem and cpu in manifests with burstable qos.
  • curprev 23:31, 10 January 2020imported>Stashbot 137,064 bytes +462 bstorm_: updated toollabs-webservice package to 0.56
  • curprev 23:35, 9 January 2020imported>Stashbot 136,602 bytes +533 bstorm_: depooled tools-sgeexec-0939 because it isn't acting right and rebooting it
  • curprev 22:40, 7 January 2020imported>Stashbot 136,069 bytes +1,199 bstorm_: rebooted tools-worker-1007 to recover it from disk full and general badness
  • curprev 00:26, 7 January 2020imported>Stashbot 134,870 bytes +1,665 bstorm_: repooled tools-sgewebgrid-lighttpd-0919
  • curprev 18:11, 4 January 2020imported>Stashbot 133,205 bytes +1,777 bd808: Shutdown tools-worker-1029
  • curprev 16:48, 3 January 2020imported>Stashbot 131,428 bytes +586 bstorm_: updated the ValidatingWebhookConfiguration for the ingress admission controller to the working settings
  • curprev 00:11, 3 January 2020imported>Stashbot 130,842 bytes +191 bd808: Rebuilding all stretch-ssd Docker images to pick up busybox
  • curprev 05:02, 30 December 2019imported>Stashbot 130,651 bytes +195 andrewbogott: moving tools-worker-1012 to cloudvirt1024 for T241523
  • curprev 01:38, 29 December 2019imported>Stashbot 130,456 bytes +215 Krenair: Cordoned tools-worker-1012 and deleted pods associated with dplbot and dewikigreetbot as well as my own testing one, host seems to be under heavy load - T241523
  • curprev 15:06, 27 December 2019imported>Stashbot 130,241 bytes +142 Krenair: Killed a "python parse_page.py outreachy" process by aikochou that was hogging IO on tools-sgebastion-07
  • curprev 16:07, 25 December 2019imported>Stashbot 130,099 bytes +134 zhuyifei1999_: pkilled 5 `python pwb.py` processes belonging to `tools.kaleem-bot` on tools-sgebastion-07
  • curprev 20:13, 22 December 2019imported>Stashbot 129,965 bytes +263 bd808: Enabled Puppet on tools-proxy-06.tools.eqiad.wmflabs after nginx config test (T241310)
  • curprev 22:28, 20 December 2019imported>Stashbot 129,702 bytes +211 bd808: Re-enabled Puppet on tools-sgebastion-09. Reason for disable was "arturo raising systemd limits"
  • curprev 17:33, 18 December 2019imported>Stashbot 129,491 bytes +310 bstorm_: updated package in aptly for toollabs-webservice to 0.53
  • curprev 20:25, 17 December 2019imported>Stashbot 129,181 bytes +950 bd808: Fixed https://tools.wmflabs.org/ to redirect to https://tools.wmflabs.org/admin/
  • curprev 00:45, 17 December 2019imported>Stashbot 128,231 bytes +295 bstorm_: enabled encryption at rest on the new k8s cluster
  • curprev 10:48, 14 December 2019imported>Stashbot 127,936 bytes +153 valhallasw`cloud: re-enabling puppet on tools-sgeexec-0912, likely left-over from NFS maintenance (no reason was specified).
  • curprev 18:46, 13 December 2019imported>Stashbot 127,783 bytes +316 bstorm_: updated tools-k8s-control-2 and 3 to the new config as well
  • curprev 00:45, 13 December 2019imported>Stashbot 127,467 bytes +806 bstorm_: rebooting tools-static-13
  • curprev 18:13, 11 December 2019imported>Stashbot 126,661 bytes +239 bd808: Restarted maintain-dbusers on labstore1004. Process had not logged any account creations since 2019-12-01T22:45:45.
  • curprev 13:59, 10 December 2019imported>Stashbot 126,422 bytes +108 arturo: set pod replicas to 3 in the new k8s cluster (T239405)
  • curprev 11:06, 9 December 2019imported>Stashbot 126,314 bytes +144 andrewbogott: deleting unused security groups: catgraph, devpi, MTA, mysql, syslog, test T91619
  • curprev 13:45, 4 December 2019imported>Stashbot 126,170 bytes +101 arturo: drop puppet prefix `tools-cron`, deprecated and no longer in use
  • curprev 11:45, 29 November 2019imported>Stashbot 126,069 bytes +1,940 arturo: created 3 new VMs `tools-k8s-worker-[3,4,5]` (T239403)
  • curprev 13:49, 19 November 2019imported>Stashbot 124,129 bytes +239 arturo: re-create nginx-ingress pod due to deployment template refresh (T237643)
  • curprev 14:44, 15 November 2019imported>Stashbot 123,890 bytes +100 arturo: stop live-hacks on tools-prometheus-01 T237643
  • curprev 17:20, 13 November 2019imported>Stashbot 123,790 bytes +154 arturo: live-hacking tools-prometheus-01 to test some experimental configs for the new k8s cluster (T237643)
  • curprev 12:52, 12 November 2019imported>Stashbot 123,636 bytes +107 arturo: reboot tools-proxy-06 to reset iptables setup T238058
  • curprev 02:17, 10 November 2019imported>Stashbot 123,529 bytes +520 bd808: Building new Docker images for T237836 (retrying after cleaning out old images on tools-docker-builder-06)
  • curprev 22:47, 8 November 2019imported>Stashbot 123,009 bytes +477 bstorm_: adding rsync::server::wrap_with_stunnel: false to the tools-docker-registry-03/4 servers to unbreak puppet
  • curprev 13:27, 7 November 2019imported>Stashbot 122,532 bytes +616 arturo: deployed registry-admission-webhook and ingress-admission-controller into the new k8s cluster (T236826)
  • curprev 22:32, 6 November 2019imported>Stashbot 121,916 bytes +804 bstorm_: added rsync::server::wrap_with_stunnel: false to tools-sge-services prefix to fix puppet
  • curprev 23:08, 5 November 2019imported>Stashbot 121,112 bytes +1,265 Krenair: Dropped 59a77a3, 3830802, and 83df61f from tools-puppetmaster-01:/var/lib/git/labs/private cherry-picks as these are no longer required T206235
  • curprev 14:45, 4 November 2019imported>Stashbot 119,847 bytes +503 phamhi: Built and pushed ruby25 docker image based on buster (T230961)
  • curprev 21:00, 1 November 2019imported>Stashbot 119,344 bytes +604 Krenair: Removed tools-checker.wmflabs.org A record to 208.80.155.229 as that target IP is in the old pre-neutron range that is no longer routed
  • curprev 18:47, 31 October 2019imported>Stashbot 118,740 bytes +783 andrewbogott: deleted and/or truncated a bunch of logfiles on tools-worker-1001. Runaway logfiles filled up the drive which prevented puppet from running. If puppet had run, it would have prevented the runaway logfiles.
  • curprev 13:53, 30 October 2019imported>Stashbot 117,957 bytes +464 arturo: replacing SSL cert in tools-proxy-x server apparently OK (merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/545679) T235252
  • curprev 10:49, 29 October 2019imported>Stashbot 117,493 bytes +170 arturo: deleting VM tools-test-proxy-01, no longer in use
  • curprev 16:06, 28 October 2019imported>Stashbot 117,323 bytes +2,583 arturo: delete VM instance `tools-test-proxy-01` and the puppet prefix `tools-test-proxy`
  • curprev 16:32, 24 October 2019imported>Stashbot 114,740 bytes +103 bstorm_: set the prod rsyslog config for kubernetes to false for Toolforge
  • curprev 20:00, 23 October 2019imported>Stashbot 114,637 bytes +437 phamhi: Rebuilding all jessie and stretch docker images to pick up toollabs-webservice 0.47 (T233347)
  • curprev 16:56, 22 October 2019imported>Stashbot 114,200 bytes +177 bstorm_: drained tools-worker-1025.tools.eqiad.wmflabs which was malfunctioning
  • curprev 17:32, 21 October 2019imported>Stashbot 114,023 bytes +120 phamhi: Rebuilding all jessie and stretch docker images to pick up toollabs-webservice 0.46
  • curprev 22:15, 18 October 2019imported>Stashbot 113,903 bytes +353 bd808: Rescheduled continuous jobs away from tools-sgeexec-0904 because of high system load
  • curprev 16:21, 16 October 2019imported>Stashbot 113,550 bytes +390 phamhi: Deployed toollabs-webservice 0.46 to buster-tools and stretch-tools (T218461)
  • curprev 17:10, 15 October 2019imported>Stashbot 113,160 bytes +97 phamhi: restart tools-worker-1035 because it is no longer responding
  • curprev 09:26, 14 October 2019imported>Stashbot 113,063 bytes +116 arturo: cleaned-up updatetools from tools-sge-services nodes (T229261)
  • curprev 19:52, 11 October 2019imported>Stashbot 112,947 bytes +659 bstorm_: restarted docker on tools-docker-builder after phamhi noticed the daemon had a routing issue (blank iptables)
  • curprev 02:33, 10 October 2019imported>Stashbot 112,288 bytes +92 bd808: Rebooting tools-sgewebgrid-lighttpd-0903. Instance hung.
  • curprev 22:52, 9 October 2019imported>Stashbot 112,196 bytes +847 jeh: removing test instances tools-sssd-sgeexec-test-[12] from SGE
  • curprev 19:40, 8 October 2019imported>Stashbot 111,349 bytes +410 bstorm_: drained tools-worker-1007/8 to rebalance the cluster
  • curprev 20:17, 7 October 2019imported>Stashbot 110,939 bytes +4,103 bd808: Dropped backlog of messages for delivery to tools.usrd-tools
  • curprev 21:43, 4 October 2019imported>Stashbot 106,836 bytes +557 bd808: `sudo exec-manage repool tools-sgeexec-0923.tools.eqiad.wmflabs`
  • curprev 13:05, 3 October 2019imported>Stashbot 106,279 bytes +101 arturo: delete servers tools-sssd-sgeexec-test-[1,2], no longer required
  • curprev 16:59, 27 September 2019imported>Stashbot 106,178 bytes +103 bd808: Set "profile::rsyslog::kafka_shipper::kafka_brokers: []" in tools-elastic prefix puppet
  • curprev 00:40, 27 September 2019imported>Stashbot 106,075 bytes +90 bstorm_: depooled and rebooted tools-sgewebgrid-lighttpd-0927
  • curprev 19:08, 25 September 2019imported>Stashbot 105,985 bytes +97 andrewbogott: moving tools-sgewebgrid-lighttpd-0903 to cloudvirt1021
  • curprev 16:58, 23 September 2019imported>Stashbot 105,888 bytes +192 bstorm_: deployed tools-manifest 0.20 and restarted webservicemonitor
  • curprev 20:48, 12 September 2019imported>Stashbot 105,696 bytes +95 phamhi: Deleted tools-puppetdb-01.tools as it is no longer in use
  • curprev 13:31, 11 September 2019imported>Stashbot 105,601 bytes +60 jeh: restart tools-sgeexec-0912
  • curprev 22:44, 9 September 2019imported>Stashbot 105,541 bytes +88 bstorm_: uncordoned tools-worker-1030 and tools-worker-1038
  • curprev 15:11, 6 September 2019imported>Stashbot 105,453 bytes +106 bd808: `sudo kill -9 10635` on tools-k8s-master-01 (T194859)
  • curprev 21:02, 5 September 2019imported>Stashbot 105,347 bytes +242 bd808: Enabled Puppet on tools-docker-registry-03 and forced puppet run (T232135)
  • curprev 20:51, 1 September 2019imported>Stashbot 105,105 bytes +100 Reedy: `sudo service maintain-kubeusers restart` on tools-k8s-master-01
  • curprev 16:54, 30 August 2019imported>Stashbot 105,005 bytes +201 phamhi: restart maintain-kubeusers service in tools-k8s-master-01
  • curprev 22:18, 29 August 2019imported>Stashbot 104,804 bytes +357 bd808: Finished building new stretch Docker images for Toolforge Kubernetes use
  • curprev 19:10, 27 August 2019imported>Stashbot 104,447 bytes +116 bd808: Restarted maintain-kubeusers after complaint on irc. It was stuck in limbo again
  • curprev 21:48, 26 August 2019imported>Stashbot 104,331 bytes +163 bstorm_: repooled tools-sgewebgrid-generic-0902, tools-sgewebgrid-lighttpd-0902, tools-sgewebgrid-lighttpd-0903 and tools-sgeexec-0905
  • curprev 08:11, 18 August 2019imported>Stashbot 104,168 bytes +95 arturo: restart maintain-kubeusers service in tools-k8s-master-01
  • curprev 10:56, 17 August 2019imported>Stashbot 104,073 bytes +88 arturo: force-reboot tools-worker-1006. It is completely stuck
  • curprev 15:32, 15 August 2019imported>Stashbot 103,985 bytes +222 jeh: upgraded jobutils debian package to 1.38 T229551
  • curprev 22:00, 13 August 2019imported>Stashbot 103,763 bytes +200 bstorm_: truncated exim paniclog on tools-sgecron-01 because it was being spammy
  • curprev 16:08, 12 August 2019imported>Stashbot 103,563 bytes +171 phamhi: updated prometheus-node-exporter from 0.14.0~git20170523-1 to 0.17.0+ds-3 in tools-worker-[1030-1040] nodes (T230147)
  • curprev 19:26, 8 August 2019imported>Stashbot 103,392 bytes +100 jeh: restarting tools-sgewebgrid-lighttpd-0915 T230157
  • curprev 19:07, 7 August 2019imported>Stashbot 103,292 bytes +121 bd808: Disassociated SUL and Phabricator accounts from user Lophi (T229713)
  • curprev 16:18, 6 August 2019imported>Stashbot 103,171 bytes +290 arturo: add phamhi as user/projectadmin (T228942) and delete hpham
  • curprev 22:49, 5 August 2019imported>Stashbot 102,881 bytes +411 bstorm_: launching tools-worker-1040
  • curprev 14:00, 2 August 2019imported>Stashbot 102,470 bytes +93 andrewbogott_: rebooting tools-worker-1022 as it is unresponsive
  • curprev 18:07, 31 July 2019imported>Stashbot 102,377 bytes +641 bstorm_: drained tools-worker-1015/05/03/17 to rebalance load
  • curprev 23:00, 27 July 2019imported>Stashbot 101,736 bytes +247 zhuyifei1999_: a past probably related ticket: T194859
  • curprev 17:39, 26 July 2019imported>Stashbot 101,489 bytes +492 bstorm_: restarted maintain-kubeusers because it was suspiciously tardy and quiet
  • curprev 22:01, 25 July 2019imported>Stashbot 100,997 bytes +142 bstorm_: T228573 created tools-worker-1030
  • curprev 10:14, 24 July 2019imported>Stashbot 100,855 bytes +251 arturo: reallocating tools-puppetmaster-01 from cloudvirt1027 to cloudvirt1028 (T227539)
  • curprev 18:39, 22 July 2019imported>Stashbot 100,604 bytes +577 bstorm_: repooled tools-sgeexec-0905 after reboot
  • curprev 19:52, 20 July 2019imported>Stashbot 100,027 bytes +70 andrewbogott: rebooting tools-worker-1023
  • curprev 20:23, 17 July 2019imported>Stashbot 99,957 bytes +90 andrewbogott: migrating tools-sgegrid-shadow to cloudvirt1014
  • curprev 14:50, 15 July 2019imported>Stashbot 99,867 bytes +140 bstorm_: cleared error state from tools-sgeexec-0911 which went offline after error from job 5190035
  • curprev 09:30, 25 June 2019imported>Stashbot 99,727 bytes +95 arturo: detected puppet issue in all VMs: T226480
  • curprev 17:42, 24 June 2019imported>Stashbot 99,632 bytes +85 andrewbogott: moving tools-sgeexec-0905 to cloudvirt1015
  • curprev 14:07, 17 June 2019imported>Stashbot 99,547 bytes +251 andrewbogott: moving tools-sgewebgrid-lighttpd-0903 to cloudvirt1015
  • curprev 18:03, 11 June 2019imported>Stashbot 99,296 bytes +103 bstorm_: deleted anomalous kubernetes node tools-worker-1019.eqiad.wmflabs
  • curprev 18:33, 5 June 2019imported>Stashbot 99,193 bytes +179 andrewbogott: repooled tools-sgeexec-0921 and tools-sgeexec-0929
  • curprev 13:01, 30 May 2019imported>Stashbot 99,014 bytes +1,951 arturo: uncordon/repool tools-worker-1001/2/3. They should be fine now. I'm only leaving 1029 cordoned for testing purposes
  • curprev 11:13, 29 May 2019imported>Stashbot 97,063 bytes +402 arturo: briefly tested some sssd config changes in tools-sgebastion-09
  • curprev 18:15, 28 May 2019imported>Stashbot 96,661 bytes +1,669 arturo: T221225 for the record, tools-worker-1001 is not working after trying with sssd
  • curprev 09:47, 27 May 2019imported>Stashbot 94,992 bytes +247 arturo: run `apt-get clean` to wipe 4GB of unused .deb packages, usage on / (root) was > 90% (on tools-sgebastion-08)
  • curprev 12:35, 21 May 2019imported>Stashbot 94,745 bytes +88 arturo: T223992 rebooting tools-redis-1002
  • curprev 11:25, 20 May 2019imported>Stashbot 94,657 bytes +271 arturo: T223332 enable puppet agent in tools-k8s-master and tools-docker-registry nodes and deploy new SSL cert
  • curprev 11:13, 18 May 2019imported>Stashbot 94,386 bytes +402 chicocvenancio: PAWS update helm chart to point to new singleuser image (T217908)
  • curprev 11:22, 16 May 2019imported>Stashbot 93,984 bytes +184 chicocvenancio: PAWS: restart hub to get new configured announcement
  • curprev 16:20, 15 May 2019imported>Stashbot 93,800 bytes +1,037 arturo: T223148 repool both tools-sgeexec-0921 and -0929
  • curprev 17:12, 14 May 2019imported>Stashbot 92,763 bytes +1,393 arturo: T223148 repool tools-sgeexec-0920
  • curprev 08:15, 13 May 2019imported>Stashbot 91,370 bytes +176 zhuyifei1999_: `truncate -s 0 /var/log/exim4/paniclog` on tools-sgecron-01.tools.eqiad.wmflabs & tools-sgewebgrid-lighttpd-0921.tools.eqiad.wmflabs
  • curprev 14:38, 7 May 2019imported>Stashbot 91,194 bytes +1,080 arturo: T222718 uncordon tools-worker-1019, I couldn't find a reason for it to be cordoned
  • curprev 11:34, 6 May 2019imported>Stashbot 90,114 bytes +205 arturo: T221225 reenable puppet
  • curprev 09:43, 3 May 2019imported>Stashbot 89,909 bytes +741 arturo: fixed puppet in tools-puppetdb-01 too
  • curprev 12:50, 30 April 2019imported>Stashbot 89,168 bytes +455 arturo: enable puppet in all servers T221225
  • curprev 11:22, 29 April 2019imported>Stashbot 88,713 bytes +406 arturo: T221225 re-enable puppet agent in all toolforge servers
  • curprev 12:20, 26 April 2019imported>Stashbot 88,307 bytes +161 andrewbogott: rescheduling every pod everywhere
  • curprev 12:49, 25 April 2019imported>Stashbot 88,146 bytes +296 arturo: T221225 using `profile::ldap::client::labs::client_stack: sssd` in horizon for tools-sgebastion-09 (testing)
  • curprev 12:54, 24 April 2019imported>Stashbot 87,850 bytes +159 arturo: puppet broken, fixing right now
  • curprev 15:26, 23 April 2019imported>Stashbot 87,691 bytes +1,148 arturo: T221225 rebooting tools-sgebastion-08 to cleanup sssd
  • curprev 12:09, 17 April 2019imported>Stashbot 86,543 bytes +1,187 arturo: T221225 rebooting bastions to clean sssd. We are back to nscd/nslcd until we figure out what's wrong here
  • curprev 20:49, 16 April 2019imported>Stashbot 85,356 bytes +257 chicocvenancio: change paws announcement in configmap hub-config back to a welcome message
  • curprev 18:50, 15 April 2019imported>Stashbot 85,099 bytes +167 andrewbogott: moving tools-elastic-01 to cloudvirt1008 to make spreadcheck happy
  • curprev 16:23, 14 April 2019imported>Stashbot 84,932 bytes +112 andrewbogott: moved all tools-worker nodes off of cloudvirt1015 and uncordoned them
  • curprev 21:09, 13 April 2019imported>Stashbot 84,820 bytes +433 bstorm_: Moving tools-prometheus-01 to cloudvirt1009 and tools-clushmaster-02 to cloudvirt1008 for T220853
  • curprev 22:38, 11 April 2019imported>Stashbot 84,387 bytes +777 andrewbogott: moving tools-paws-worker-1005 to cloudvirt1009 to make spreadcheck happier
  • curprev 00:03, 11 April 2019imported>Stashbot 83,610 bytes +1,993 andrewbogott: tools-paws-worker-1002, tools-paws-worker-1003 to eqiad1-r
  • curprev 00:32, 10 April 2019imported>Stashbot 81,617 bytes +1,255 andrewbogott: migrating tools-worker-1022, 1023, 1025, 1026 to eqiad1-r
  • curprev 22:36, 8 April 2019imported>Stashbot 80,362 bytes +182 andrewbogott: moving tools-worker-1006 and tools-worker-1007 to eqiad1-r
  • curprev 16:54, 7 April 2019imported>Stashbot 80,180 bytes +218 zhuyifei1999_: tools-sgeexec-0928 unresponsive since around 22 UTC. No data on Graphite. Can't ssh in even as root. Hard rebooting via Horizon
  • curprev 15:44, 5 April 2019imported>Stashbot 79,962 bytes +74 bstorm_: cleared E state from two exec queues
  • curprev 21:21, 4 April 2019imported>Stashbot 79,888 bytes +1,354 bd808: Uncordoned tools-worker-1013.tools.eqiad.wmflabs after reboot and forced puppet run
  • curprev 11:22, 3 April 2019imported>Stashbot 78,534 bytes +138 arturo: puppet breakage due to me introducing openstack-mitaka-jessie repo by mistake. Cleaning up already
  • curprev 12:11, 2 April 2019imported>Stashbot 78,396 bytes +189 arturo: icinga downtime toolschecker for 1 month T219243
  • curprev 19:44, 1 April 2019imported>Stashbot 78,207 bytes +313 bd808: Deleted tools-checker-02 via Horizon (T219243)
  • curprev 21:13, 29 March 2019imported>Stashbot 77,894 bytes +1,362 bstorm_: depooled tools-sgewebgrid-generic-0903 because of some stuck jobs and odd load characteristics
  • curprev 01:00, 28 March 2019imported>Stashbot 76,532 bytes +589 bstorm_: cleared error states from two queues
  • curprev 22:00, 26 March 2019imported>Stashbot 75,943 bytes +131 gtirloni: downtimed toolschecker
  • curprev 00:27, 26 March 2019imported>Stashbot 75,812 bytes +2,648 bd808: Deleted DNS record for login-trusty.tools.wmflabs.org
  • curprev 17:16, 22 March 2019imported>Stashbot 73,164 bytes +615 andrewbogott: switching all instances to use ldap-ro.eqiad.wikimedia.org as both primary and secondary ldap server
  • curprev 00:39, 22 March 2019imported>Stashbot 72,549 bytes +620 bstorm_: T217280 depooled and rebooted tools-sgewebgrid-lighttpd-0902
  • curprev 18:43, 18 March 2019imported>Stashbot 71,929 bytes +472 bd808: Rebooting tools-static-12
  • curprev 23:41, 17 March 2019imported>Stashbot 71,457 bytes +586 bd808: Cherry-picked https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497210/ as a quick fix for T218494
  • curprev 22:34, 16 March 2019imported>Stashbot 70,871 bytes +71 bstorm_: clearing errored out queues again
  • curprev 21:08, 15 March 2019imported>Stashbot 70,800 bytes +373 bstorm_: cleared error state on several queues T217280
  • curprev 23:52, 14 March 2019imported>Stashbot 70,427 bytes +2,052 bd808: Disabled job queues and rescheduled continuous jobs away from tools-exec-14{21,22,23,24,25,26,27,28,29,30,31,32} (T217152)
  • curprev 23:30, 13 March 2019imported>Stashbot 68,375 bytes +775 bd808: Rebuilding stretch Kubernetes images
  • curprev 00:22, 13 March 2019imported>Stashbot 67,600 bytes +113 bd808: Raise web-memlimit for isbn tool to 6G for tomcat8 (T217406)
  • curprev 15:53, 11 March 2019imported>Stashbot 67,487 bytes +344 bd808: Manually started `service gridengine-master` on tools-sgegrid-master after reboot (T218038)
  • curprev 00:53, 11 March 2019imported>Stashbot 67,143 bytes +562 bd808: Re-enabled 13 queue instances that had been disabled by LDAP failures during job initialization (T217280)
  • curprev 00:30, 8 March 2019imported>Stashbot 66,581 bytes +418 bd808: DNS record created for trusty-dev.tools.wmflabs.org (Trusty secondary bastion)
  • curprev 00:49, 7 March 2019imported>Stashbot 66,163 bytes +380 zhuyifei1999_: clushed misctools 1.37 upgrade on @bastion,@cron,@bastion-stretch T217406
  • curprev 19:07, 4 March 2019imported>Stashbot 65,783 bytes +276 bstorm_: umounted /mnt/nfs/dumps-labstore1006.wikimedia.org for T217473
  • curprev 20:54, 3 March 2019imported>Stashbot 65,507 bytes +79 andrewbogott: cleaning out /tmp on tools-exec-1412
  • curprev 19:36, 28 February 2019imported>Stashbot 65,428 bytes +234 zhuyifei1999_: built with debuild instead T217297
  • curprev 20:41, 27 February 2019imported>Stashbot 65,194 bytes +734 andrewbogott: restarting nginx on tools-checker-01
  • curprev 20:51, 26 February 2019imported>Stashbot 64,460 bytes +223 gtirloni: reboot tools-package-builder-02 (unresponsive)
  • curprev 23:20, 25 February 2019imported>Stashbot 64,237 bytes +1,248 bstorm_: Depooled tools-sgeexec-0914 and tools-sgeexec-0915 for T217066
  • curprev 16:29, 22 February 2019imported>Stashbot 62,989 bytes +213 gtirloni: upgraded and rebooted tools-puppetmaster-01 (new kernel)
  • curprev 09:59, 21 February 2019imported>Stashbot 62,776 bytes +61 gtirloni: upgraded all packages in all stretch nodes
  • curprev 00:12, 21 February 2019imported>Stashbot 62,715 bytes +1,098 zhuyifei1999_: forcing puppet run on tools-k8s-master-01
  • curprev 01:49, 19 February 2019imported>Stashbot 61,617 bytes +118 bd808: Revoked Toolforge project membership for user DannyS712 (T215092)
  • curprev 20:45, 18 February 2019imported>Stashbot 61,499 bytes +370 gtirloni: upgraded and rebooted tools-sgebastion-07 (login-stretch)
  • curprev 22:23, 17 February 2019imported>Stashbot 61,129 bytes +321 zhuyifei1999_: uncordon tools-worker-1010.tools.eqiad.wmflabs
  • curprev 05:00, 16 February 2019imported>Stashbot 60,808 bytes +1,745 zhuyifei1999_: fixed by restarting flannel. another puppet run simply started kubelet
  • curprev 21:57, 14 February 2019imported>Stashbot 59,063 bytes +1,078 bd808: Deleted old tools-proxy-02 instance
  • curprev 19:16, 13 February 2019imported>Stashbot 57,985 bytes +680 andrewbogott: deleting tools-sgewebgrid-generic-0901, tools-sgewebgrid-lighttpd-0901, tools-sgebastion-06
  • curprev 01:24, 12 February 2019imported>Stashbot 57,305 bytes +153 bd808: Stopped maintain-kubeusers, edited /etc/kubernetes/tokenauth, restarted maintain-kubeusers (T215704)
  • curprev 22:57, 11 February 2019imported>Stashbot 57,152 bytes +1,621 bd808: Shutoff tools-webgrid-lighttpd-14{01,13,24,26,27,28} via Horizon UI
  • curprev 19:17, 8 February 2019imported>Stashbot 55,531 bytes +434 hauskatze: Stopped webservice of `tools.sulinfo` which redirects to `tools.quentinv57-tools` which is also unavailable
  • curprev 01:07, 8 February 2019imported>Stashbot 55,097 bytes +351 bd808: Creating tools-sgebastion-07
  • curprev 13:20, 4 February 2019imported>Stashbot 54,746 bytes +395 arturo: T215154 another reboot for tools-sgebastion-06
  • curprev 23:54, 30 January 2019imported>Stashbot 54,351 bytes +70 gtirloni: cleared apt cache on sge* hosts
  • curprev 20:50, 25 January 2019imported>Stashbot 54,281 bytes +336 bd808: Deployed new tcl/web Kubernetes image based on Debian Stretch (T214668)
  • curprev 11:09, 24 January 2019imported>Stashbot 53,945 bytes +341 arturo: T213421 delete tools-services-01/02
  • curprev 22:18, 23 January 2019imported>Stashbot 53,604 bytes +679 bd808: Building new tools-sgewebgrid-lighttpd-0904 instance using Stretch base image (T214519)
  • curprev 20:21, 22 January 2019imported>Stashbot 52,925 bytes +326 gtirloni: published new docker images (all)
  • curprev 21:22, 18 January 2019imported>Stashbot 52,599 bytes +102 bd808: Forcing php-igbinary update via clush for T213666
  • curprev 23:37, 17 January 2019imported>Stashbot 52,497 bytes +574 bd808: Shutdown tools-package-builder-01. Use tools-package-builder-02 instead!
  • curprev 17:29, 16 January 2019imported>Stashbot 51,923 bytes +476 andrewbogott: depooling and moving tools-sgeexec-0904 tools-sgeexec-0906 tools-sgewebgrid-lighttpd-0904
  • curprev 21:02, 15 January 2019imported>Stashbot 51,447 bytes −178,393 bstorm_: restarting webservicemonitor on tools-services-02 -- acting funny
  • curprev 11:55, 11 January 2019imported>Stashbot 229,840 bytes +296 arturo: T213418 shutdown tools-docker-builder-05, will give a grace period before deleting the VM
  • curprev 22:45, 10 January 2019imported>Stashbot 229,544 bytes +292 bstorm_: T213357 - Added 24 lighttpd nodes to the new grid
  • curprev 00:12, 10 January 2019imported>Stashbot 229,252 bytes +432 bstorm_: T213353 Added 36 exec nodes to the new grid
  • curprev 17:21, 7 January 2019imported>Stashbot 228,820 bytes +325 bstorm_: T67777 - set the max_u_jobs global grid config setting to 50 in the new grid
  • curprev 22:06, 6 January 2019imported>Stashbot 228,495 bytes +103 bd808: Added floating ip to tools-sgebastion-06 (T212360)
  • curprev 23:54, 5 January 2019imported>Stashbot 228,392 bytes +173 bd808: Manually installed php-mbstring on tools-sgebastion-06. Gerrit patch submitted to install it on the rest of the Son of Grid Engine nodes.
  • curprev 21:37, 4 January 2019imported>Stashbot 228,219 bytes +114 bd808: Truncated /data/project/.system/accounting after archiving ~30 days of history
  • curprev 21:03, 3 January 2019imported>Stashbot 228,105 bytes +214 bd808: Enabled Puppet on tools-proxy-02
  • curprev 16:29, 21 December 2018imported>Stashbot 227,891 bytes +126 andrewbogott: migrating tools-exec-1416 to labvirt1004
  • curprev 00:35, 21 December 2018imported>Stashbot 227,765 bytes +615 bd808: Installed tools-manifest 0.14 for T212390
  • curprev 22:16, 17 December 2018imported>Stashbot 227,150 bytes +478 bstorm_: Adding a bunch of hiera values and prefixes for the new grid - T212153
  • curprev 13:19, 11 December 2018imported>Stashbot 226,672 bytes +84 gtirloni: Removed BigBrother (T208357)
  • curprev 12:17, 5 December 2018imported>Stashbot 226,588 bytes +129 gtirloni: removed node tools-worker-1029.tools.eqiad.wmflabs from cluster (T196973)
  • curprev 22:47, 4 December 2018imported>Stashbot 226,459 bytes +262 bstorm_: gtirloni added back main floating IP for tools-k8s-master-01 and removed unnecessary ones to stop k8s outage T164123
  • curprev 02:44, 1 December 2018imported>Stashbot 226,197 bytes +88 gtirloni: deleted instance tools-exec-gift-trusty-01 (T194615)
  • curprev 00:10, 1 December 2018imported>Stashbot 226,109 bytes +402 andrewbogott: moving tools-worker-1020 and tools-worker-1022 to different labvirts
  • curprev 17:49, 27 November 2018imported>Stashbot 225,707 bytes +121 bstorm_: restarted maintain-kubeusers just in case it had any issues reconnecting to toolsdb
  • curprev 17:39, 26 November 2018imported>Stashbot 225,586 bytes +348 gtirloni: updated tools-manifest package on tools-services-01/02 to version 0.12 (10->60 seconds sleep time) (T210190)
  • curprev 23:05, 20 November 2018imported>Stashbot 225,238 bytes +451 gtirloni: Published stretch-tools and stretch-toolsbeta aptly repositories individually on tools-services-01
  • curprev 21:16, 16 November 2018imported>Stashbot 224,787 bytes +435 bd808: Ran grid engine orphan process kill script from T153281. Only 3 orphan php-cgi processes belonging to iluvatarbot found.
  • curprev 17:29, 14 November 2018imported>Stashbot 224,352 bytes +214 andrewbogott: moving tools-worker-1027 to labvirt1008
  • curprev 17:40, 13 November 2018imported>Stashbot 224,138 bytes +717 arturo: remove misctools 1.31 and jobutils 1.30 from the stretch-tools repo (T207970)
  • curprev 18:12, 8 November 2018imported>Stashbot 223,421 bytes +861 gtirloni: cleaned up old tmp files on tools-bastion-02
  • curprev 10:37, 7 November 2018imported>Stashbot 222,560 bytes +112 gtirloni: removed invalid apt.conf.d file from all hosts (T110055)
  • curprev 18:11, 2 November 2018imported>Stashbot 222,448 bytes +174 arturo: T206223 some disturbances due to the certificate renewal
  • curprev 18:02, 31 October 2018imported>Stashbot 222,274 bytes +163 gtirloni: truncated big .err and error.log files
  • curprev 17:00, 29 October 2018imported>Stashbot 222,111 bytes +108 bd808: Ran grid engine orphan process kill script from T153281
  • curprev 10:34, 26 October 2018imported>Stashbot 222,003 bytes +236 arturo: T207970 added misctools 1.31 and jobutils 1.30 to stretch-tools aptly repo
  • curprev 14:17, 19 October 2018imported>Stashbot 221,767 bytes +65 andrewbogott: moving tools-clushmaster-01 to labvirt1004
  • curprev 00:29, 19 October 2018imported>Stashbot 221,702 bytes +321 andrewbogott: migrating tools-exec-1411 and tools-exec-1410 off of cloudvirt1017
  • curprev 15:13, 16 October 2018imported>Stashbot 221,381 bytes +205 bd808: (repost for gtirloni) T186571 removed legofan4000 user from project-tools group (leftover from T165624 legofan4000->macfan4000 rename)
  • curprev 21:57, 7 October 2018imported>Stashbot 221,176 bytes +380 zhuyifei1999_: restarted maintain-kubeusers on tools-k8s-master-01 T194859
  • curprev 12:35, 21 September 2018imported>Stashbot 220,796 bytes +431 arturo: cleanup stalled apt preference files (pinning) in tools-clushmaster-01
  • curprev 09:13, 17 September 2018imported>Stashbot 220,365 bytes +128 arturo: T204481 aborrero@tools-mail:~$ sudo exiqgrep -i | xargs sudo exim -Mrm
  • curprev 11:22, 14 September 2018imported>Stashbot 220,237 bytes +246 arturo: T204267 stop the corhist tool (k8s) because it is hammering the wikidata API
  • curprev 10:35, 8 September 2018imported>Stashbot 219,991 bytes +118 gtirloni: restarted cron and truncated /var/log/exim4/paniclog (T196137)
  • curprev 05:07, 7 September 2018imported>Stashbot 219,873 bytes +88 legoktm: uploaded/imported toollabs-webservice_0.42_all.deb
  • curprev 23:40, 27 August 2018imported>Stashbot 219,785 bytes +320 bd808: `# exec-manage repool tools-webgrid-generic-1402.eqiad.wmflabs` T202932
  • curprev 13:02, 22 August 2018imported>Stashbot 219,465 bytes +236 arturo: I used this command: `sudo exim -bp | sudo exiqgrep -i | xargs sudo exim -Mrm`
  • curprev 09:12, 19 August 2018imported>Stashbot 219,229 bytes +140 legoktm: rebuilding python/base k8s images for https://gerrit.wikimedia.org/r/453665 (T202218)
  • curprev 21:02, 14 August 2018imported>Stashbot 219,089 bytes +182 legoktm: rebuilt php7.2 docker images for https://gerrit.wikimedia.org/r/452755
  • curprev 23:31, 13 August 2018imported>Stashbot 218,907 bytes +234 legoktm: rebuilding docker images for webservice upgrade
  • curprev 10:40, 9 August 2018imported>Stashbot 218,673 bytes +293 arturo: T201602 upgrade packages from jessie-backports (excluding python-designateclient)
  • curprev 10:01, 8 August 2018imported>Stashbot 218,380 bytes +192 zhuyifei1999_: building & publishing toollabs-webservice 0.40 deb, and all Docker images T156626 T148872 T158244
  • curprev 12:33, 6 August 2018imported>Stashbot 218,188 bytes +98 arturo: T197176 installing texlive-full in toolforge
  • curprev 14:31, 1 August 2018imported>Stashbot 218,090 bytes +145 andrewbogott: temporarily depooling tools-exec-1409, 1410, 1414, 1419, 1427, 1428 to try to give labvirt1009 a break
  • curprev 20:33, 30 July 2018imported>Stashbot 217,945 bytes +186 bd808: Started rebuilding all Kubernetes Docker images to pick up latest apt updates
  • curprev 04:52, 27 July 2018imported>Stashbot 217,759 bytes +108 zhuyifei1999_: rebuilding python/base docker container T190274
  • curprev 19:02, 25 July 2018imported>Stashbot 217,651 bytes +175 chasemp: tools-worker-1004 reboot
  • curprev 13:24, 18 July 2018imported>Stashbot 217,476 bytes +506 arturo: upgrading packages from `stretch-wikimedia` T199905
  • curprev 18:15, 30 June 2018imported>Stashbot 216,970 bytes +486 chicocvenancio: pushed new config to PAWS to fix dumps nfs mountpoint
  • curprev 17:41, 29 June 2018imported>Stashbot 216,484 bytes +431 bd808: Rescheduling continuous jobs away from tools-exec-1408 where load is high
  • curprev 19:50, 28 June 2018imported>Stashbot 216,053 bytes +640 chasemp: tools-clushmaster-01:~$ clush -w @all 'sudo umount -fl /mnt/nfs/dumps-labstore1006.wikimedia.org'
  • curprev 13:18, 21 June 2018imported>Stashbot 215,413 bytes +109 chasemp: tools-bastion-03:~# bash -x /data/project/paws/paws-userhomes-hack.bash
  • curprev 15:09, 20 June 2018imported>Stashbot 215,304 bytes +138 bd808: Killed orphan processes on webgrid nodes (T182070); most owned by jembot and croptool
  • curprev 14:20, 14 June 2018imported>Stashbot 215,166 bytes +102 chasemp: timeout 180s bash -x /data/project/paws/paws-userhomes-hack.bash