You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Nova Resource:Tools/SAL: Revision history

Legend: (cur) = difference with latest revision, (prev) = difference with preceding revision, m = minor edit.

23 September 2020

  • (cur | prev) 21:38, 23 September 2020 imported>Stashbot 182,914 bytes +111 bstorm: ran an 'apt clean' across the fleet to get ahead of the new locale install

18 September 2020

  • (cur | prev) 19:41, 18 September 2020 imported>Stashbot 182,803 bytes +1,384 andrewbogott: repooling tools-k8s-worker-30, 33, 34, 57, 60
  • (cur | prev) 01:00, 18 September 2020 imported>Stashbot 181,419 bytes +1,961 andrewbogott: depooling tools-sgeexec-0917, tools-sgeexec-0918, tools-sgeexec-0919, tools-sgeexec-0920 for flavor update

16 September 2020

  • (cur | prev) 23:20, 16 September 2020 imported>Stashbot 179,458 bytes +512 andrewbogott: repooled tools-sgeexec-0941 and tools-sgeexec-0939 for move to ceph

10 September 2020

9 September 2020

  • (cur | prev) 11:12, 9 September 2020 imported>Stashbot 178,587 bytes +560 arturo: new ingress nodes added to the cluster, and tainted/labeled per the docs https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Deploying#ingress_nodes (T250172)

8 September 2020

2 September 2020

31 August 2020

30 August 2020

26 August 2020

  • (cur | prev) 21:08, 26 August 2020 imported>Stashbot 176,574 bytes +293 bd808: Disabled puppet on tools-proxy-06 to test fixes for a bug in the new T251628 code

25 August 2020

  • (cur | prev) 19:38, 25 August 2020 imported>Stashbot 176,281 bytes +648 andrewbogott: deleting tools-sgeexec-0943.tools.eqiad.wmflabs, tools-sgeexec-0944.tools.eqiad.wmflabs, tools-sgeexec-0945.tools.eqiad.wmflabs, tools-sgeexec-0946.tools.eqiad.wmflabs, tools-sgeexec-0948.tools.eqiad.wmflabs, tools-sgeexec-0949.tools.eqiad.wmflabs, tools-sgeexec-0953.tools.eqiad.wmflabs; they are broken and we're not very curious why; will retry this exercise when everything is standardized on

19 August 2020

  • (cur | prev) 21:29, 19 August 2020 imported>Stashbot 175,633 bytes +440 andrewbogott: shutting down and removing tools-k8s-worker-20 through tools-k8s-worker-29; this load can now be handled by new nodes on ceph hosts

18 August 2020

  • (cur | prev) 15:24, 18 August 2020 imported>Stashbot 175,193 bytes +117 bd808: Rebuilding all Docker containers to pick up newest versions of installed packages

30 July 2020

  • (cur | prev) 16:28, 30 July 2020 imported>Stashbot 175,076 bytes +152 andrewbogott: added new xlarge ceph-hosted worker nodes: tools-k8s-worker-61, 62, 63, 64, 65, 66. T258663

29 July 2020

  • (cur | prev) 23:24, 29 July 2020 imported>Stashbot 174,924 bytes +216 bd808: Pushed a copy of docker-registry.wikimedia.org/wikimedia-jessie:latest to docker-registry.tools.wmflabs.org/wikimedia-jessie:latest in preparation for the upstream image going away

24 July 2020

  • (cur | prev) 22:33, 24 July 2020 imported>Stashbot 174,708 bytes +426 bd808: Removed a few more ancient docker images: grrrit, jessie-toollabs, and nagf

22 July 2020

  • (cur | prev) 23:24, 22 July 2020 imported>Stashbot 174,282 bytes +1,162 bstorm: created server group 'tools-k8s-worker' to create any new worker nodes in so that they have a low chance of being scheduled together by openstack unless it is necessary T258663

21 July 2020

  • (cur | prev) 16:09, 21 July 2020 imported>Stashbot 173,120 bytes +212 bstorm: rebooting tools-sgegrid-shadow to remount NFS correctly

17 July 2020

  • (cur | prev) 16:47, 17 July 2020 imported>Stashbot 172,908 bytes +235 bd808: Enabled Puppet on tools-proxy-06 following successful test (T102367)

15 July 2020

  • (cur | prev) 23:11, 15 July 2020 imported>Stashbot 172,673 bytes +117 bd808: Removed ssh root key for valhallasw from project hiera (T255697)

9 July 2020

  • (cur | prev) 18:53, 9 July 2020 imported>Stashbot 172,556 bytes +115 bd808: Updating git-review to 1.27 via clush across cluster (T257496)

8 July 2020

  • (cur | prev) 11:16, 8 July 2020 imported>Stashbot 172,441 bytes +299 arturo: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/610029 -- important change to front-proxy (T234617)

7 July 2020

  • (cur | prev) 23:22, 7 July 2020 imported>Stashbot 172,142 bytes +655 bd808: Rebuilding all Docker images to pick up webservice v0.73 (T234617, T257229)

6 July 2020

  • (cur | prev) 11:54, 6 July 2020 imported>Stashbot 171,487 bytes +354 arturo: briefly point DNS tools.wmflabs.org A record to 185.15.56.60 (tools-legacy-redirector) and then switch back to 185.15.56.11 (tools-proxy-05). The legacy redirector does HTTP/307 (T247236)

1 July 2020

  • (cur | prev) 11:19, 1 July 2020 imported>Stashbot 171,133 bytes +215 arturo: cleanup exim email queue (4 frozen messages)

30 June 2020

  • (cur | prev) 11:18, 30 June 2020 imported>Stashbot 170,918 bytes +123 arturo: set some hiera keys for mtail in puppet prefix `tools-mail` (T256737)

29 June 2020

25 June 2020

  • (cur | prev) 21:50, 25 June 2020 imported>Stashbot 170,486 bytes +283 zhuyifei1999_: re-enabling puppet on tools-sgebastion-09 T256426

24 June 2020

  • (cur | prev) 12:36, 24 June 2020 imported>Stashbot 170,203 bytes +252 arturo: live-hacking puppetmaster with exim prometheus stuff (T175964)

23 June 2020

  • (cur | prev) 17:55, 23 June 2020 imported>Stashbot 169,951 bytes +237 arturo: killed procs for users `hamishz` and `msyn` which apparently were tools that should be running in the grid / kubernetes instead

17 June 2020

  • (cur | prev) 10:40, 17 June 2020 imported>Stashbot 169,714 bytes +162 arturo: created VM tools-legacy-redirector, with the corresponding puppet prefix (T247236, T234617)

16 June 2020

  • (cur | prev) 23:01, 16 June 2020 imported>Stashbot 169,552 bytes +357 bd808: Building new Docker images to pick up webservice 0.72

15 June 2020

  • (cur | prev) 21:28, 15 June 2020 imported>Stashbot 169,195 bytes +347 bstorm_: cleaned up killgridjobs.sh on the tools bastions T157792

12 June 2020

  • (cur | prev) 13:13, 12 June 2020 imported>Stashbot 168,848 bytes +192 arturo: live-hacking session in the puppetmaster ended
  • (cur | prev) 00:16, 12 June 2020 imported>Stashbot 168,656 bytes +227 bstorm_: remounted NFS for tools-k8s-control-3 and tools-acme-chief-01

4 June 2020

  • (cur | prev) 13:32, 4 June 2020 imported>Stashbot 168,429 bytes +104 bd808: Manually restored /etc/haproxy/conf.d/elastic.cfg on tools-elastic-*

2 June 2020

  • (cur | prev) 12:23, 2 June 2020 imported>Stashbot 168,325 bytes +441 arturo: renewed TLS cert for k8s metrics-server (T250874) following docs: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Certificates#internal_API_access

1 June 2020

  • (cur | prev) 23:51, 1 June 2020 imported>Stashbot 167,884 bytes +112 bstorm_: refreshed certs for the custom webhook controllers on the k8s cluster T250874
  • (cur | prev) 00:39, 1 June 2020 imported>Stashbot 167,772 bytes +206 bd808: Ugh. Prior SAL message was about tools-sgeexec-0940

29 May 2020

  • (cur | prev) 19:37, 29 May 2020 imported>Stashbot 167,566 bytes +160 bstorm_: adding docker image for paws-public docker-registry.tools.wmflabs.org/paws-public-nginx:openresty T252217

28 May 2020

  • (cur | prev) 21:19, 28 May 2020 imported>Stashbot 167,406 bytes +953 bd808: Killed 7 python processes run by user 'mattho69' on login.toolforge.org

27 May 2020

  • (cur | prev) 17:23, 27 May 2020 imported>Stashbot 166,453 bytes +160 bstorm_: deleting "tools-k8s-worker-20", "tools-k8s-worker-19", "tools-k8s-worker-18", "tools-k8s-worker-17", "tools-k8s-worker-16"

26 May 2020

  • (cur | prev) 18:45, 26 May 2020 imported>Stashbot 166,293 bytes +242 bstorm_: upgrading maintain-kubeusers to match what is in toolsbeta T246059 T211096

22 May 2020

  • (cur | prev) 20:00, 22 May 2020 imported>Stashbot 166,051 bytes +227 bstorm_: rebooted tools-sgebastion-07 to clear up tmp file problems with 10 min warning

21 May 2020

  • (cur | prev) 22:40, 21 May 2020 imported>Stashbot 165,824 bytes +285 bd808: Rebuilding all Docker containers for tools-webservice 0.70 (T252700)

20 May 2020

  • (cur | prev) 09:59, 20 May 2020 imported>Stashbot 165,539 bytes +896 arturo: now running tesseract-ocr v4.1.1-2~bpo9+1 in the Toolforge grid (T247422)

19 May 2020

  • (cur | prev) 17:00, 19 May 2020 imported>Stashbot 164,643 bytes +171 bstorm_: deleting/restarting the paws db-proxy pod because it cannot connect to the replicas...and I'm hoping that's due to depooling and such

13 May 2020

  • (cur | prev) 18:14, 13 May 2020 imported>Stashbot 164,472 bytes +254 bstorm_: upgrading calico to 3.14.0 with typha enabled in Toolforge K8s T250863

9 May 2020

  • (cur | prev) 00:28, 9 May 2020 imported>Stashbot 164,218 bytes +332 bstorm_: added nfs.* to ignored_fs_types for the prometheus::node_exporter params in project hiera T252260

7 May 2020

  • (cur | prev) 21:51, 7 May 2020 imported>Stashbot 163,886 bytes +245 bstorm_: rebuilding the docker images for Toolforge k8s

6 May 2020

  • (cur | prev) 21:20, 6 May 2020 imported>Stashbot 163,641 bytes +509 bd808: Kubectl delete node tools-k8s-worker-[16-20] (T248702)
  • (cur | prev) 00:01, 6 May 2020 imported>Stashbot 163,132 bytes +444 bd808: Joining tools-k8s-worker-60 to the k8s worker pool
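
The range notation `tools-k8s-worker-[16-20]` in the 21:20 entry stands for five separate node names. As a rough sketch only (the helper `worker_names` is hypothetical and not taken from the log), the range could be expanded like this, printing the `kubectl delete node` commands for review rather than running them:

```shell
# Hypothetical helper: print tools-k8s-worker-N for every N in [first, last].
worker_names() {
  i="$1"
  while [ "$i" -le "$2" ]; do
    echo "tools-k8s-worker-$i"
    i=$((i + 1))
  done
}

# Dry run: show one deletion command per node without executing it.
for node in $(worker_names 16 20); do
  echo "kubectl delete node $node"
done
```

Printing the commands first makes it easy to confirm the expanded list before anything is removed from the cluster.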

4 May 2020

  • (cur | prev) 22:08, 4 May 2020 imported>Stashbot 162,688 bytes +346 bstorm_: deleting tools-elastic-01/2/3 T236606

29 April 2020

  • (cur | prev) 22:13, 29 April 2020 imported>Stashbot 162,342 bytes +452 bstorm_: running a fixup script after fixing a bug T247455

28 April 2020

  • (cur | prev) 22:58, 28 April 2020 imported>Stashbot 161,890 bytes +131 bstorm_: rebuilding docker-registry.tools.wmflabs.org/maintain-kubeusers:beta T247455

23 April 2020

  • (cur | prev) 19:22, 23 April 2020 imported>Stashbot 161,759 bytes +92 bd808: Increased Kubernetes services quota for bd808-test tool.

21 April 2020

  • (cur | prev) 23:06, 21 April 2020 imported>Stashbot 161,667 bytes +386 bstorm_: repooled tools-k8s-worker-38/52, tools-sgewebgrid-lighttpd-0918/9 and tools-sgeexec-0901 T250869

20 April 2020

  • (cur | prev) 15:31, 20 April 2020 imported>Stashbot 161,281 bytes +607 bd808: Rebuilding Docker containers to pick up tools-webservice v0.68 (T250625)

15 April 2020

  • (cur | prev) 23:20, 15 April 2020 imported>Stashbot 160,674 bytes +253 bd808: Building ruby25-sssd/base and children (T141388, T250118)

14 April 2020

  • (cur | prev) 18:26, 14 April 2020 imported>Stashbot 160,421 bytes +316 bstorm_: Deployed new code and RBAC for maintain-kubeusers T246123

10 April 2020

  • (cur | prev) 21:33, 10 April 2020 imported>Stashbot 160,105 bytes +369 bd808: Rebuilding all Docker images for the Kubernetes cluster (T249843)

9 April 2020

  • (cur | prev) 15:13, 9 April 2020 imported>Stashbot 159,736 bytes +522 bd808: Rebuilding all stretch and buster Docker images. Jessie is broken at the moment due to package version mismatches
  • (cur | prev) 00:20, 9 April 2020 imported>Stashbot 159,214 bytes +450 bd808: Docker rebuild failed in toolforge-python2-sssd-base: "zlib1g-dev : Depends: zlib1g (= 1:1.2.8.dfsg-2+b1) but 1:1.2.8.dfsg-2+deb8u1 is to be installed"

7 April 2020

  • (cur | prev) 20:06, 7 April 2020 imported>Stashbot 158,764 bytes +161 andrewbogott: sss_cache -E on tools-sgebastion-08 and tools-sgebastion-09

6 April 2020

3 April 2020

30 March 2020

  • (cur | prev) 18:28, 30 March 2020 imported>Stashbot 157,942 bytes +316 bstorm_: Beginning rolling depool, remount, repool of k8s workers for T248702

27 March 2020

  • (cur | prev) 21:22, 27 March 2020 imported>Stashbot 157,626 bytes +374 bstorm_: removed puppet prefix tools-docker-builder T248703

24 March 2020

  • (cur | prev) 11:44, 24 March 2020 imported>Stashbot 157,252 bytes +427 arturo: trying to solve a rebase/merge conflict in labs/private.git in tools-puppetmaster-02

18 March 2020

  • (cur | prev) 19:07, 18 March 2020 imported>Stashbot 156,825 bytes +730 bstorm_: removed role::toollabs::logging::sender from project puppet (it wouldn't work anyway)

17 March 2020

  • (cur | prev) 13:29, 17 March 2020 imported>Stashbot 156,095 bytes +113 arturo: set `profile::toolforge::bastion::nproc: 200` for tools-sgebastion-08 (T219070)
  • (cur | prev) 00:08, 17 March 2020 imported>Stashbot 155,982 bytes +357 bstorm_: shut off tools-flannel-etcd-01/02/03 T246689

11 March 2020

6 March 2020

  • (cur | prev) 16:25, 6 March 2020 imported>Stashbot 155,550 bytes +100 bstorm_: updating maintain-kubeusers image to filter invalid tool names

3 March 2020

  • (cur | prev) 18:16, 3 March 2020 imported>Stashbot 155,450 bytes +597 jeh: create OpenStack DNS record for elasticsearch.svc.tools.eqiad1.wikimedia.cloud (eqiad1 subdomain change) T236606

2 March 2020

  • (cur | prev) 22:26, 2 March 2020 imported>Stashbot 154,853 bytes +125 jeh: starting first pass of elasticsearch data migration to new cluster T236606

1 March 2020

  • (cur | prev) 01:48, 1 March 2020 imported>Stashbot 154,728 bytes +330 bstorm_: old version of kubectl removed. Anyone who needs it can download it with `curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.4.12/bin/linux/amd64/kubectl`

28 February 2020

  • (cur | prev) 22:14, 28 February 2020 imported>Stashbot 154,398 bytes +2,223 bstorm_: shutting down the old maintain-kubeusers and taking the gloves off the new one (removing --gentle-mode)
  • (cur | prev) 00:50, 28 February 2020 imported>Stashbot 152,175 bytes +1,873 bstorm_: rebuilt all docker images to include webservice 0.64

27 February 2020

25 February 2020

23 February 2020

21 February 2020

20 February 2020

  • (cur | prev) 14:49, 20 February 2020 imported>Stashbot 149,127 bytes +117 andrewbogott: moving tools-k8s-worker-19 and tools-k8s-worker-18 to cloudvirt1022 (as part of draining 1014)
  • (cur | prev) 00:04, 20 February 2020 imported>Stashbot 149,010 bytes +526 Krenair: Shut off tools-puppetmaster-01 - to be deleted in one week T245365

19 February 2020

  • (cur | prev) 00:59, 19 February 2020 imported>Stashbot 148,484 bytes +424 bd808: Live hacked the "nginx-configuration" ConfigMap for T245426 (done several hours ago, but I forgot to !log it)

17 February 2020

  • (cur | prev) 18:53, 17 February 2020 imported>Stashbot 148,060 bytes +286 arturo: T168677 created DNS TXT record _psl.toolforge.org. with value `https://github.com/publicsuffix/list/pull/970`

14 February 2020

  • (cur | prev) 00:38, 14 February 2020 imported>Stashbot 147,774 bytes +1,893 bd808: Added tools-k8s-worker-35 to 2020 Kubernetes cluster (T244791)

12 February 2020

  • (cur | prev) 19:29, 12 February 2020 imported>Stashbot 145,881 bytes +199 bd808: Rebuilding all Docker images to pick up toollabs-webservice (0.63) (T244954)
  • (cur | prev) 00:20, 12 February 2020 imported>Stashbot 145,682 bytes +785 bd808: Depooling tools-sgewebgrid-generic-0903 (T244791)

10 February 2020

7 February 2020

  • (cur | prev) 10:55, 7 February 2020 imported>Stashbot 144,360 bytes +167 arturo: drop jessie VM instances tools-prometheus-{01,02} which were shutdown (T238096)

6 February 2020

  • (cur | prev) 10:44, 6 February 2020 imported>Stashbot 144,193 bytes +367 arturo: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/565556 which is a behavior change to the Toolforge front proxy (T234617)

5 February 2020

  • (cur | prev) 11:22, 5 February 2020 imported>Stashbot 143,826 bytes +155 arturo: restarting ferm fleet-wide to account for prometheus servers changed IP (but same hostname) (T238096)

4 February 2020

  • (cur | prev) 11:38, 4 February 2020 imported>Stashbot 143,671 bytes +258 arturo: start tools-prometheus-01 again to sync data to the new tools-prometheus-03/04 VMs (T238096)

3 February 2020

  • (cur | prev) 14:12, 3 February 2020 imported>Stashbot 143,413 bytes +471 arturo: move tools-prometheus-04 from cloudvirt1022 to cloudvirt1013

31 January 2020

  • (cur | prev) 14:06, 31 January 2020 imported>Stashbot 142,942 bytes +411 arturo: leave tools-prometheus-01 as the backend for tools-prometheus.wmflabs.org for the weekend so grafana dashboards keep working (T238096)

30 January 2020

  • (cur | prev) 21:04, 30 January 2020 imported>Stashbot 142,531 bytes +1,613 andrewbogott: also apt-get install python3-novaclient on tools-prometheus-03 and tools-prometheus-04 to suppress cronspam. Possible real fix for this is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/569084/

29 January 2020

  • (cur | prev) 20:07, 29 January 2020 imported>Stashbot 140,918 bytes +174 bd808: Created {bastion,login,dev}.toolforge.org service names for Toolforge bastions using Horizon & Designate

28 January 2020

  • (cur | prev) 13:35, 28 January 2020 imported>Stashbot 140,744 bytes +289 arturo: `aborrero@tools-clushmaster-02:~$ clush -w @exec-stretch 'for i in $(ps aux | grep [t]ools.j | awk -F" " "{print \$2}") ; do echo "killing $i" ; sudo kill $i ; done || true'` (T243831)
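
The one-liner above filters `ps aux` with the pattern `[t]ools.j`. The brackets keep `grep` from matching its own command line, because the literal text `[t]ools.j` does not contain the string the pattern matches. A minimal sketch of that filtering step follows; the sample process lines and the function names `sample_ps` and `matching_pids` are invented for illustration, and nothing is killed:

```shell
# Made-up `ps aux`-style lines; the second one imitates the grep process itself.
sample_ps() {
  printf '%s\n' \
    'tools.j  1234  0.0  0.1 java -jar bot.jar' \
    'aborrero 5678  0.0  0.0 grep [t]ools.j' \
    'tools.k  9012  0.0  0.1 python bot.py'
}

# Print the PID column (field 2) of the lines matching the bracketed pattern.
matching_pids() {
  sample_ps | grep '[t]ools.j' | awk '{print $2}'
}

matching_pids
```

Only the first sample line is selected. Note that the unescaped `.` matches any single character, so the pattern is slightly broader than the literal account name.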

27 January 2020

  • (cur | prev) 07:05, 27 January 2020 imported>Stashbot 140,455 bytes +329 zhuyifei1999_: wrong package. uninstalled. the correct one is bpfcc-tools and seems only available in buster+. T115231

24 January 2020

  • (cur | prev) 20:58, 24 January 2020 imported>Stashbot 140,126 bytes +457 bd808: Built tools-k8s-worker-21 to test out build script following openstack client upgrade

23 January 2020

  • (cur | prev) 23:38, 23 January 2020 imported>Stashbot 139,669 bytes +421 bd808: Halted tools-k8s-worker build script after first instance (tools-k8s-worker-10) stuck in "scheduling" state for 20 minutes

22 January 2020

  • (cur | prev) 12:43, 22 January 2020 imported>Stashbot 139,248 bytes +200 arturo: for the record, issue with tools-worker-1016 was memory exhaustion apparently

21 January 2020

  • (cur | prev) 19:25, 21 January 2020 imported>Stashbot 139,048 bytes +398 bstorm_: hard rebooting tools-sgeexec-0913/14/35 because they aren't even on the network

16 January 2020

  • (cur | prev) 23:54, 16 January 2020 imported>Stashbot 138,650 bytes +432 bstorm_: rebooting tools-docker-builder-06 because there are a couple running containers that don't want to die cleanly

14 January 2020

  • (cur | prev) 15:29, 14 January 2020 imported>Stashbot 138,218 bytes +216 bstorm_: failed the gridengine master back to the master server from the shadow

13 January 2020

  • (cur | prev) 17:48, 13 January 2020 imported>Stashbot 138,002 bytes +557 bd808: Running `puppet ca destroy` for each unsigned cert on tools-puppetmaster-01 (T242642)

12 January 2020

11 January 2020

  • (cur | prev) 01:33, 11 January 2020 imported>Stashbot 137,221 bytes +157 bstorm_: updated toollabs-webservice package to 0.57, which should allow persisting mem and cpu in manifests with burstable qos.

10 January 2020

9 January 2020

  • (cur | prev) 23:35, 9 January 2020 imported>Stashbot 136,602 bytes +533 bstorm_: depooled tools-sgeexec-0939 because it isn't acting right and rebooting it

7 January 2020

  • (cur | prev) 22:40, 7 January 2020 imported>Stashbot 136,069 bytes +1,199 bstorm_: rebooted tools-worker-1007 to recover it from disk full and general badness
  • (cur | prev) 00:26, 7 January 2020 imported>Stashbot 134,870 bytes +1,665 bstorm_: repooled tools-sgewebgrid-lighttpd-0919

4 January 2020

3 January 2020

  • (cur | prev) 16:48, 3 January 2020 imported>Stashbot 131,428 bytes +586 bstorm_: updated the ValidatingWebhookConfiguration for the ingress admission controller to the working settings
  • (cur | prev) 00:11, 3 January 2020 imported>Stashbot 130,842 bytes +191 bd808: Rebuilding all stretch-ssd Docker images to pick up busybox

30 December 2019

  • (cur | prev) 05:02, 30 December 2019 imported>Stashbot 130,651 bytes +195 andrewbogott: moving tools-worker-1012 to cloudvirt1024 for T241523

29 December 2019

  • (cur | prev) 01:38, 29 December 2019 imported>Stashbot 130,456 bytes +215 Krenair: Cordoned tools-worker-1012 and deleted pods associated with dplbot and dewikigreetbot as well as my own testing one, host seems to be under heavy load - T241523

27 December 2019

  • (cur | prev) 15:06, 27 December 2019 imported>Stashbot 130,241 bytes +142 Krenair: Killed a "python parse_page.py outreachy" process by aikochou that was hogging IO on tools-sgebastion-07

25 December 2019

  • (cur | prev) 16:07, 25 December 2019 imported>Stashbot 130,099 bytes +134 zhuyifei1999_: pkilled 5 `python pwb.py` processes belonging to `tools.kaleem-bot` on tools-sgebastion-07

22 December 2019

  • (cur | prev) 20:13, 22 December 2019 imported>Stashbot 129,965 bytes +263 bd808: Enabled Puppet on tools-proxy-06.tools.eqiad.wmflabs after nginx config test (T241310)

20 December 2019

  • (cur | prev) 22:28, 20 December 2019 imported>Stashbot 129,702 bytes +211 bd808: Re-enabled Puppet on tools-sgebastion-09. Reason for disable was "arturo raising systemd limits"

18 December 2019

  • (cur | prev) 17:33, 18 December 2019 imported>Stashbot 129,491 bytes +310 bstorm_: updated package in aptly for toollabs-webservice to 0.53

17 December 2019

  • (cur | prev) 20:25, 17 December 2019 imported>Stashbot 129,181 bytes +950 bd808: Fixed https://tools.wmflabs.org/ to redirect to https://tools.wmflabs.org/admin/
  • (cur | prev) 00:45, 17 December 2019 imported>Stashbot 128,231 bytes +295 bstorm_: enabled encryption at rest on the new k8s cluster

14 December 2019

  • (cur | prev) 10:48, 14 December 2019 imported>Stashbot 127,936 bytes +153 valhallasw`cloud: re-enabling puppet on tools-sgeexec-0912, likely left-over from NFS maintenance (no reason was specified).

13 December 2019

11 December 2019

  • (cur | prev) 18:13, 11 December 2019 imported>Stashbot 126,661 bytes +239 bd808: Restarted maintain-dbusers on labstore1004. Process had not logged any account creations since 2019-12-01T22:45:45.

10 December 2019

9 December 2019

  • (cur | prev) 11:06, 9 December 2019 imported>Stashbot 126,314 bytes +144 andrewbogott: deleting unused security groups: catgraph, devpi, MTA, mysql, syslog, test T91619

4 December 2019

  • (cur | prev) 13:45, 4 December 2019 imported>Stashbot 126,170 bytes +101 arturo: drop puppet prefix `tools-cron`, deprecated and no longer in use

29 November 2019

19 November 2019

  • (cur | prev) 13:49, 19 November 2019 imported>Stashbot 124,129 bytes +239 arturo: re-create nginx-ingress pod due to deployment template refresh (T237643)

15 November 2019

13 November 2019

  • (cur | prev) 17:20, 13 November 2019 imported>Stashbot 123,790 bytes +154 arturo: live-hacking tools-prometheus-01 to test some experimental configs for the new k8s cluster (T237643)

12 November 2019

10 November 2019

  • (cur | prev) 02:17, 10 November 2019 imported>Stashbot 123,529 bytes +520 bd808: Building new Docker images for T237836 (retrying after cleaning out old images on tools-docker-builder-06)

8 November 2019

  • (cur | prev) 22:47, 8 November 2019 imported>Stashbot 123,009 bytes +477 bstorm_: adding rsync::server::wrap_with_stunnel: false to the tools-docker-registry-03/4 servers to unbreak puppet

7 November 2019

  • (cur | prev) 13:27, 7 November 2019 imported>Stashbot 122,532 bytes +616 arturo: deployed registry-admission-webhook and ingress-admission-controller into the new k8s cluster (T236826)

6 November 2019

  • (cur | prev) 22:32, 6 November 2019 imported>Stashbot 121,916 bytes +804 bstorm_: added rsync::server::wrap_with_stunnel: false to tools-sge-services prefix to fix puppet

5 November 2019

  • (cur | prev) 23:08, 5 November 2019 imported>Stashbot 121,112 bytes +1,265 Krenair: Dropped 59a77a3, 3830802, and 83df61f from tools-puppetmaster-01:/var/lib/git/labs/private cherry-picks as these are no longer required T206235

4 November 2019

  • (cur | prev) 14:45, 4 November 2019 imported>Stashbot 119,847 bytes +503 phamhi: Built and pushed ruby25 docker image based on buster (T230961)

1 November 2019

  • (cur | prev) 21:00, 1 November 2019 imported>Stashbot 119,344 bytes +604 Krenair: Removed tools-checker.wmflabs.org A record to 208.80.155.229 as that target IP is in the old pre-neutron range that is no longer routed

31 October 2019

  • (cur | prev) 18:47, 31 October 2019 imported>Stashbot 118,740 bytes +783 andrewbogott: deleted and/or truncated a bunch of logfiles on tools-worker-1001. Runaway logfiles filled up the drive which prevented puppet from running. If puppet had run, it would have prevented the runaway logfiles.

30 October 2019

  • (cur | prev) 13:53, 30 October 2019 imported>Stashbot 117,957 bytes +464 arturo: replacing SSL cert in tools-proxy-x server apparently OK (merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/545679) T235252

29 October 2019

28 October 2019

  • (cur | prev) 16:06, 28 October 2019 imported>Stashbot 117,323 bytes +2,583 arturo: delete VM instance `tools-test-proxy-01` and the puppet prefix `tools-test-proxy`

24 October 2019

  • (cur | prev) 16:32, 24 October 2019 imported>Stashbot 114,740 bytes +103 bstorm_: set the prod rsyslog config for kubernetes to false for Toolforge

23 October 2019

  • (cur | prev) 20:00, 23 October 2019 imported>Stashbot 114,637 bytes +437 phamhi: Rebuilding all jessie and stretch docker images to pick up toollabs-webservice 0.47 (T233347)

22 October 2019

  • (cur | prev) 16:56, 22 October 2019 imported>Stashbot 114,200 bytes +177 bstorm_: drained tools-worker-1025.tools.eqiad.wmflabs which was malfunctioning

21 October 2019

  • (cur | prev) 17:32, 21 October 2019 imported>Stashbot 114,023 bytes +120 phamhi: Rebuilding all jessie and stretch docker images to pick up toollabs-webservice 0.46

18 October 2019

  • (cur | prev) 22:15, 18 October 2019 imported>Stashbot 113,903 bytes +353 bd808: Rescheduled continuous jobs away from tools-sgeexec-0904 because of high system load

16 October 2019

  • (cur | prev) 16:21, 16 October 2019 imported>Stashbot 113,550 bytes +390 phamhi: Deployed toollabs-webservice 0.46 to buster-tools and stretch-tools (T218461)

15 October 2019

  • (cur | prev) 17:10, 15 October 2019 imported>Stashbot 113,160 bytes +97 phamhi: restart tools-worker-1035 because it is no longer responding

14 October 2019

  • (cur | prev) 09:26, 14 October 2019 imported>Stashbot 113,063 bytes +116 arturo: cleaned-up updatetools from tools-sge-services nodes (T229261)

11 October 2019

  • (cur | prev) 19:52, 11 October 2019 imported>Stashbot 112,947 bytes +659 bstorm_: restarted docker on tools-docker-builder after phamhi noticed the daemon had a routing issue (blank iptables)

10 October 2019

  • (cur | prev) 02:33, 10 October 2019 imported>Stashbot 112,288 bytes +92 bd808: Rebooting tools-sgewebgrid-lighttpd-0903. Instance hung.

9 October 2019

  • (cur | prev) 22:52, 9 October 2019 imported>Stashbot 112,196 bytes +847 jeh: removing test instances tools-sssd-sgeexec-test-[12] from SGE

8 October 2019

  • (cur | prev) 19:40, 8 October 2019 imported>Stashbot 111,349 bytes +410 bstorm_: drained tools-worker-1007/8 to rebalance the cluster

7 October 2019

  • (cur | prev) 20:17, 7 October 2019 imported>Stashbot 110,939 bytes +4,103 bd808: Dropped backlog of messages for delivery to tools.usrd-tools

4 October 2019

  • (cur | prev) 21:43, 4 October 2019 imported>Stashbot 106,836 bytes +557 bd808: `sudo exec-manage repool tools-sgeexec-0923.tools.eqiad.wmflabs`

3 October 2019

  • (cur | prev) 13:05, 3 October 2019 imported>Stashbot 106,279 bytes +101 arturo: delete servers tools-sssd-sgeexec-test-[1,2], no longer required

27 September 2019

  • (cur | prev) 16:59, 27 September 2019 imported>Stashbot 106,178 bytes +103 bd808: Set "profile::rsyslog::kafka_shipper::kafka_brokers: []" in tools-elastic prefix puppet
  • (cur | prev) 00:40, 27 September 2019 imported>Stashbot 106,075 bytes +90 bstorm_: depooled and rebooted tools-sgewebgrid-lighttpd-0927

25 September 2019

23 September 2019

  • (cur | prev) 16:58, 23 September 2019 imported>Stashbot 105,888 bytes +192 bstorm_: deployed tools-manifest 0.20 and restarted webservicemonitor

12 September 2019

11 September 2019

9 September 2019

6 September 2019

5 September 2019

  • (cur | prev) 21:02, 5 September 2019 imported>Stashbot 105,347 bytes +242 bd808: Enabled Puppet on tools-docker-registry-03 and forced puppet run (T232135)

1 September 2019

  • (cur | prev) 20:51, 1 September 2019 imported>Stashbot 105,105 bytes +100 Reedy: `sudo service maintain-kubeusers restart` on tools-k8s-master-01

30 August 2019

  • (cur | prev) 16:54, 30 August 2019 imported>Stashbot 105,005 bytes +201 phamhi: restart maintain-kubeusers service in tools-k8s-master-01

29 August 2019

  • (cur | prev) 22:18, 29 August 2019 imported>Stashbot 104,804 bytes +357 bd808: Finished building new stretch Docker images for Toolforge Kubernetes use

27 August 2019

  • (cur | prev) 19:10, 27 August 2019 imported>Stashbot 104,447 bytes +116 bd808: Restarted maintain-kubeusers after complaint on irc. It was stuck in limbo again

26 August 2019

  • (cur | prev) 21:48, 26 August 2019 imported>Stashbot 104,331 bytes +163 bstorm_: repooled tools-sgewebgrid-generic-0902, tools-sgewebgrid-lighttpd-0902, tools-sgewebgrid-lighttpd-0903 and tools-sgeexec-0905

18 August 2019

  • (cur | prev) 08:11, 18 August 2019 imported>Stashbot 104,168 bytes +95 arturo: restart maintain-kubeusers service in tools-k8s-master-01

17 August 2019

  • (cur | prev) 10:56, 17 August 2019 imported>Stashbot 104,073 bytes +88 arturo: force-reboot tools-worker-1006. It is completely stuck

15 August 2019

13 August 2019

  • (cur | prev) 22:00, 13 August 2019 imported>Stashbot 103,763 bytes +200 bstorm_: truncated exim paniclog on tools-sgecron-01 because it was being spammy

12 August 2019

  • (cur | prev) 16:08, 12 August 2019 imported>Stashbot 103,563 bytes +171 phamhi: updated prometheus-node-exporter from 0.14.0~git20170523-1 to 0.17.0+ds-3 in tools-worker-[1030-1040] nodes (T230147)

8 August 2019

  • (cur | prev) 19:26, 8 August 2019 imported>Stashbot 103,392 bytes +100 jeh: restarting tools-sgewebgrid-lighttpd-0915 T230157

7 August 2019

  • (cur | prev) 19:07, 7 August 2019 imported>Stashbot 103,292 bytes +121 bd808: Disassociated SUL and Phabricator accounts from user Lophi (T229713)

6 August 2019

  • (cur | prev) 16:18, 6 August 2019 imported>Stashbot 103,171 bytes +290 arturo: add phamhi as user/projectadmin (T228942) and delete hpham

5 August 2019

2 August 2019

  • (cur | prev) 14:00, 2 August 2019 imported>Stashbot 102,470 bytes +93 andrewbogott_: rebooting tools-worker-1022 as it is unresponsive

31 July 2019

  • (cur | prev) 18:07, 31 July 2019 imported>Stashbot 102,377 bytes +641 bstorm_: drained tools-worker-1015/05/03/17 to rebalance load

27 July 2019

  • (cur | prev) 23:00, 27 July 2019 imported>Stashbot 101,736 bytes +247 zhuyifei1999_: a past probably related ticket: T194859

26 July 2019

  • (cur | prev) 17:39, 26 July 2019 imported>Stashbot 101,489 bytes +492 bstorm_: restarted maintain-kubeusers because it was suspiciously tardy and quiet

25 July 2019

24 July 2019

  • (cur | prev) 10:14, 24 July 2019 imported>Stashbot 100,855 bytes +251 arturo: reallocating tools-puppetmaster-01 from cloudvirt1027 to cloudvirt1028 (T227539)

22 July 2019

  • (cur | prev) 18:39, 22 July 2019 imported>Stashbot 100,604 bytes +577 bstorm_: repooled tools-sgeexec-0905 after reboot

20 July 2019

17 July 2019

  • (cur | prev) 20:23, 17 July 2019 imported>Stashbot 99,957 bytes +90 andrewbogott: migrating tools-sgegrid-shadow to cloudvirt1014

15 July 2019

  • (cur | prev) 14:50, 15 July 2019 imported>Stashbot 99,867 bytes +140 bstorm_: cleared error state from tools-sgeexec-0911 which went offline after error from job 5190035

25 June 2019

24 June 2019

  • (cur | prev) 17:42, 24 June 2019 imported>Stashbot 99,632 bytes +85 andrewbogott: moving tools-sgeexec-0905 to cloudvirt1015

17 June 2019

  • (cur | prev) 14:07, 17 June 2019 imported>Stashbot 99,547 bytes +251 andrewbogott: moving tools-sgewebgrid-lighttpd-0903 to cloudvirt1015

11 June 2019

  • curprev 18:0318:03, 11 June 2019imported>Stashbot 99,296 bytes +103 bstorm_: deleted anomalous kubernetes node tools-worker-1019.eqiad.wmflabs

5 June 2019

  • curprev 18:3318:33, 5 June 2019imported>Stashbot 99,193 bytes +179 andrewbogott: repooled tools-sgeexec-0921 and tools-sgeexec-0929

30 May 2019

  • curprev 13:0113:01, 30 May 2019imported>Stashbot 99,014 bytes +1,951 arturo: uncordon/repool tools-worker-1001/2/3. They should be fine now. I'm only leaving 1029 cordoned for testing purposes

29 May 2019

  • curprev 11:1311:13, 29 May 2019imported>Stashbot 97,063 bytes +402 arturo: briefly tested some sssd config changes in tools-sgebastion-09

28 May 2019

  • curprev 18:1518:15, 28 May 2019imported>Stashbot 96,661 bytes +1,669 arturo: T221225 for the record, tools-worker-1001 is not working after trying with sssd

27 May 2019

  • curprev 09:4709:47, 27 May 2019imported>Stashbot 94,992 bytes +247 arturo: run `apt-get clean` to wipe 4GB of unused .deb packages, usage on / (root) was > 90% (on tools-sgebastion-08)

21 May 2019

20 May 2019

  • curprev 11:2511:25, 20 May 2019imported>Stashbot 94,657 bytes +271 arturo: T223332 enable puppet agent in tools-k8s-master and tools-docker-registry nodes and deploy new SSL cert

18 May 2019

  • curprev 11:1311:13, 18 May 2019imported>Stashbot 94,386 bytes +402 chicocvenancio: PAWS update helm chart to point to new singleuser image (T217908)

16 May 2019

  • curprev 11:2211:22, 16 May 2019imported>Stashbot 93,984 bytes +184 chicocvenancio: PAWS: restart hub to get new configured announcement

15 May 2019

  • curprev 16:2016:20, 15 May 2019imported>Stashbot 93,800 bytes +1,037 arturo: T223148 repool both tools-sgeexec-0921 and -0929

14 May 2019

13 May 2019

  • curprev 08:1508:15, 13 May 2019imported>Stashbot 91,370 bytes +176 zhuyifei1999_: `truncate -s 0 /var/log/exim4/paniclog` on tools-sgecron-01.tools.eqiad.wmflabs & tools-sgewebgrid-lighttpd-0921.tools.eqiad.wmflabs

7 May 2019

  • curprev 14:3814:38, 7 May 2019imported>Stashbot 91,194 bytes +1,080 arturo: T222718 uncordon tools-worker-1019, I couldn't find a reason for it to be cordoned

6 May 2019

3 May 2019

  • curprev 09:4309:43, 3 May 2019imported>Stashbot 89,909 bytes +741 arturo: fixed puppet in tools-puppetdb-01 too

30 April 2019

29 April 2019

  • curprev 11:2211:22, 29 April 2019imported>Stashbot 88,713 bytes +406 arturo: T221225 re-enable puppet agent in all toolforge servers

26 April 2019

25 April 2019

  • curprev 12:4912:49, 25 April 2019imported>Stashbot 88,146 bytes +296 arturo: T221225 using `profile::ldap::client::labs::client_stack: sssd` in horizon for tools-sgebastion-09 (testing)

24 April 2019

23 April 2019

  • curprev 15:2615:26, 23 April 2019imported>Stashbot 87,691 bytes +1,148 arturo: T221225 rebooting tools-sgebastion-08 to cleanup sssd

17 April 2019

  • curprev 12:0912:09, 17 April 2019imported>Stashbot 86,543 bytes +1,187 arturo: T221225 rebooting bastions to clean sssd. We are back to nscd/nslcd until we figure out what's wrong here

16 April 2019

  • curprev 20:4920:49, 16 April 2019imported>Stashbot 85,356 bytes +257 chicocvenancio: change paws announcement in configmap hub-config back to a welcome message

15 April 2019

  • curprev 18:5018:50, 15 April 2019imported>Stashbot 85,099 bytes +167 andrewbogott: moving tools-elastic-01 to cloudvirt1008 to make spreadcheck happy

14 April 2019

  • curprev 16:2316:23, 14 April 2019imported>Stashbot 84,932 bytes +112 andrewbogott: moved all tools-worker nodes off of cloudvirt1015 and uncordoned them

13 April 2019

  • curprev 21:0921:09, 13 April 2019imported>Stashbot 84,820 bytes +433 bstorm_: Moving tools-prometheus-01 to cloudvirt1009 and tools-clushmaster-02 to cloudvirt1008 for T220853

11 April 2019

  • curprev 22:3822:38, 11 April 2019imported>Stashbot 84,387 bytes +777 andrewbogott: moving tools-paws-worker-1005 to cloudvirt1009 to make spreadcheck happier
  • curprev 00:0300:03, 11 April 2019imported>Stashbot 83,610 bytes +1,993 andrewbogott: tools-paws-worker-1002, tools-paws-worker-1003 to eqiad1-r

10 April 2019

  • curprev 00:3200:32, 10 April 2019imported>Stashbot 81,617 bytes +1,255 andrewbogott: migrating tools-worker-1022, 1023, 1025, 1026 to eqiad1-r

8 April 2019

  • curprev 22:3622:36, 8 April 2019imported>Stashbot 80,362 bytes +182 andrewbogott: moving tools-worker-1006 and tools-worker-1007 to eqiad1-r

7 April 2019

  • curprev 16:5416:54, 7 April 2019imported>Stashbot 80,180 bytes +218 zhuyifei1999_: tools-sgeexec-0928 unresponsive since around 22 UTC. No data on Graphite. Can't ssh in even as root. Hard rebooting via Horizon

5 April 2019

4 April 2019

  • curprev 21:2121:21, 4 April 2019imported>Stashbot 79,888 bytes +1,354 bd808: Uncordoned tools-worker-1013.tools.eqiad.wmflabs after reboot and forced puppet run

3 April 2019

  • curprev 11:2211:22, 3 April 2019imported>Stashbot 78,534 bytes +138 arturo: puppet breakage due to me introducing the openstack-mitaka-jessie repo by mistake. Cleaning up already

2 April 2019

  • curprev 12:1112:11, 2 April 2019imported>Stashbot 78,396 bytes +189 arturo: icinga downtime toolschecker for 1 month T219243

1 April 2019

  • curprev 19:4419:44, 1 April 2019imported>Stashbot 78,207 bytes +313 bd808: Deleted tools-checker-02 via Horizon (T219243)

29 March 2019

  • curprev 21:1321:13, 29 March 2019imported>Stashbot 77,894 bytes +1,362 bstorm_: depooled tools-sgewebgrid-generic-0903 because of some stuck jobs and odd load characteristics

28 March 2019

26 March 2019

22 March 2019

  • curprev 17:1617:16, 22 March 2019imported>Stashbot 73,164 bytes +615 andrewbogott: switching all instances to use ldap-ro.eqiad.wikimedia.org as both primary and secondary ldap server
  • curprev 00:3900:39, 22 March 2019imported>Stashbot 72,549 bytes +620 bstorm_: T217280 depooled and rebooted tools-sgewebgrid-lighttpd-0902

18 March 2019

17 March 2019

  • curprev 23:4123:41, 17 March 2019imported>Stashbot 71,457 bytes +586 bd808: Cherry-picked https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497210/ as a quick fix for T218494

16 March 2019

15 March 2019

  • curprev 21:0821:08, 15 March 2019imported>Stashbot 70,800 bytes +373 bstorm_: cleared error state on several queues T217280

14 March 2019

  • curprev 23:5223:52, 14 March 2019imported>Stashbot 70,427 bytes +2,052 bd808: Disabled job queues and rescheduled continuous jobs away from tools-exec-14{21,22,23,24,25,26,27,28,29,30,31,32} (T217152)

13 March 2019

  • curprev 23:3023:30, 13 March 2019imported>Stashbot 68,375 bytes +775 bd808: Rebuilding stretch Kubernetes images
  • curprev 00:2200:22, 13 March 2019imported>Stashbot 67,600 bytes +113 bd808: Raise web-memlimit for isbn tool to 6G for tomcat8 (T217406)

11 March 2019

  • curprev 15:5315:53, 11 March 2019imported>Stashbot 67,487 bytes +344 bd808: Manually started `service gridengine-master` on tools-sgegrid-master after reboot (T218038)
  • curprev 00:5300:53, 11 March 2019imported>Stashbot 67,143 bytes +562 bd808: Re-enabled 13 queue instances that had been disabled by LDAP failures during job initialization (T217280)

8 March 2019

  • curprev 00:3000:30, 8 March 2019imported>Stashbot 66,581 bytes +418 bd808: DNS record created for trusty-dev.tools.wmflabs.org (Trusty secondary bastion)

7 March 2019

  • curprev 00:4900:49, 7 March 2019imported>Stashbot 66,163 bytes +380 zhuyifei1999_: clushed misctools 1.37 upgrade on @bastion,@cron,@bastion-stretch T217406

4 March 2019

  • curprev 19:0719:07, 4 March 2019imported>Stashbot 65,783 bytes +276 bstorm_: umounted /mnt/nfs/dumps-labstore1006.wikimedia.org for T217473

3 March 2019

28 February 2019

27 February 2019

26 February 2019

25 February 2019

  • curprev 23:2023:20, 25 February 2019imported>Stashbot 64,237 bytes +1,248 bstorm_: Depooled tools-sgeexec-0914 and tools-sgeexec-0915 for T217066

22 February 2019

  • curprev 16:2916:29, 22 February 2019imported>Stashbot 62,989 bytes +213 gtirloni: upgraded and rebooted tools-puppetmaster-01 (new kernel)

21 February 2019

19 February 2019

  • curprev 01:4901:49, 19 February 2019imported>Stashbot 61,617 bytes +118 bd808: Revoked Toolforge project membership for user DannyS712 (T215092)

18 February 2019

  • curprev 20:4520:45, 18 February 2019imported>Stashbot 61,499 bytes +370 gtirloni: upgraded and rebooted tools-sgebastion-07 (login-stretch)

17 February 2019

16 February 2019

  • curprev 05:0005:00, 16 February 2019imported>Stashbot 60,808 bytes +1,745 zhuyifei1999_: fixed by restarting flannel. another puppet run simply started kubelet

14 February 2019

13 February 2019

  • curprev 19:1619:16, 13 February 2019imported>Stashbot 57,985 bytes +680 andrewbogott: deleting tools-sgewebgrid-generic-0901, tools-sgewebgrid-lighttpd-0901, tools-sgebastion-06

12 February 2019

  • curprev 01:2401:24, 12 February 2019imported>Stashbot 57,305 bytes +153 bd808: Stopped maintain-kubeusers, edited /etc/kubernetes/tokenauth, restarted maintain-kubeusers (T215704)

11 February 2019

  • curprev 22:5722:57, 11 February 2019imported>Stashbot 57,152 bytes +1,621 bd808: Shutoff tools-webgrid-lighttpd-14{01,13,24,26,27,28} via Horizon UI

8 February 2019

  • curprev 19:1719:17, 8 February 2019imported>Stashbot 55,531 bytes +434 hauskatze: Stopped webservice of `tools.sulinfo` which redirects to `tools.quentinv57-tools` which is also unavailable
  • curprev 01:0701:07, 8 February 2019imported>Stashbot 55,097 bytes +351 bd808: Creating tools-sgebastion-07

4 February 2019

30 January 2019

25 January 2019

  • curprev 20:5020:50, 25 January 2019imported>Stashbot 54,281 bytes +336 bd808: Deployed new tcl/web Kubernetes image based on Debian Stretch (T214668)

24 January 2019

23 January 2019

  • curprev 22:1822:18, 23 January 2019imported>Stashbot 53,604 bytes +679 bd808: Building new tools-sgewebgrid-lighttpd-0904 instance using Stretch base image (T214519)

22 January 2019

18 January 2019

17 January 2019

  • curprev 23:3723:37, 17 January 2019imported>Stashbot 52,497 bytes +574 bd808: Shutdown tools-package-builder-01. Use tools-package-builder-02 instead!

16 January 2019

  • curprev 17:2917:29, 16 January 2019imported>Stashbot 51,923 bytes +476 andrewbogott: depooling and moving tools-sgeexec-0904 tools-sgeexec-0906 tools-sgewebgrid-lighttpd-0904

15 January 2019

  • curprev 21:0221:02, 15 January 2019imported>Stashbot 51,447 bytes −178,393 bstorm_: restarting webservicemonitor on tools-services-02 -- acting funny

11 January 2019

  • curprev 11:5511:55, 11 January 2019imported>Stashbot 229,840 bytes +296 arturo: T213418 shutdown tools-docker-builder-05, will give a grace period before deleting the VM

10 January 2019

7 January 2019

  • curprev 17:2117:21, 7 January 2019imported>Stashbot 228,820 bytes +325 bstorm_: T67777 - set the max_u_jobs global grid config setting to 50 in the new grid

6 January 2019

  • curprev 22:0622:06, 6 January 2019imported>Stashbot 228,495 bytes +103 bd808: Added floating ip to tools-sgebastion-06 (T212360)

5 January 2019

  • curprev 23:5423:54, 5 January 2019imported>Stashbot 228,392 bytes +173 bd808: Manually installed php-mbstring on tools-sgebastion-06. Gerrit patch submitted to install it on the rest of the Son of Grid Engine nodes.

4 January 2019

  • curprev 21:3721:37, 4 January 2019imported>Stashbot 228,219 bytes +114 bd808: Truncated /data/project/.system/accounting after archiving ~30 days of history

3 January 2019

21 December 2018

17 December 2018

  • curprev 22:1622:16, 17 December 2018imported>Stashbot 227,150 bytes +478 bstorm_: Adding a bunch of hiera values and prefixes for the new grid - T212153

11 December 2018

5 December 2018

  • curprev 12:1712:17, 5 December 2018imported>Stashbot 226,588 bytes +129 gtirloni: removed node tools-worker-1029.tools.eqiad.wmflabs from cluster (T196973)

4 December 2018

  • curprev 22:4722:47, 4 December 2018imported>Stashbot 226,459 bytes +262 bstorm_: gtirloni added back main floating IP for tools-k8s-master-01 and removed unnecessary ones to stop k8s outage T164123

1 December 2018

  • curprev 02:4402:44, 1 December 2018imported>Stashbot 226,197 bytes +88 gtirloni: deleted instance tools-exec-gift-trusty-01 (T194615)
  • curprev 00:1000:10, 1 December 2018imported>Stashbot 226,109 bytes +402 andrewbogott: moving tools-worker-1020 and tools-worker-1022 to different labvirts

27 November 2018

  • curprev 17:4917:49, 27 November 2018imported>Stashbot 225,707 bytes +121 bstorm_: restarted maintain-kubeusers just in case it had any issues reconnecting to toolsdb

26 November 2018

  • curprev 17:3917:39, 26 November 2018imported>Stashbot 225,586 bytes +348 gtirloni: updated tools-manifest package on tools-services-01/02 to version 0.12 (10->60 seconds sleep time) (T210190)

20 November 2018

  • curprev 23:0523:05, 20 November 2018imported>Stashbot 225,238 bytes +451 gtirloni: Published stretch-tools and stretch-toolsbeta aptly repositories individually on tools-services-01

16 November 2018

  • curprev 21:1621:16, 16 November 2018imported>Stashbot 224,787 bytes +435 bd808: Ran grid engine orphan process kill script from T153281. Only 3 orphan php-cgi processes belonging to iluvatarbot found.

14 November 2018

13 November 2018

  • curprev 17:4017:40, 13 November 2018imported>Stashbot 224,138 bytes +717 arturo: remove misctools 1.31 and jobutils 1.30 from the stretch-tools repo (T207970)

8 November 2018

7 November 2018

  • curprev 10:3710:37, 7 November 2018imported>Stashbot 222,560 bytes +112 gtirloni: removed invalid apt.conf.d file from all hosts (T110055)

2 November 2018

  • curprev 18:1118:11, 2 November 2018imported>Stashbot 222,448 bytes +174 arturo: T206223 some disturbances due to the certificate renewal

31 October 2018

29 October 2018

  • curprev 17:0017:00, 29 October 2018imported>Stashbot 222,111 bytes +108 bd808: Ran grid engine orphan process kill script from T153281

26 October 2018

  • curprev 10:3410:34, 26 October 2018imported>Stashbot 222,003 bytes +236 arturo: T207970 added misctools 1.31 and jobutils 1.30 to stretch-tools aptly repo

19 October 2018

  • curprev 14:1714:17, 19 October 2018imported>Stashbot 221,767 bytes +65 andrewbogott: moving tools-clushmaster-01 to labvirt1004
  • curprev 00:2900:29, 19 October 2018imported>Stashbot 221,702 bytes +321 andrewbogott: migrating tools-exec-1411 and tools-exec-1410 off of cloudvirt1017

16 October 2018

  • curprev 15:1315:13, 16 October 2018imported>Stashbot 221,381 bytes +205 bd808: (repost for gtirloni) T186571 removed legofan4000 user from project-tools group (leftover from T165624 legofan4000->macfan4000 rename)

7 October 2018

  • curprev 21:5721:57, 7 October 2018imported>Stashbot 221,176 bytes +380 zhuyifei1999_: restarted maintain-kubeusers on tools-k8s-master-01 T194859

21 September 2018

  • curprev 12:3512:35, 21 September 2018imported>Stashbot 220,796 bytes +431 arturo: cleanup stalled apt preference files (pinning) in tools-clushmaster-01

17 September 2018

  • curprev 09:1309:13, 17 September 2018imported>Stashbot 220,365 bytes +128 arturo: T204481 aborrero@tools-mail:~$ sudo exiqgrep -i | xargs sudo exim -Mrm

14 September 2018

  • curprev 11:2211:22, 14 September 2018imported>Stashbot 220,237 bytes +246 arturo: T204267 stop the corhist tool (k8s) because it is hammering the wikidata API

8 September 2018

  • curprev 10:3510:35, 8 September 2018imported>Stashbot 219,991 bytes +118 gtirloni: restarted cron and truncated /var/log/exim4/paniclog (T196137)

7 September 2018

27 August 2018

  • curprev 23:4023:40, 27 August 2018imported>Stashbot 219,785 bytes +320 bd808: `# exec-manage repool tools-webgrid-generic-1402.eqiad.wmflabs` T202932

22 August 2018

  • curprev 13:0213:02, 22 August 2018imported>Stashbot 219,465 bytes +236 arturo: I used this command: `sudo exim -bp | sudo exiqgrep -i | xargs sudo exim -Mrm`

19 August 2018

  • curprev 09:1209:12, 19 August 2018imported>Stashbot 219,229 bytes +140 legoktm: rebuilding python/base k8s images for https://gerrit.wikimedia.org/r/453665 (T202218)

14 August 2018

  • curprev 21:0221:02, 14 August 2018imported>Stashbot 219,089 bytes +182 legoktm: rebuilt php7.2 docker images for https://gerrit.wikimedia.org/r/452755

13 August 2018

9 August 2018

  • curprev 10:4010:40, 9 August 2018imported>Stashbot 218,673 bytes +293 arturo: T201602 upgrade packages from jessie-backports (excluding python-designateclient)

8 August 2018

  • curprev 10:0110:01, 8 August 2018imported>Stashbot 218,380 bytes +192 zhuyifei1999_: building & publishing toollabs-webservice 0.40 deb, and all Docker images T156626 T148872 T158244

6 August 2018

1 August 2018

  • curprev 14:3114:31, 1 August 2018imported>Stashbot 218,090 bytes +145 andrewbogott: temporarily depooling tools-exec-1409, 1410, 1414, 1419, 1427, 1428 to try to give labvirt1009 a break

30 July 2018

  • curprev 20:3320:33, 30 July 2018imported>Stashbot 217,945 bytes +186 bd808: Started rebuilding all Kubernetes Docker images to pick up latest apt updates

27 July 2018

  • curprev 04:5204:52, 27 July 2018imported>Stashbot 217,759 bytes +108 zhuyifei1999_: rebuilding python/base docker container T190274

25 July 2018

18 July 2018

  • curprev 13:2413:24, 18 July 2018imported>Stashbot 217,476 bytes +506 arturo: upgrading packages from `stretch-wikimedia` T199905

30 June 2018

  • curprev 18:1518:15, 30 June 2018imported>Stashbot 216,970 bytes +486 chicocvenancio: pushed new config to PAWS to fix dumps nfs mountpoint

29 June 2018

  • curprev 17:4117:41, 29 June 2018imported>Stashbot 216,484 bytes +431 bd808: Rescheduling continuous jobs away from tools-exec-1408 where load is high

28 June 2018

  • curprev 19:5019:50, 28 June 2018imported>Stashbot 216,053 bytes +640 chasemp: tools-clushmaster-01:~$ clush -w @all 'sudo umount -fl /mnt/nfs/dumps-labstore1006.wikimedia.org'

21 June 2018

  • curprev 13:1813:18, 21 June 2018imported>Stashbot 215,413 bytes +109 chasemp: tools-bastion-03:~# bash -x /data/project/paws/paws-userhomes-hack.bash

20 June 2018

  • curprev 15:0915:09, 20 June 2018imported>Stashbot 215,304 bytes +138 bd808: Killed orphan processes on webgrid nodes (T182070); most owned by jembot and croptool

14 June 2018

  • curprev 14:2014:20, 14 June 2018imported>Stashbot 215,166 bytes +102 chasemp: timeout 180s bash -x /data/project/paws/paws-userhomes-hack.bash

11 June 2018

  • curprev 10:1110:11, 11 June 2018imported>Stashbot 215,064 bytes +279 arturo: T196137 `aborrero@tools-clushmaster-01:~$ clush -w@all 'sudo wc -l /var/log/exim4/paniclog 2>/dev/null | grep -v ^0 && sudo rm -rf /var/log/exim4/paniclog && sudo service prometheus-node-exporter restart || true'`

8 June 2018

  • curprev 07:4607:46, 8 June 2018imported>Stashbot 214,785 bytes +172 arturo: T196137 more rootspam today, restarting again `prometheus-node-exporter` and force rotating exim4 paniclog in 12 nodes

7 June 2018

  • curprev 11:0111:01, 7 June 2018imported>Stashbot 214,613 bytes +218 arturo: T196137 force rotate all exim paniclog files to avoid rootspam `aborrero@tools-clushmaster-01:~$ clush -w@all 'sudo logrotate /etc/logrotate.d/exim4-paniclog -f -v'`

6 June 2018

  • curprev 22:0022:00, 6 June 2018imported>Stashbot 214,395 bytes +620 bd808: Scripting a restart of webservice for tools that are still in CrashLoopBackOff state after 2nd attempt (T196589)

5 June 2018

  • curprev 18:0218:02, 5 June 2018imported>Stashbot 213,775 bytes +369 bd808: Forced puppet run on tools-bastion-03 to re-enable logins by dubenben (T196486)

4 June 2018

  • curprev 10:2810:28, 4 June 2018imported>Stashbot 213,406 bytes +102 arturo: T196006 installing sqlite3 package in exec nodes

3 June 2018

  • curprev 10:1910:19, 3 June 2018imported>Stashbot 213,304 bytes +211 zhuyifei1999_: Grid is full. qdel'ed all jobs belonging to tools.dibot except lighttpd, and tools.mbh that has a job name starting 'comm_delin', 'delfilexcl' T195834

31 May 2018

  • curprev 11:3111:31, 31 May 2018imported>Stashbot 213,093 bytes +237 zhuyifei1999_: building & pushing python/web docker image T174769

30 May 2018

  • curprev 10:5210:52, 30 May 2018imported>Stashbot 212,856 bytes +425 zhuyifei1999_: undid both changes to tools-bastion-05

28 May 2018

  • curprev 12:0912:09, 28 May 2018imported>Stashbot 212,431 bytes +250 arturo: T194665 adding mono packages to apt.wikimedia.org for jessie-wikimedia and stretch-wikimedia

25 May 2018

  • curprev 05:3105:31, 25 May 2018imported>Stashbot 212,181 bytes +181 zhuyifei1999_: Edit /data/project/.system/gridengine/default/common/sge_request, h_vmem 256M -> 512M, release precise -> trusty T195558

22 May 2018

  • curprev 11:5311:53, 22 May 2018imported>Stashbot 212,000 bytes +157 arturo: running puppet to deploy https://gerrit.wikimedia.org/r/#/c/433996/ for T194665 (mono framework update)

18 May 2018

  • curprev 16:3616:36, 18 May 2018imported>Stashbot 211,843 bytes +77 bd808: Restarted bigbrother on tools-services-02

16 May 2018

  • curprev 21:1721:17, 16 May 2018imported>Stashbot 211,766 bytes +104 zhuyifei1999_: maintain-kubeusers is stuck in infinite sleeps of 10 seconds

15 May 2018

  • curprev 04:2804:28, 15 May 2018imported>Stashbot 211,662 bytes +371 andrewbogott: depooling, rebooting, re-pooling tools-exec-1414. It's hanging for unknown reasons.

12 May 2018

  • curprev 10:0910:09, 12 May 2018imported>Stashbot 211,291 bytes +129 Hauskatze: tools.quentinv57-tools@tools-bastion-02:~$ webservice stop | T194343

11 May 2018

  • curprev 14:3414:34, 11 May 2018imported>Stashbot 211,162 bytes +406 andrewbogott: repooling labvirt1001 tools instances

10 May 2018

  • curprev 18:5518:55, 10 May 2018imported>Stashbot 210,756 bytes +114 andrewbogott: depooling, rebooting, repooling tools-exec-1401 to test a kernel update

9 May 2018

7 May 2018

  • curprev 21:0221:02, 7 May 2018imported>Stashbot 210,572 bytes +185 zhuyifei1999_: re-building all docker images T190893
  • curprev 00:2500:25, 7 May 2018imported>Stashbot 210,387 bytes +160 zhuyifei1999_: `renice -n 15 -p 28865` (`tar cvzf` of `tools.giftbot`) on tools-bastion-02, been hogging the NFS IO for a few hours

5 May 2018

  • curprev 23:3723:37, 5 May 2018imported>Stashbot 210,227 bytes +126 zhuyifei1999_: regenerate k8s creds for tools.zhuyifei1999-test because I messed up while testing

3 May 2018

  • curprev 14:4814:48, 3 May 2018imported>Stashbot 210,101 bytes +146 arturo: uploaded a new ruby docker image to the registry with the libmysqlclient-dev package T192566

1 May 2018

  • curprev 14:0514:05, 1 May 2018imported>Stashbot 209,955 bytes +114 andrewbogott: moving tools-webgrid-lighttpd-1406 to labvirt1016 (routine rebalancing)

27 April 2018

  • curprev 18:2618:26, 27 April 2018imported>Stashbot 209,841 bytes +267 zhuyifei1999_: `$ write` doesn't seem to be able to write to their tmux tty, so echoed into their pts directly: `# echo -e '\n\n[...]\n' > /dev/pts/81`

23 April 2018

  • curprev 14:4114:41, 23 April 2018imported>Stashbot 209,574 bytes +170 zhuyifei1999_: `chown tools.pywikibot:tools.pywikibot /shared/pywikipedia/` Prior owner: tools.russbot:project-tools T192732

22 April 2018

  • curprev 13:0713:07, 22 April 2018imported>Stashbot 209,404 bytes +219 bd808: Kill orphan php-cgi processes across the job grid via `clush -w @exec -w @webgrid -b 'ps axwo user:20,ppid,pid,cmd | grep -E " 1 " | grep php-cgi | xargs sudo kill -9'`

15 April 2018

  • curprev 17:5117:51, 15 April 2018imported>Stashbot 209,185 bytes +215 zhuyifei1999_: forced puppet runs across tools-elastic-0[1-3] T192224

11 April 2018

  • curprev 13:2513:25, 11 April 2018imported>Stashbot 208,970 bytes +103 chasemp: cleanup exim frozen messages in an effort to alleviate queue pressure

6 April 2018

  • curprev 16:3016:30, 6 April 2018imported>Stashbot 208,867 bytes +334 chicocvenancio: killed job in bastion, tools.gpy affected

5 April 2018

29 March 2018

  • curprev 20:0920:09, 29 March 2018imported>Stashbot 208,457 bytes +230 chicocvenancio: killed interactive processes in tools-bastion-03

28 March 2018

  • curprev 13:0613:06, 28 March 2018imported>Stashbot 208,227 bytes +133 zhuyifei1999_: SIGTERM PID 30633 on tools-bastion-03 (tool 3d2commons's celery). Please run this on grid

26 March 2018

  • curprev 21:3521:35, 26 March 2018imported>Stashbot 208,094 bytes +108 bd808: clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -atime +1 -delete'

23 March 2018

  • curprev 23:2623:26, 23 March 2018imported>Stashbot 207,986 bytes +207 bd808: clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -atime +1 -delete'

22 March 2018

  • curprev 22:0422:04, 22 March 2018imported>Stashbot 207,779 bytes +371 bd808: Forced puppet run on tools-proxy-02 for T130748

21 March 2018

  • curprev 17:5017:50, 21 March 2018imported>Stashbot 207,408 bytes +230 bd808: Cleaned up stale /project/.system/bigbrother.scoreboard.* files from labstore1004

20 March 2018

  • curprev 08:2808:28, 20 March 2018imported>Stashbot 207,178 bytes +163 zhuyifei1999_: unmount dumps & remount on tools-bastion-02 (can someone clush this?) T189018 T190126

19 March 2018

  • curprev 11:0211:02, 19 March 2018imported>Stashbot 207,015 bytes +131 arturo: reboot tools-exec-1408, to balance load. Server is unresponsive due to high load by some tools

16 March 2018

  • curprev 22:4422:44, 16 March 2018imported>Stashbot 206,884 bytes +277 zhuyifei1999_: suspended process 22825 (BotOrderOfChapters.exe) on tools-bastion-03. Threads continuously going to D-state & R-state. Also sent message via $ write on pts/10

15 March 2018

  • curprev 16:5616:56, 15 March 2018imported>Stashbot 206,607 bytes +122 zhuyifei1999_: granted elasticsearch credentials to tools.denkmalbot T185624

14 March 2018

  • curprev 20:5720:57, 14 March 2018imported>Stashbot 206,485 bytes +503 bd808: Upgrading elasticsearch on tools-elastic-01 (T181531)

12 March 2018

  • curprev 20:0920:09, 12 March 2018imported>Stashbot 205,982 bytes +976 madhuvishy: Run clush -w @all -b 'sudo umount /mnt/nfs/labstore1003-scratch && sudo mount -a' to remount scratch across all of tools

8 March 2018

  • curprev 16:0516:05, 8 March 2018imported>Stashbot 205,006 bytes +257 chasemp: tools-clushmaster-01:~$ clush -g all 'sudo puppet agent --test'

7 March 2018

  • curprev 20:4220:42, 7 March 2018imported>Stashbot 204,749 bytes +360 chicocvenancio: killed io intensive recursive zip of huge folder

6 March 2018

  • curprev 16:1516:15, 6 March 2018imported>Stashbot 204,389 bytes +1,691 madhuvishy: Reboot tools-docker-registry-02 T189018

5 March 2018

  • curprev 18:5618:56, 5 March 2018imported>Stashbot 202,698 bytes +695 zhuyifei1999_: also published jobutils_1.30_all.deb

2 March 2018

  • curprev 13:4113:41, 2 March 2018imported>Stashbot 202,003 bytes +115 arturo: doing some testing with puppet classes in tools-package-builder-01 via horizon

1 March 2018

  • curprev 13:2713:27, 1 March 2018imported>Stashbot 201,888 bytes +86 arturo: deploy https://gerrit.wikimedia.org/r/#/c/415057/

27 February 2018

26 February 2018

  • curprev 19:1819:18, 26 February 2018imported>Stashbot 201,608 bytes +221 chasemp: tools-clushmaster-01:~$ clush -w @all "sudo puppet agent --test"

25 February 2018

  • curprev 19:0419:04, 25 February 2018imported>Stashbot 201,387 bytes +117 chicocvenancio: killed jobs in tools-bastion-03, wrote notice to tools owners' terminals

23 February 2018

22 February 2018

21 February 2018

  • curprev 19:0219:02, 21 February 2018imported>Stashbot 200,840 bytes +1,143 bstorm_: disabled puppet on tools-static-* pending change 413197

20 February 2018

19 February 2018

16 February 2018

  • curprev 18:2118:21, 16 February 2018imported>Stashbot 199,148 bytes +937 arturo: upgrading tools-proxy-01 and tools-paws-master-01, same as others

15 February 2018

  • curprev 13:5413:54, 15 February 2018imported>Stashbot 198,211 bytes +672 arturo: cleanup ferm (deinstall) in tools-services-01 for T187435

14 February 2018

  • curprev 13:0913:09, 14 February 2018imported>Stashbot 197,539 bytes +236 arturo: the reboot was OK, the server seems working and kubectl sees all the pods running in the deployment (T187315)

11 February 2018

  • curprev 01:2801:28, 11 February 2018imported>Stashbot 197,303 bytes +367 zhuyifei1999_: `# find /home/ -maxdepth 1 -perm -o+w \! -uid 0 -exec chmod -v o-w {} \;` Affected: only /home/tr8dr, mode 0777 -> 0775

9 February 2018

  • curprev 10:3510:35, 9 February 2018imported>Stashbot 196,936 bytes +989 arturo: deploy https://gerrit.wikimedia.org/r/#/c/409226/ T179343 T182562 T186846

8 February 2018

  • curprev 18:3818:38, 8 February 2018imported>Stashbot 195,947 bytes +1,291 arturo: aborrero@tools-k8s-master-01:~$ sudo kubectl uncordon tools-worker-1002.tools.eqiad.wmflabs

6 February 2018

  • curprev 13:1513:15, 6 February 2018imported>Stashbot 194,656 bytes +302 arturo: deploy https://gerrit.wikimedia.org/r/#/c/408529/ to tools-services-01

5 February 2018

  • curprev 17:5817:58, 5 February 2018imported>Stashbot 194,354 bytes +325 arturo: publishing/unpublishing trusty-tools repo in tools-services-01 to address T186539

3 February 2018

  • curprev 01:0401:04, 3 February 2018imported>Stashbot 194,029 bytes +129 chicocvenancio: killed io intensive process in bastion-03 "vltools python3 ./broken_ref_anchors.py"

31 January 2018

29 January 2018

28 January 2018

  • curprev 22:4922:49, 28 January 2018imported>Stashbot 193,665 bytes +165 chicocvenancio: killed compromised session generating miner processes

27 January 2018

  • curprev 00:5500:55, 27 January 2018imported>Stashbot 193,500 bytes +209 arturo: at tools-static-11 the kernel OOM killer stopped git gc at about 20% :-(

25 January 2018

  • curprev 23:4723:47, 25 January 2018imported>Stashbot 193,291 bytes +422 arturo: fix last deprecation warnings in tools-elastic-03, tools-elastic-02, tools-proxy-01 and tools-proxy-02 by replacing by hand configtimeout with http_configtimeout in /etc/puppet/puppet.conf

23 January 2018

22 January 2018

  • curprev 18:3218:32, 22 January 2018imported>Stashbot 192,662 bytes +538 arturo: T181948 T185314 deploying jobutils and misctools v1.28 in the cluster

19 January 2018

18 January 2018

  • curprev 16:1116:11, 18 January 2018imported>Stashbot 191,768 bytes +877 arturo: aborrero@tools-clushmaster-01:~$ sudo aptitude purge vblade vblade-persist runit (for something similar to T182781)

17 January 2018

  • curprev 18:4818:48, 17 January 2018imported>Stashbot 190,891 bytes +692 arturo: aborrero@tools-clushmaster-01:~$ clush -w @all 'apt-show-versions | grep upgradeable | grep trusty-wikimedia' | tee pending-upgrades-report-trusty-wikimedia.txt

16 January 2018

  • curprev 22:0122:01, 16 January 2018imported>Stashbot 190,199 bytes +3,082 chasemp: qstat -explain E -xml | grep 'name' | sed 's/<name>//' | sed 's/<\/name>//' | xargs qmod -cq

11 January 2018

  • curprev 20:3320:33, 11 January 2018imported>Stashbot 187,117 bytes +848 andrewbogott: repooling tools-exec-1411, tools-exec-1440, tools-webgrid-lighttpd-1419, tools-webgrid-lighttpd-1420, tools-webgrid-lighttpd-1421

10 January 2018

  • curprev 15:1415:14, 10 January 2018imported>Stashbot 186,269 bytes +1,549 chasemp: tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable && sudo puppet agent --test"

9 January 2018

  • curprev 23:2123:21, 9 January 2018imported>Stashbot 184,720 bytes +2,117 yuvipanda: paws new cluster master is up, re-adding nodes by executing same sequence of commands for upgrading

8 January 2018

  • curprev 20:3420:34, 8 January 2018imported>Stashbot 182,603 bytes +219 madhuvishy: Restart kube services and uncordon tools-worker-1001

6 January 2018

  • curprev 00:3500:35, 6 January 2018imported>Stashbot 182,384 bytes +1,399 madhuvishy: Run `clush -w @paws-worker -b 'sudo iptables -L FORWARD'`

4 January 2018

  • curprev 17:2417:24, 4 January 2018imported>Stashbot 180,985 bytes +120 andrewbogott: rebooting tools-paws-worker-1019 to verify repair of T184018

3 January 2018

31 December 2017

  • curprev 02:0102:01, 31 December 2017imported>Stashbot 180,671 bytes +102 bd808: Killed some pwb.py and qacct processes running on tools-bastion-03

21 December 2017

  • curprev 17:5717:57, 21 December 2017imported>Stashbot 180,569 bytes +199 bd808: PAWS: deleted hub-deployment pod stuck in crashloopbackoff

19 December 2017

18 December 2017

  • curprev 12:0412:04, 18 December 2017imported>Stashbot 180,173 bytes +621 arturo: it seems jupyterhub tries to use a database which doesn't exist: [E 2017-12-18 11:59:49.896 JupyterHub app:904] Failed to connect to db: sqlite:///jupyterhub.sqlite

15 December 2017

14 December 2017

  • curprev 16:5816:58, 14 December 2017imported>Stashbot 179,263 bytes +188 arturo: running clush -w @all 'sudo puppet agent --test' from tools-clushmaster-01.eqiad.wmflabs due to https://gerrit.wikimedia.org/r/#/c/394572/ being merged

13 December 2017

11 December 2017

  • curprev 19:3219:32, 11 December 2017imported>Stashbot 178,492 bytes +261 bd808: git gc on tools-static-11; --aggressive was killed by system (T182604)

1 December 2017

  • curprev 15:3315:33, 1 December 2017imported>Stashbot 178,231 bytes +218 chasemp: put the weird mess of untracked files on tools puppetmaster into stash to see what breaks as they should not be there?

30 November 2017

20 November 2017

  • curprev 20:3420:34, 20 November 2017imported>Stashbot 177,867 bytes +94 chasemp: backup crons tools-cron-01:/var/spool/cron# cp -Rp crontabs/ /root/20112017/
  • curprev 00:5200:52, 20 November 2017imported>Stashbot 177,773 bytes +128 andrewbogott: cherry-picking https://gerrit.wikimedia.org/r/#/c/392172/ onto the tools puppetmaster

17 November 2017

  • curprev 21:3321:33, 17 November 2017imported>Stashbot 177,645 bytes +232 valhallasw`cloud: also g-w'ed those files, and sent emails to all the affected users

16 November 2017

  • curprev 17:4017:40, 16 November 2017imported>Stashbot 177,413 bytes +291 chasemp: tools-clushmaster-01:~$ clush -w @all 'sudo puppet agent --enable && sudo puppet agent --test && sudo unattended-upgrades -d'

15 November 2017

7 November 2017

  • curprev 01:2101:21, 7 November 2017imported>Stashbot 176,922 bytes +290 bd808: Removed all non-directory files from /home (via labstore1004 direct access)

5 November 2017

  • curprev 23:4823:48, 5 November 2017imported>Stashbot 176,632 bytes +391 bd808: Cleaned up 2 huge /tmp files left by tools.croptool (~6.5G)

3 November 2017

2 November 2017

1 November 2017

  • curprev 07:1107:11, 1 November 2017imported>Stashbot 176,084 bytes +213 madhuvishy: Clear nscd cache across all projects post labsdb dns switchover T179464

31 October 2017

  • curprev 16:5016:50, 31 October 2017imported>Stashbot 175,871 bytes +93 bd808: tools-bastion-03 (tools-login, login.tools) is overloaded

30 October 2017

  • curprev 17:3517:35, 30 October 2017imported>Stashbot 175,778 bytes +485 madhuvishy: Clear dns caches across tools hosts `sudo nscd -i hosts`

24 October 2017

  • curprev 18:0918:09, 24 October 2017imported>Stashbot 175,293 bytes +201 madhuvishy: Disable puppet on tools-package-builder-01 temporarily (T178920)

23 October 2017

  • curprev 14:4914:49, 23 October 2017imported>Stashbot 175,092 bytes +92 chasemp: wall message and scheduled reboot in 5m for bastion-03

18 October 2017

  • curprev 21:3621:36, 18 October 2017imported>Stashbot 175,000 bytes +275 chasemp: stop basebot -- it is going crazy and spamming email w/ failing to log to error.log. Need to figure out how to notify but it's clearly in a failure loop.

12 October 2017

  • curprev 16:5716:57, 12 October 2017imported>Stashbot 174,725 bytes +163 bd808: Rebuilding all Kubernetes Docker images to include toollabs-webservice 0.38

6 October 2017

5 October 2017

  • curprev 15:4615:46, 5 October 2017imported>Stashbot 174,353 bytes +169 chasemp: tools-bastion-03 has tons of local tools running long lived NFS intensive processes. I'm rebooting rather than playing whackamole.

3 October 2017

  • curprev 19:3019:30, 3 October 2017imported>Stashbot 174,184 bytes +103 bd808: `kubectl --namespace=prod delete pod --all` on tools-paws-master-01

1 October 2017

  • curprev 21:4621:46, 1 October 2017imported>Stashbot 174,081 bytes +108 madhuvishy: Cold migrating tools-clushmaster-01 from labvirt1015 to labvirt1017

29 September 2017

25 September 2017

  • curprev 15:1415:14, 25 September 2017imported>Stashbot 173,885 bytes +224 andrewbogott: rebooting tools-paws-worker-1006 since I can't access it

20 September 2017

  • curprev 16:5216:52, 20 September 2017imported>Stashbot 173,661 bytes +922 madhuvishy: apt-get install --only-upgrade apache2; service apache2 restart on tools-puppetmaster-01

13 September 2017

31 August 2017

  • curprev 20:3320:33, 31 August 2017imported>Stashbot 172,342 bytes +503 madhuvishy: Updated certs and ran puppet, restarted nginx on tools-proxy-* and tools-static-* (T174611)

24 August 2017

22 August 2017

  • curprev 19:2019:20, 22 August 2017imported>Stashbot 171,699 bytes +108 andrewbogott: deleted tools-puppetmaster-02, it was replaced a month ago by -01

12 August 2017

11 August 2017

10 August 2017

  • curprev 14:5914:59, 10 August 2017imported>Stashbot 171,463 bytes +117 chasemp: 'become stimmberechtigung && restart' && 'become intersect-contribs && restart'

9 August 2017

3 August 2017

  • curprev 00:4700:47, 3 August 2017imported>Stashbot 171,272 bytes +244 bd808: tools-bastion-03 not usably responsive to interactive commands; will reboot

31 July 2017

  • curprev 15:2815:28, 31 July 2017imported>Stashbot 171,028 bytes +82 chasemp: remove python-keystoneclient from bastion-03

27 July 2017

  • curprev 23:2723:27, 27 July 2017imported>Stashbot 170,946 bytes +353 bd808: Killed python procs owned by sdesabbata on tools-login that were stealing all cpu/io

26 July 2017

  • curprev 22:3322:33, 26 July 2017imported>Stashbot 170,593 bytes +95 chasemp: hotpatching a hiera value on tools master to see effects

20 July 2017

  • curprev 19:4819:48, 20 July 2017imported>Stashbot 170,498 bytes +694 bd808: Clearing all Eqw state jobs in all queues with: qstat -u '*' | grep Eqw | awk '{print $1;}' | xargs -L1 qmod -cj

19 July 2017

  • curprev 23:5223:52, 19 July 2017imported>Stashbot 169,804 bytes +302 bd808: Restarted cron on tools-cron-01; toolschecker job showing user not found errors

18 July 2017

  • curprev 19:5119:51, 18 July 2017imported>Stashbot 169,502 bytes +112 andrewbogott: enabling puppet on tools-proxy-02. I don't know why it was disabled.

17 July 2017

  • curprev 01:4301:43, 17 July 2017imported>Stashbot 169,390 bytes +182 bd808: Uncordoned tools-worker-1020 after it deleted pods with local storage that were filling the entire disk

13 July 2017

12 July 2017

7 July 2017

  • curprev 18:2618:26, 7 July 2017imported>Stashbot 168,710 bytes +88 bd808: Forced puppet runs on tools-redis-* for security fix

3 July 2017

  • curprev 04:2604:26, 3 July 2017imported>Stashbot 168,622 bytes +224 bd808: cdnjs on tools-static-10 is up to date

1 July 2017

  • curprev 19:4019:40, 1 July 2017imported>Stashbot 168,398 bytes +175 bd808: Disabled puppet on tools-k8s-master-01 to try and fix maintain-kubeusers

30 June 2017

  • curprev 01:3301:33, 30 June 2017imported>Stashbot 168,223 bytes +144 chasemp: time for i in `cat tools-hosts`; do ssh -i ~/.ssh/labs_root_id_rsa root@$i.eqiad.wmflabs 'hostname -f; uptime; tc-setup'; done
  • curprev 01:2901:29, 30 June 2017imported>Stashbot 168,079 bytes +1,063 andrewbogott: rebooting tools-cron-01

27 June 2017

  • curprev 21:3221:32, 27 June 2017imported>Stashbot 167,016 bytes +128 andrewbogott: moving all tools nodes to new puppetmaster, tools-puppetmaster-01.tools.eqiad.wmflabs

25 June 2017

24 June 2017

  • curprev 16:0116:01, 24 June 2017imported>Stashbot 166,810 bytes +133 bd808: Created and provisioned elasticsearch password for tools.wmde-uca-test (T167971)

23 June 2017

  • curprev 20:2020:20, 23 June 2017imported>Stashbot 166,677 bytes +175 bd808: Reindexing various elasticsearch indexes created before we upgraded to v2.x

22 June 2017

  • curprev 17:0317:03, 22 June 2017imported>Stashbot 166,502 bytes +287 bd808: Rolled back attempt at Elasticsearch upgrade. Indices need to be rebuilt with 2.x before 5.x can be installed. T164842
  • curprev 00:1200:12, 22 June 2017imported>Stashbot 166,215 bytes +3,075 bd808: Set ownership and permissions on $HOME/.kube for all tools (T165875)

14 June 2017

  • curprev 22:0922:09, 14 June 2017imported>Stashbot 163,140 bytes +83 bd808: Restarted apache2 proc on tools-puppetmaster-02

8 June 2017

  • curprev 18:1418:14, 8 June 2017imported>Stashbot 163,057 bytes +369 madhuvishy: Also delete from /tmp on tools-webgrid-lighttpd-1411 xvfb-run.*, calibre_* and ws-*.epub

7 June 2017

  • curprev 19:0519:05, 7 June 2017imported>Stashbot 162,688 bytes +94 madhuvishy: Killed scp job run by user torin8 on tools-bastion-02

6 June 2017

  • curprev 20:3020:30, 6 June 2017imported>Stashbot 162,594 bytes +142 chasemp: rebooting tools-bastion-02 as unresponsive (up 76 days and lots of seemingly left behind things running)

5 June 2017

  • curprev 23:4423:44, 5 June 2017imported>Stashbot 162,452 bytes +390 bd808: Deleted tools.iabot crontab that somehow got locally installed on tools-exec-1412 on 2017-05-24T20:55Z

1 June 2017

  • curprev 15:1515:15, 1 June 2017imported>Stashbot 162,062 bytes +124 andrewbogott: depooling/rebooting/repooling tools-exec-1403 as part of old kernel-purge testing

31 May 2017

  • curprev 19:2919:29, 31 May 2017imported>Stashbot 161,938 bytes +668 bd808: Rebuilding all Docker images to pick up toollabs-webservice v0.37 (T163355)

30 May 2017

  • curprev 22:3222:32, 30 May 2017imported>Stashbot 161,270 bytes +1,004 andrewbogott: migrating tools-webgrid-lighttpd-1406, tools-exec-1410 from labvirt1006 to labvirt1009 to balance cpu usage

26 May 2017

  • curprev 20:3220:32, 26 May 2017imported>Stashbot 160,266 bytes +160,266 bd808: Added tools-webgrid-lighttpd-14{19,2[0-8]} as submit hosts