You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Nova Resource:Tools/SAL: Difference between revisions

imported>Stashbot
(taavi: depool tools-k8s-worker-42 for experiments)
imported>Stashbot
(wm-bot2: removing grid node tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko)
 
(23 intermediate revisions by the same user not shown)
=== 2022-08-05 ===
* 15:08 wm-bot2: removing grid node tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 15:05 wm-bot2: removing grid node tools-sgeexec-10-12.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 15:00 wm-bot2: created node tools-sgewebgen-10-3.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
=== 2022-08-03 ===
* 15:51 dhinus: recreated jobs-api pods to pick up new ConfigMap
* 15:02 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|c47ac41}}) - cookbook ran by fran@MacBook-Pro.station
=== 2022-07-20 ===
* 19:31 taavi: reboot toolserver-proxy-01 to free up disk space probably held by stale file handles
* 08:06 wm-bot2: removing grid node tools-sgeexec-10-6.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
=== 2022-07-19 ===
* 17:53 wm-bot2: created node tools-sgeexec-10-21.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
* 17:00 wm-bot2: removing grid node tools-sgeexec-10-3.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 16:58 wm-bot2: removing grid node tools-sgeexec-10-4.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 16:24 wm-bot2: created node tools-sgeexec-10-20.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
* 15:59 taavi: tag current maintain-kubernetes :beta image as: :latest
=== 2022-07-17 ===
* 15:52 wm-bot2: removing grid node tools-sgeexec-10-10.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 15:43 wm-bot2: removing grid node tools-sgeexec-10-2.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 13:26 wm-bot2: created node tools-sgeexec-10-16.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
=== 2022-07-14 ===
* 13:48 taavi: rebooting tools-sgeexec-10-2
=== 2022-07-13 ===
* 12:09 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus
=== 2022-07-11 ===
* 16:06 wm-bot2: Increased quotas by <nowiki>{</nowiki>self.increases<nowiki>}</nowiki> ([[phab:T312692|T312692]]) - cookbook ran by nskaggs@x1carbon
=== 2022-07-07 ===
* 07:34 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus
=== 2022-06-28 ===
* 17:34 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master ([[phab:T311538|T311538]]) - cookbook ran by dcaro@vulcanus
* 15:51 taavi: add 4096G cinder quota [[phab:T311509|T311509]]
=== 2022-06-27 ===
* 18:14 taavi: restart calico, appears to have got stuck after the ca replacement operation
* 18:02 taavi: switchover active cron server to tools-sgecron-2 [[phab:T284767|T284767]]
* 17:54 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0915.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:52 wm-bot2: removing grid node tools-sgewebgrid-generic-0902.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:49 wm-bot2: removing grid node tools-sgeexec-0942.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:15 taavi: [[phab:T311412|T311412]] updating ca used by k8s-apiserver->etcd communication, breakage may happen
* 14:58 taavi: renew puppet ca cert and certificate for tools-puppetmaster-02 [[phab:T311412|T311412]]
* 14:50 taavi: backup /var/lib/puppet/server to /root/puppet-ca-backup-2022-06-27.tar.gz on tools-puppetmaster-02 before we do anything else to it [[phab:T311412|T311412]]
=== 2022-06-23 ===
* 17:51 wm-bot2: removing grid node tools-sgeexec-0941.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:49 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0916.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:46 wm-bot2: removing grid node tools-sgewebgrid-generic-0901.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:32 wm-bot2: removing grid node tools-sgeexec-0939.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:30 wm-bot2: removing grid node tools-sgeexec-0938.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:27 wm-bot2: removing grid node tools-sgeexec-0937.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:22 wm-bot2: removing grid node tools-sgeexec-0936.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:19 wm-bot2: removing grid node tools-sgeexec-0935.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:17 wm-bot2: removing grid node tools-sgeexec-0934.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:14 wm-bot2: removing grid node tools-sgeexec-0933.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:11 wm-bot2: removing grid node tools-sgeexec-0932.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 17:09 wm-bot2: removing grid node tools-sgeexec-0920.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 15:30 wm-bot2: removing grid node tools-sgeexec-0947.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 13:59 taavi: removing remaining continuous jobs from the stretch grid [[phab:T277653|T277653]]
=== 2022-06-22 ===
* 15:54 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0917.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 15:51 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0918.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 15:47 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0919.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 15:45 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0920.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
=== 2022-06-21 ===
* 15:23 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0914.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 15:20 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0914.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 15:18 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0913.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
* 15:07 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0912.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
=== 2022-06-03 ===
* 20:07 wm-bot2: created node tools-sgeweblight-10-26.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
* 19:51 balloons: Scaling webservice nodes to 20, using new 8G swap flavor [[phab:T309821|T309821]]
* 19:35 wm-bot2: created node tools-sgeweblight-10-25.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
* 19:03 wm-bot2: created node tools-sgeweblight-10-20.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
* 19:01 wm-bot2: created node tools-sgeweblight-10-19.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
* 19:00 balloons: depooled old nodes, bringing entirely new grid of nodes online [[phab:T309821|T309821]]
* 18:22 wm-bot2: created node tools-sgeweblight-10-17.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
* 17:54 wm-bot2: created node tools-sgeweblight-10-16.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
* 17:52 wm-bot2: created node tools-sgeweblight-10-15.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
* 16:59 andrewbogott: building a bunch of new lighttpd nodes (beginning with tools-sgeweblight-10-12) using a flavor with more swap space
* 16:56 wm-bot2: created node tools-sgeweblight-10-12.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
* 15:50 balloons: fix fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor [[phab:T309821|T309821]]
* 15:50 balloons: temp add 1.0G swap to sgeweblight hosts [[phab:T309821|T309821]]
* 15:50 balloons: fix fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor t309821
* 15:49 balloons: temp add 1.0G swap to sgeweblight hosts t309821
* 13:25 bd808: Upgrading fleet to tools-webservice 0.86 ([[phab:T309821|T309821]])
* 13:20 bd808: publish tools-webservice 0.86 ([[phab:T309821|T309821]])
* 12:46 taavi: start webservicemonitor on tools-sgecron-01 [[phab:T309821|T309821]]
* 10:36 taavi: draining each sgeweblight node one by one, and removing the jobs stuck in 'deleting' too
* 05:05 taavi: removing duplicate (there should be only one per tool) web service jobs from the grid [[phab:T309821|T309821]]
* 04:52 taavi: revert bd808's changes to profile::toolforge::active_proxy_host
* 03:21 bd808: Cleared queue error states after deploying new toolforge-webservice package ([[phab:T309821|T309821]])
* 03:10 bd808: publish tools-webservice 0.85 with hack for [[phab:T309821|T309821]]
=== 2022-06-02 ===
* 22:26 bd808: Rebooting tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud. Node is full of jobs that are not tracked by grid master and failing to spawn new jobs sent by the scheduler
* 21:56 bd808: Removed legacy "active_proxy_host" hiera setting
* 21:55 bd808: Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for profile::toolforge::active_proxy_host key
* 21:41 bd808: Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for active_redis key
* 21:23 wm-bot2: created node tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
* 12:42 wm-bot2: rebooting stretch exec grid workers - cookbook ran by taavi@runko
* 12:13 wm-bot2: created node tools-sgeweblight-10-7.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
* 12:03 dcaro: refresh prometheus certs ([[phab:T308402|T308402]])
* 11:47 dcaro: refresh registry-admission-controller certs ([[phab:T308402|T308402]])
* 11:42 dcaro: refresh ingress-admission-controller certs ([[phab:T308402|T308402]])
* 11:36 dcaro: refresh volume-admission-controller certs ([[phab:T308402|T308402]])
* 11:24 wm-bot2: created node tools-sgeweblight-10-6.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
* 11:17 taavi: publish jobutils 1.44 that updates the grid default from stretch to buster [[phab:T277653|T277653]]
* 10:16 taavi: publish tools-webservice 0.84 that updates the grid default from stretch to buster [[phab:T277653|T277653]]
* 09:54 wm-bot2: created node tools-sgeexec-10-14.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
=== 2022-06-01 ===
* 11:18 taavi: depool and remove tools-sgeexec-09[07-14]
=== 2022-05-31 ===
* 16:51 taavi: delete tools-sgeexec-0904 for [[phab:T309525|T309525]] experimentation
=== 2022-05-30 ===
* 08:24 taavi: depool tools-sgeexec-[0901-0909] (7 nodes total) [[phab:T277653|T277653]]
=== 2022-05-26 ===
* 15:39 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|e6fa299}}) ([[phab:T309146|T309146]]) - cookbook ran by taavi@runko
=== 2022-05-22 ===
* 17:04 taavi: failover tools-redis to the updated cluster [[phab:T278541|T278541]]
* 16:42 wm-bot2: removing grid node tools-sgeexec-0940.tools.eqiad1.wikimedia.cloud ([[phab:T308982|T308982]]) - cookbook ran by taavi@runko
=== 2022-05-16 ===
* 14:02 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-nginx ({{Gerrit|7037eca}}) - cookbook ran by taavi@runko
=== 2022-05-14 ===
* 10:47 taavi: hard reboot unresponsive tools-sgeexec-0940
=== 2022-05-12 ===
* 12:36 taavi: re-enable CronJobControllerV2 [[phab:T308205|T308205]]
* 09:28 taavi: deploy jobs-api update [[phab:T308204|T308204]]
* 09:15 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|e6fa299}}) ([[phab:T308204|T308204]]) - cookbook ran by taavi@runko
=== 2022-05-10 ===
* 15:18 taavi: depool tools-k8s-worker-42 for experiments

Latest revision as of 15:08, 5 August 2022

2022-08-05

  • 15:08 wm-bot2: removing grid node tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 15:05 wm-bot2: removing grid node tools-sgeexec-10-12.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 15:00 wm-bot2: created node tools-sgewebgen-10-3.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko

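2022-08-03

  • 15:51 dhinus: recreated jobs-api pods to pick up new ConfigMap
  • 15:02 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api (c47ac41) - cookbook ran by fran@MacBook-Pro.station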

2022-07-20

  • 19:31 taavi: reboot toolserver-proxy-01 to free up disk space probably held by stale file handles
  • 08:06 wm-bot2: removing grid node tools-sgeexec-10-6.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko

2022-07-19

  • 17:53 wm-bot2: created node tools-sgeexec-10-21.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
  • 17:00 wm-bot2: removing grid node tools-sgeexec-10-3.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 16:58 wm-bot2: removing grid node tools-sgeexec-10-4.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 16:24 wm-bot2: created node tools-sgeexec-10-20.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
  • 15:59 taavi: tag current maintain-kubernetes :beta image as: :latest

2022-07-17

  • 15:52 wm-bot2: removing grid node tools-sgeexec-10-10.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 15:43 wm-bot2: removing grid node tools-sgeexec-10-2.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 13:26 wm-bot2: created node tools-sgeexec-10-16.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko

2022-07-14

  • 13:48 taavi: rebooting tools-sgeexec-10-2

2022-07-13

  • 12:09 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus

2022-07-11

  • 16:06 wm-bot2: Increased quotas by {self.increases} (T312692) - cookbook ran by nskaggs@x1carbon

2022-07-07

  • 07:34 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus

2022-06-28

  • 17:34 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master (T311538) - cookbook ran by dcaro@vulcanus
  • 15:51 taavi: add 4096G cinder quota T311509

2022-06-27

  • 18:14 taavi: restart calico, appears to have got stuck after the ca replacement operation
  • 18:02 taavi: switchover active cron server to tools-sgecron-2 T284767
  • 17:54 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0915.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:52 wm-bot2: removing grid node tools-sgewebgrid-generic-0902.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:49 wm-bot2: removing grid node tools-sgeexec-0942.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:15 taavi: T311412 updating ca used by k8s-apiserver->etcd communication, breakage may happen
  • 14:58 taavi: renew puppet ca cert and certificate for tools-puppetmaster-02 T311412
  • 14:50 taavi: backup /var/lib/puppet/server to /root/puppet-ca-backup-2022-06-27.tar.gz on tools-puppetmaster-02 before we do anything else to it T311412

2022-06-23

  • 17:51 wm-bot2: removing grid node tools-sgeexec-0941.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:49 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0916.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:46 wm-bot2: removing grid node tools-sgewebgrid-generic-0901.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:32 wm-bot2: removing grid node tools-sgeexec-0939.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:30 wm-bot2: removing grid node tools-sgeexec-0938.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:27 wm-bot2: removing grid node tools-sgeexec-0937.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:22 wm-bot2: removing grid node tools-sgeexec-0936.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:19 wm-bot2: removing grid node tools-sgeexec-0935.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:17 wm-bot2: removing grid node tools-sgeexec-0934.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:14 wm-bot2: removing grid node tools-sgeexec-0933.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:11 wm-bot2: removing grid node tools-sgeexec-0932.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 17:09 wm-bot2: removing grid node tools-sgeexec-0920.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 15:30 wm-bot2: removing grid node tools-sgeexec-0947.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 13:59 taavi: removing remaining continuous jobs from the stretch grid T277653

2022-06-22

  • 15:54 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0917.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 15:51 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0918.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 15:47 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0919.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 15:45 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0920.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko

2022-06-21

  • 15:23 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0914.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 15:20 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0914.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 15:18 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0913.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko
  • 15:07 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0912.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko

2022-06-03

  • 20:07 wm-bot2: created node tools-sgeweblight-10-26.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
  • 19:51 balloons: Scaling webservice nodes to 20, using new 8G swap flavor T309821
  • 19:35 wm-bot2: created node tools-sgeweblight-10-25.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
  • 19:03 wm-bot2: created node tools-sgeweblight-10-20.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
  • 19:01 wm-bot2: created node tools-sgeweblight-10-19.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
  • 19:00 balloons: depooled old nodes, bringing entirely new grid of nodes online T309821
  • 18:22 wm-bot2: created node tools-sgeweblight-10-17.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
  • 17:54 wm-bot2: created node tools-sgeweblight-10-16.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
  • 17:52 wm-bot2: created node tools-sgeweblight-10-15.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
  • 16:59 andrewbogott: building a bunch of new lighttpd nodes (beginning with tools-sgeweblight-10-12) using a flavor with more swap space
  • 16:56 wm-bot2: created node tools-sgeweblight-10-12.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster
  • 15:50 balloons: fix fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor T309821
  • 15:50 balloons: temp add 1.0G swap to sgeweblight hosts T309821
  • 15:50 balloons: fix fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor t309821
  • 15:49 balloons: temp add 1.0G swap to sgeweblight hosts t309821
  • 13:25 bd808: Upgrading fleet to tools-webservice 0.86 (T309821)
  • 13:20 bd808: publish tools-webservice 0.86 (T309821)
  • 12:46 taavi: start webservicemonitor on tools-sgecron-01 T309821
  • 10:36 taavi: draining each sgeweblight node one by one, and removing the jobs stuck in 'deleting' too
  • 05:05 taavi: removing duplicate (there should be only one per tool) web service jobs from the grid T309821
  • 04:52 taavi: revert bd808's changes to profile::toolforge::active_proxy_host
  • 03:21 bd808: Cleared queue error states after deploying new toolforge-webservice package (T309821)
  • 03:10 bd808: publish tools-webservice 0.85 with hack for T309821

2022-06-02

  • 22:26 bd808: Rebooting tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud. Node is full of jobs that are not tracked by grid master and failing to spawn new jobs sent by the scheduler
  • 21:56 bd808: Removed legacy "active_proxy_host" hiera setting
  • 21:55 bd808: Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for profile::toolforge::active_proxy_host key
  • 21:41 bd808: Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for active_redis key
  • 21:23 wm-bot2: created node tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
  • 12:42 wm-bot2: rebooting stretch exec grid workers - cookbook ran by taavi@runko
  • 12:13 wm-bot2: created node tools-sgeweblight-10-7.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
  • 12:03 dcaro: refresh prometheus certs (T308402)
  • 11:47 dcaro: refresh registry-admission-controller certs (T308402)
  • 11:42 dcaro: refresh ingress-admission-controller certs (T308402)
  • 11:36 dcaro: refresh volume-admission-controller certs (T308402)
  • 11:24 wm-bot2: created node tools-sgeweblight-10-6.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko
  • 11:17 taavi: publish jobutils 1.44 that updates the grid default from stretch to buster T277653
  • 10:16 taavi: publish tools-webservice 0.84 that updates the grid default from stretch to buster T277653
  • 09:54 wm-bot2: created node tools-sgeexec-10-14.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko

2022-06-01

  • 11:18 taavi: depool and remove tools-sgeexec-09[07-14]

2022-05-31

  • 16:51 taavi: delete tools-sgeexec-0904 for T309525 experimentation

2022-05-30

  • 08:24 taavi: depool tools-sgeexec-[0901-0909] (7 nodes total) T277653

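2022-05-26

  • 15:39 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api (e6fa299) (T309146) - cookbook ran by taavi@runko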

2022-05-22

  • 17:04 taavi: failover tools-redis to the updated cluster T278541
  • 16:42 wm-bot2: removing grid node tools-sgeexec-0940.tools.eqiad1.wikimedia.cloud (T308982) - cookbook ran by taavi@runko

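2022-05-16

  • 14:02 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-nginx (7037eca) - cookbook ran by taavi@runko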

2022-05-14

  • 10:47 taavi: hard reboot unresponsive tools-sgeexec-0940

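2022-05-12

  • 12:36 taavi: re-enable CronJobControllerV2 T308205
  • 09:28 taavi: deploy jobs-api update T308204
  • 09:15 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api (e6fa299) (T308204) - cookbook ran by taavi@runko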

2022-05-10

  • 15:18 taavi: depool tools-k8s-worker-42 for experiments
  • 13:54 taavi: enable distro-wikimedia unattended upgrades T290494

2022-05-06

  • 19:46 bd808: Rebuilt toolforge-perl532-sssd-base & toolforge-perl532-sssd-web to add liblocale-codes-perl (T307812)

2022-05-05

  • 17:28 taavi: deploy tools-webservice 0.83 T307693

2022-05-03

  • 08:20 taavi: redis: start replication from the old cluster to the new one (T278541)

2022-05-02

  • 08:54 taavi: restart acme-chief.service T307333

2022-04-25

  • 14:56 bd808: Rebuilding all docker images to pick up toolforge-webservice v0.82 (T214343)
  • 14:46 bd808: Building toolforge-webservice v0.82

2022-04-23

  • 16:51 bd808: Built new perl532-sssd/{base,web} images and pushed to registry (T214343)

2022-04-20

2022-04-16

2022-04-12

  • 21:32 bd808: Added komla to Gerrit group 'toollabs-trusted' (T305986)
  • 21:27 bd808: Added komla to 'roots' sudoers policy (T305986)
  • 21:24 bd808: Add komla as projectadmin (T305986)

2022-04-10

  • 18:43 taavi: deleted `/tmp/dwl02.out-20210915` on tools-sgebastion-07 (not touched since september, taking up 1.3G of disk space)

2022-04-09

  • 15:30 taavi: manually prune user.log on tools-prometheus-03 to free up some space on /

2022-04-08

  • 10:44 arturo: disabled debug mode on the k8s jobs-emailer component

2022-04-05

2022-04-04

2022-03-28

  • 09:32 wm-bot: cleaned up grid queue errors on tools-sgegrid-master.tools.eqiad1.wikimedia.cloud (T304816) - cookbook ran by arturo@nostromo

2022-03-15

2022-03-14

  • 11:44 arturo: deploy jobs-framework-emailer 9470a5f (T286135)
  • 10:48 dcaro: pushed v0.33.2 tekton control and webhook images, and bashA5.1.4 to the local repo (T297090)

2022-03-10

  • 09:42 arturo: cleaned grid queue error state @ tools-sgewebgrid-generic-0902

2022-03-01

  • 13:41 dcaro: rebooting tools-sgeexec-0916 to clear any state (T302702)
  • 12:11 dcaro: Cleared error state queues for sgeexec-0916 (T302702)
  • 10:23 arturo: tools-sgeexec-0913/0916 are depooled, queue errors. Reboot them and clean errors by hand

2022-02-28

  • 08:02 taavi: reboot sgeexec-0916
  • 07:49 taavi: depool tools-sgeexec-0916.tools as it is out of disk space on /

2022-02-17

  • 08:23 taavi: deleted tools-clushmaster-02
  • 08:14 taavi: made tools-puppetmaster-02 its own client to fix `puppet node deactivate` puppetdb access

2022-02-16

  • 00:12 bd808: Image builds completed.

2022-02-15

  • 23:17 bd808: Image builds failed in buster php image with an apt error. The error looks transient, so starting builds over.
  • 23:06 bd808: Started full rebuild of Toolforge containers to pick up webservice 0.81 and other package updates in tmux session on tools-docker-imagebuilder-01
  • 22:58 bd808: `sudo apt-get update && sudo apt-get install toolforge-webservice` on all bastions to pick up 0.81
  • 22:50 bd808: Built new toollabs-webservice 0.81
  • 18:43 bd808: Enabled puppet on tools-proxy-05
  • 18:38 bd808: Disabled puppet on tools-proxy-05 for manual testing of nginx config changes
  • 18:21 taavi: delete tools-package-builder-03
  • 11:49 arturo: invalidate sssd cache in all bastions to debug T301736
  • 11:16 arturo: purge debian package `unscd` on tools-sgebastion-10/11 for T301736
  • 11:15 arturo: reboot tools-sgebastion-10 for T301736

2022-02-10

  • 15:07 taavi: shutdown tools-clushmaster-02 T298191
  • 13:25 wm-bot: trying to join node tools-sgewebgen-10-2 to the grid cluster in tools. - cookbook ran by arturo@nostromo
  • 13:24 wm-bot: trying to join node tools-sgewebgen-10-1 to the grid cluster in tools. - cookbook ran by arturo@nostromo
  • 13:07 wm-bot: trying to join node tools-sgeweblight-10-5 to the grid cluster in tools. - cookbook ran by arturo@nostromo
  • 13:06 wm-bot: trying to join node tools-sgeweblight-10-4 to the grid cluster in tools. - cookbook ran by arturo@nostromo
  • 13:05 wm-bot: trying to join node tools-sgeweblight-10-3 to the grid cluster in tools. - cookbook ran by arturo@nostromo
  • 13:03 wm-bot: trying to join node tools-sgeweblight-10-2 to the grid cluster in tools. - cookbook ran by arturo@nostromo
  • 12:54 wm-bot: trying to join node tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud to the grid cluster in tools. - cookbook ran by arturo@nostromo
  • 08:45 taavi: set `profile::base::manage_ssh_keys: true` globally T214427
  • 08:16 taavi: enable puppetdb and re-enable puppet with puppetdb ssh key management disabled (profile::base::manage_ssh_keys: false) - T214427
  • 08:06 taavi: disable puppet globally for enabling puppetdb T214427

2022-02-09

  • 19:29 taavi: installed tools-puppetdb-1, not configured on puppetmaster side yet T214427
  • 18:56 wm-bot: pooled 10 grid nodes tools-sgeweblight-10-[1-5],tools-sgewebgen-10-[1,2],tools-sgeexec-10-[1-10] (T277653) - cookbook ran by arturo@nostromo
  • 18:30 wm-bot: pooled 9 grid nodes tools-sgeexec-10-[2-10],tools-sgewebgen-[3,15] - cookbook ran by arturo@nostromo
  • 18:25 arturo: ignore last message
  • 18:24 wm-bot: pooled 9 grid nodes tools-sgeexec-10-[2-10],tools-sgewebgen-[3,15] - cookbook ran by arturo@nostromo
  • 14:04 taavi: created tools-cumin-1/toolsbeta-cumin-1 T298191
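
Several entries in this log name groups of hosts with a bracket shorthand such as `tools-sgeexec-10-[1-10]` or `tools-sgeexec-09[07-14]`. The following is a minimal sketch of how such a shorthand might be expanded into individual host names. It assumes one bracket group per name, inclusive numeric ranges, and zero padding taken from the lower bound. The function names `expand` and `expand_list` are hypothetical and are not part of any cookbook or tool mentioned in the log.

```python
import re


def expand(pattern: str) -> list:
    """Expand a single name holding at most one bracket group."""
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if m is None:
        # No bracket group: the name already refers to one host.
        return [pattern]
    prefix, body, suffix = m.groups()
    names = []
    for part in body.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            width = len(lo)  # assumption: padding follows the lower bound
            for n in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{str(n).zfill(width)}{suffix}")
        else:
            names.append(f"{prefix}{part}{suffix}")
    return names


def expand_list(spec: str) -> list:
    """Expand a comma-separated list of names, each with an optional group."""
    # Split only on commas that are outside of brackets.
    parts = re.split(r",(?![^\[\]]*\])", spec)
    hosts = []
    for part in parts:
        hosts.extend(expand(part))
    return hosts


if __name__ == "__main__":
    for host in expand_list("tools-sgeweblight-10-[1-5],tools-sgewebgen-10-[1,2]"):
        print(host)
```

The sketch only expands the notation as written. It does not check whether the resulting hosts exist or match the node counts stated in the log messages.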

2022-02-07

  • 17:37 taavi: generated authdns_acmechief ssh key and stored password in a text file in local labs/private repository (T288406)
  • 12:52 taavi: updated maintain-kubeusers for T301081

2022-02-04

  • 22:33 taavi: `root@tools-sgebastion-10:/data/project/ru_monuments/.kube# mv config old_config` # experimenting with T301015
  • 21:36 taavi: clear error state from some webgrid nodes

2022-02-03

  • 09:06 taavi: run `sudo apt-get clean` on login-buster/dev-buster to clean up disk space
  • 08:01 taavi: restart acme-chief to force renewal of toolserver.org certificate

2022-01-30

  • 14:41 taavi: created a neutron port with ip 172.16.2.46 for a service ip for toolforge redis automatic failover T278541
  • 14:22 taavi: creating a cluster of 3 bullseye redis hosts for T278541

2022-01-26

  • 18:33 wm-bot: depooled grid node tools-sgeexec-10-10 - cookbook ran by arturo@nostromo
  • 18:33 wm-bot: depooled grid node tools-sgeexec-10-9 - cookbook ran by arturo@nostromo
  • 18:33 wm-bot: depooled grid node tools-sgeexec-10-8 - cookbook ran by arturo@nostromo
  • 18:32 wm-bot: depooled grid node tools-sgeexec-10-7 - cookbook ran by arturo@nostromo
  • 18:32 wm-bot: depooled grid node tools-sgeexec-10-6 - cookbook ran by arturo@nostromo
  • 18:31 wm-bot: depooled grid node tools-sgeexec-10-5 - cookbook ran by arturo@nostromo
  • 18:30 wm-bot: depooled grid node tools-sgeexec-10-4 - cookbook ran by arturo@nostromo
  • 18:28 wm-bot: depooled grid node tools-sgeexec-10-3 - cookbook ran by arturo@nostromo
  • 18:27 wm-bot: depooled grid node tools-sgeexec-10-2 - cookbook ran by arturo@nostromo
  • 18:27 wm-bot: depooled grid node tools-sgeexec-10-1 - cookbook ran by arturo@nostromo
  • 13:55 arturo: scaling up the buster web grid with 5 lighttpd and 2 generic nodes (T277653)

2022-01-25

  • 11:50 wm-bot: reconfiguring the grid by using grid-configurator - cookbook ran by arturo@nostromo
  • 11:44 arturo: rebooting buster exec nodes
  • 08:34 taavi: sign puppet certificate for tools-sgeexec-10-4

2022-01-24

  • 17:44 wm-bot: reconfiguring the grid by using grid-configurator - cookbook ran by arturo@nostromo
  • 15:23 arturo: scaling up the grid with 10 buster exec nodes (T277653)

2022-01-20

  • 17:05 arturo: drop 9 of the 10 buster exec nodes created earlier. They didn't get DNS records
  • 12:56 arturo: scaling up the grid with 10 buster exec nodes (T277653)

2022-01-19

  • 17:34 andrewbogott: rebooting tools-sgeexec-0913.tools.eqiad1.wikimedia.cloud to recover from (presumed) fallout from the scratch/nfs move

2022-01-14

  • 19:09 taavi: set /var/run/lighttpd as world-writable on all lighttpd webgrid nodes, T299243

2022-01-12

  • 11:27 arturo: created puppet prefix `tools-sgeweblight`, drop `tools-sgeweblig`
  • 11:03 arturo: created puppet prefix 'tools-sgeweblig'
  • 11:02 arturo: created puppet prefix 'toolsbeta-sgeweblig'

2022-01-04

  • 17:18 bd808: tools-acme-chief-01: sudo service acme-chief restart
  • 08:12 taavi: disable puppet & exim4 on T298501
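
Entries in this log loosely follow an `HH:MM author: message` shape, often with Phabricator task IDs such as T277653 somewhere in the message. The following is a minimal sketch, assuming that shape holds, of how a line could be split into its parts for searching or counting. The `Entry` type, the `parse_entry` function and both regular expressions are hypothetical and do not come from the bot that writes the log.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    time: str      # "HH:MM" as written in the log
    author: str    # person or bot that logged the action
    message: str   # free text after the author
    tasks: list    # task IDs found in the message, normalised to "T<digits>"


# A bullet ("•" in the rendered page, "*" in wikitext), a time, an author, a message.
ENTRY_RE = re.compile(r"^\s*[•*]\s*(\d{2}:\d{2})\s+([^:\s]+):\s+(.*)$")
# Task IDs appear both as "T309821" and as lowercase "t309821".
TASK_RE = re.compile(r"\b[Tt](\d{4,})\b")


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None for headings and other lines."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    time, author, message = m.groups()
    tasks = sorted({"T" + digits for digits in TASK_RE.findall(message)})
    return Entry(time, author, message, tasks)


if __name__ == "__main__":
    sample = "  • 15:07 taavi: shutdown tools-clushmaster-02 T298191"
    print(parse_entry(sample))
```

Date headings and blank lines return `None`, so a caller can keep track of the most recent heading to attach a date to each entry.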
