You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Nova Resource:Admin/SAL: Revision history

20 June 2021

  • 22:21, 20 June 2021 imported>Stashbot 129,383 bytes +144 andrewbogott: clearing admin-monitoring VMs; puppet has been failing lately due to a full drive on the puppetmaster

15 June 2021

  • 01:18, 15 June 2021 imported>Stashbot 129,239 bytes +130 bstorm: running a modified version of the prometheus dir size cron in screen T284964

14 June 2021

  • 10:13, 14 June 2021 imported>Stashbot 129,109 bytes +110 dcaro: setting ssd to debug mode on tools-sgeexec-0917 (T284130)

10 June 2021

  • 10:58, 10 June 2021 imported>Stashbot 128,999 bytes +3,910 wm-bot: Finished rebooting the nodes ['cloudcephmon2002-dev', 'cloudcephmon2003-dev', 'cloudcephmon2004-dev'] (T281248) - cookbook ran by dcaro@vulcanus

9 June 2021

  • 17:33, 9 June 2021 imported>Stashbot 125,089 bytes +1,815 arturo: removed icinga downtime for cloudmetrics1002 -- to see if hardware is healthy (T281881)

8 June 2021

  • 23:19, 8 June 2021 imported>Stashbot 123,274 bytes +2,253 bd808: Downtimed cloudmetrics1002 in icinga until 2021-06-30 23:59:01 (T281881)

7 June 2021

  • 14:27, 7 June 2021 imported>Stashbot 121,021 bytes +138 andrewbogott: moving cloudvirt1040 from 'maintenance' aggregate to 'ceph' aggregate T281399

1 June 2021

  • 13:12, 1 June 2021 imported>Stashbot 120,883 bytes +293 dcaro: Changed the ceph osd_memory_target on eqiad pool to 6Gi (we were reaching the limit, swapping at some points)

27 May 2021

  • 14:58, 27 May 2021 imported>Stashbot 120,590 bytes +77 wm-bot: Testing - cookbook ran by dcaro@vulcanus

26 May 2021

  • 19:10, 26 May 2021 imported>Stashbot 120,513 bytes +688 andrewbogott: reimaging cloudvirt1018 to support local VM storage

25 May 2021

  • 16:14, 25 May 2021 imported>Stashbot 119,825 bytes +412 bd808: Closed #wikimedia-cloud-admin on f***node

24 May 2021

  • 22:32, 24 May 2021 imported>Stashbot 119,413 bytes +302 andrewbogott: changing the default ttl for eqiad1.wikimedia.cloud. from 3600 to 60; this should help us avoid madness when re-using hostnames.

22 May 2021

  • 02:14, 22 May 2021 imported>Stashbot 119,111 bytes +159 bstorm: downtiming SMART alerts on dumps server labstore1007 for the weekend because it has been flapping T281045

13 May 2021

  • 21:25, 13 May 2021 imported>Stashbot 118,952 bytes +245 bstorm: converted the maps and scratch volumes on cloudstore1008 (standby) to drbd T224747

12 May 2021

  • 14:23, 12 May 2021 imported>Stashbot 118,707 bytes +189 arturo: [codfw1dev] cleanup old unused agents (bgp, ovs)

11 May 2021

  • 18:00, 11 May 2021 imported>Stashbot 118,518 bytes +198 andrewbogott: adding 'trove' service project in advance of deploying trove in eqiad1

9 May 2021

  • 10:53, 9 May 2021 imported>Stashbot 118,320 bytes +109 arturo: icinga-downtime cloudmetrics1002 for 3 months (T275605)

7 May 2021

  • 13:51, 7 May 2021 imported>Stashbot 118,211 bytes +252 andrewbogott: add inherited 'admin' right to novaadmin user throughout eqiad1. I was trying to narrow down the rights here but lack of admin breaks some workflows, e.g. T281894 and T282235

6 May 2021

  • 15:31, 6 May 2021 imported>Stashbot 117,959 bytes +249 arturo: about to migrate CloudVPS network to the cloudgw architecture T270704

5 May 2021

  • 16:07, 5 May 2021 imported>Stashbot 117,710 bytes +4,552 dcaro: disallowing insecure global ids on the eqiad ceph cluster (T280641)

4 May 2021

  • 16:05, 4 May 2021 imported>Stashbot 113,158 bytes +1,656 wm-bot: Safe reboot of 'cloudvirt1028.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

3 May 2021

  • 23:53, 3 May 2021 imported>Stashbot 111,502 bytes +1,153 bstorm: running `maintain-dbusers harvest-replicas` on labstore1004 T281287

30 April 2021

  • 11:16, 30 April 2021 imported>Stashbot 110,349 bytes +267 dcaro: draining and rebooting cloudvirt1017, last one today (T280641)

29 April 2021

  • 15:11, 29 April 2021 imported>Stashbot 110,082 bytes +404 dcaro: hard rebooting cloudmetrics1002, got hung again (T275605)

28 April 2021

  • 21:11, 28 April 2021 imported>Stashbot 109,678 bytes +2,619 andrewbogott: cleaning up more references to deleted hypervisors with delete from services where topic='compute' and version != 53;

27 April 2021

  • 14:10, 27 April 2021 imported>Stashbot 107,059 bytes +1,057 dcaro: codfw.openstack upgraded ceph libraries to 15.2.11 (T280641)

26 April 2021

  • 20:56, 26 April 2021 imported>Stashbot 106,002 bytes +265 andrewbogott: deleting spurious 'codfw1dev' and 'codw1dev-4' regions in the dallas deployment; regions without endpoints break a bunch of things

23 April 2021

  • 13:49, 23 April 2021 imported>Stashbot 105,737 bytes +569 dcaro: testing the drain_cloudvirt cookbook on codfw1 openstack cluster, draining cloudvirt2001 (T280641)

21 April 2021

  • 17:59, 21 April 2021 imported>Stashbot 105,168 bytes +439 dcaro: all monitors upgraded on codfw1 with one cookbook `cookbook --verbose -c ~/.config/spicerack/cookbook.yaml wmcs.ceph.upgrade_mons --monitor-node-fqdn cloudcephmon2002-dev.codfw.wmnet` (T280641)

20 April 2021

19 April 2021

  • 08:40, 19 April 2021 imported>Stashbot 104,615 bytes +218 dcaro: enabling puppet on labstore1004 after mysql restart (T279657)

14 April 2021

  • 10:48, 14 April 2021 imported>Stashbot 104,397 bytes +588 dcaro: Upgrade of codfw ceph to octopus 15.2.10 done, will run some performance tests now (T274566)

13 April 2021

  • 16:42, 13 April 2021 imported>Stashbot 103,809 bytes +989 dcaro: Ceph balancer got the cluster to eval 0.014916, that is 88-77% usage for compute pool, and 28-19% usage for the cinder one \o/ (T274573)

7 April 2021

  • 21:33, 7 April 2021 imported>Stashbot 102,820 bytes +84 andrewbogott: upgrading codfw1dev designate to Victoria

4 April 2021

  • 17:36, 4 April 2021 imported>Stashbot 102,736 bytes +79 andrewbogott: upgrading eqiad1 designate to Ussuri

2 April 2021

  • 14:12, 2 April 2021 imported>Stashbot 102,657 bytes +90 andrewbogott: upgrading codfw1dev to OpenStack version Ussuri

1 April 2021

  • 12:15, 1 April 2021 imported>Stashbot 102,567 bytes +431 dcaro: Restoring the 4.9 kernel on cloudcephosd2003-dev and upgrading (T274565)

31 March 2021

  • 08:47, 31 March 2021 imported>Stashbot 102,136 bytes +109 dcaro: upgrading cinder on codfw cloudcontrol2* nodes (T278845)

30 March 2021

  • 09:53, 30 March 2021 imported>Stashbot 102,027 bytes +119 arturo: rebooting cloudnet1003 to cleanup conntrack table, it wouldn't cleanup by hand ...

28 March 2021

27 March 2021

  • 09:54, 27 March 2021 imported>Stashbot 101,828 bytes +102 arturo: cleanup conntrack table in qrouter netns in cloudnet1003 (backup)

25 March 2021

  • 19:03, 25 March 2021 imported>Stashbot 101,726 bytes +576 andrewbogott: deleting all unused (per wmcs-imageusage) Jessie base images from Glance

24 March 2021

  • 09:19, 24 March 2021 imported>Stashbot 101,150 bytes +158 dcaro: restarted wmcs-backup on cloudvirt1024 as it failed due to an image being removed while running (T276892)

23 March 2021

  • 11:33, 23 March 2021 imported>Stashbot 100,992 bytes +94 arturo: root@cloudcontrol1005:~# wmcs-novastats-dnsleaks --delete

22 March 2021

  • 10:10, 22 March 2021 imported>Stashbot 100,898 bytes +191 arturo: cleanup conntrack table in standby node: aborrero@cloudnet1003:~ $ sudo ip netns exec qrouter-d93771ba-2711-4f88-804a-8df6fd03978a conntrack -F

19 March 2021

  • 17:18, 19 March 2021 imported>Stashbot 100,707 bytes +293 bstorm: running `ALTER TABLE account MODIFY COLUMN type ENUM('user','tool','paws');` against the labsdbaccounts database on m5 T276284
  • 00:30, 19 March 2021 imported>Stashbot 100,414 bytes +94 bstorm: downtimed labstore1004 to check some things in debug mode

17 March 2021

  • 17:28, 17 March 2021 imported>Stashbot 100,320 bytes +432 bstorm: restarted the backup-glance-images job to clear errors in systemd T271782

10 March 2021

9 March 2021

5 March 2021

  • 21:40, 5 March 2021 imported>Stashbot 97,855 bytes +748 andrewbogott: replacing 'observer' role with 'reader' role in eqiad1 T276018

4 March 2021

  • 18:36, 4 March 2021 imported>Stashbot 97,107 bytes +1,044 andrewbogott: rebooting cloudmetrics1002; the console is hanging

3 March 2021

  • 17:16, 3 March 2021 imported>Stashbot 96,063 bytes +1,804 andrewbogott: restarting rabbitmq-server on cloudcontrol1003,1004,1005; trying to explain amqp errors in scheduler logs

2 March 2021

  • 17:16, 2 March 2021 imported>Stashbot 94,259 bytes +717 andrewbogott: rebooting cloudvirt1039 to see if I can trigger T276208

1 March 2021

  • 20:12, 1 March 2021 imported>Stashbot 93,542 bytes +347 andrewbogott: removing novaadmin from all projects save 'admin' for T274385

28 February 2021

  • 04:54, 28 February 2021 imported>Stashbot 93,195 bytes +162 andrewbogott: restarted redis-server on tools-redis-1003 and tools-redis-1004 in an attempt to reduce replag, no real change detected

27 February 2021

  • 00:33, 27 February 2021 imported>Stashbot 93,033 bytes +4,713 andrewbogott: sudo cumin --timeout 500 "A:all and not O{project:clouddb-services}" 'lsb_release -c | grep -i buster && uname -r | grep -v 4.19.0-14-amd64 && reboot'

25 February 2021

  • 14:56, 25 February 2021 imported>Stashbot 88,320 bytes +121 arturo: deployed wmcs-netns-events daemon to all cloudnet servers (T275483)

24 February 2021

  • 11:07, 24 February 2021 imported>Stashbot 88,199 bytes +112 arturo: force-reboot cloudmetrics1002, add icinga downtime for 2 hours. Investigating some server issue
  • 00:17, 24 February 2021 imported>Stashbot 88,087 bytes +717 bstorm: set --property hw_scsi_model=virtio-scsi and --property hw_disk_bus=scsi on the main stretch image in glance on eqiad1 T275430

22 February 2021

  • 17:15, 22 February 2021 imported>Stashbot 87,370 bytes +368 bstorm: restarting nova-compute on cloudvirt1016 and cloudvirt1036 in case it helps T275411

18 February 2021

17 February 2021

  • 15:58, 17 February 2021 imported>Stashbot 86,576 bytes +153 arturo: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/664845 to cloudnet servers (T268335)

15 February 2021

  • 16:25, 15 February 2021 imported>Stashbot 86,423 bytes +395 arturo: [codfw1dev] rebooting all cloudgw200x-dev / cloudnet200x-dev servers (T272963)

11 February 2021

  • 12:01, 11 February 2021 imported>Stashbot 86,028 bytes +692 arturo: [codfw1dev] drop instance `tools-codfw1dev-bastion-1` in `tools-codfw1dev` (was buster, cannot use it yet)

9 February 2021

  • 15:23, 9 February 2021 imported>Stashbot 85,336 bytes +224 arturo: icinga-downtime for 2h everything *labs *cloud for openstack upgrades

8 February 2021

5 February 2021

  • 10:59, 5 February 2021 imported>Stashbot 84,859 bytes +334 arturo: icinga-downtime labstore1004 tools share space check for 1 week (T272247)

4 February 2021

  • 10:12, 4 February 2021 imported>Stashbot 84,525 bytes +147 dcaro: Increasing the memory limit of osds in eqiad from 8589934592(8G) to 12884901888(12G) (T273851)

3 February 2021

  • 09:59, 3 February 2021 imported>Stashbot 84,378 bytes +203 dcaro: Doing a full vm backup on cloudvirt1024 with the new script (T260692)

2 February 2021

29 January 2021

  • 15:36, 29 January 2021 imported>Stashbot 83,829 bytes +155 andrewbogott: disabling puppet and some services on eqiad1 cloudcontrol nodes; replacing nova-placement-api with placement-api

28 January 2021

  • 19:44, 28 January 2021 imported>Stashbot 83,674 bytes +158 andrewbogott: shutting down cloudcontrol2001-dev because it's in a partially upgraded state; will revive when it's time for Train

27 January 2021

22 January 2021

  • 16:44, 22 January 2021 imported>Stashbot 83,415 bytes +191 andrewbogott: upgrading designate on cloudvirt1003/1004 to OpenStack 'train'

21 January 2021

  • 11:35, 21 January 2021 imported>Stashbot 83,224 bytes +338 arturo: merging core router firewall changes https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657439 (T209082)

20 January 2021

  • 10:49, 20 January 2021 imported>Stashbot 82,886 bytes +1,118 arturo: merging core router firewall change https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657302 (T209082)

19 January 2021

18 January 2021

  • 16:00, 18 January 2021 imported>Stashbot 81,665 bytes +865 dcaro: Codfw1 ceph cluster upgraded, will wait until tomorrow to see if there's any instability, but everything looks fine (T272303)

17 January 2021

  • 16:53, 17 January 2021 imported>Stashbot 80,800 bytes +126 arturo: icinga downtime labstore1004 /srv/tools space check for 3 days (T272247)

15 January 2021

  • 13:41, 15 January 2021 imported>Stashbot 80,674 bytes +405 arturo: icinga downtime labstore1004 maintain-dbuser alert until 2021-01-19 (T272125)

13 January 2021

  • 17:03, 13 January 2021 imported>Stashbot 80,269 bytes +927 arturo: remove cloudvirt1013 cloudvirt1032 cloudvirt1037 to the 'toobusy' host aggregate to prevent further CPU oversubscribing

12 January 2021

11 January 2021

  • 10:22, 11 January 2021 imported>Stashbot 79,167 bytes +573 arturo: doubling size of conntrack table in cloudnet servers https://gerrit.wikimedia.org/r/c/operations/puppet/+/655407 (T271058)

10 January 2021

  • 16:02, 10 January 2021 imported>Stashbot 78,594 bytes +198 andrewbogott: restarting rabbitmq-server on all eqiad1 cloudcontrols

8 January 2021

  • 11:25, 8 January 2021 imported>Stashbot 78,396 bytes +1,559 arturo: rebooting both cloudnet2002-dev/cloudnet2003-dev to make sure interfaces are set up correctly (T271517)

7 January 2021

  • 15:19, 7 January 2021 imported>Stashbot 76,837 bytes +447 dcaro: Finished speed tests on cloudcephosd2001-dev, reprovisioning the osd.0 sdc (T271417)

5 January 2021

  • 10:40, 5 January 2021 imported>Stashbot 76,390 bytes +134 dcaro: removing dumps-[1..*] backups from cloudvirt1024 as they are not needed (T271094)

3 January 2021

  • 07:06, 3 January 2021 imported>Stashbot 76,256 bytes +117 dcaro: Got a network hiccup on cloudnet1004, keeping track here T271058

28 December 2020

  • 12:32, 28 December 2020 imported>Stashbot 76,139 bytes +567 arturo: stop doing backups for the dumps project https://gerrit.wikimedia.org/r/c/operations/puppet/+/652182 (T260692)

23 December 2020

  • 15:38, 23 December 2020 imported>Stashbot 75,572 bytes +333 andrewbogott: restarting rabbitmq on cloudcontrol1004; suspected leaks

22 December 2020

  • 15:30, 22 December 2020 imported>Stashbot 75,239 bytes +231 dcaro: cleaning up 6778 dangling snapshots for glance images in eqiad (T270478)

19 December 2020

  • 16:18, 19 December 2020 imported>Stashbot 75,008 bytes +84 dcaro: gzipped a bunch of logs on cloudvirt1004 due to / being out of space
  • 00:14, 19 December 2020 imported>Stashbot 74,924 bytes +2,096 bstorm: truncated /var/log/debug.1 on cloudcontrol1003 which appears to be the exact same content as the user.log files anyway

17 December 2020

  • 22:17, 17 December 2020 imported>Stashbot 72,828 bytes +570 andrewbogott: correction to above, set the pg and pgp to 1024 for eqiad1-glance-images

16 December 2020

  • 09:31, 16 December 2020 imported>Stashbot 72,258 bytes +121 dcaro: removing invalid backups from cloudvirt1024 (196 in total) (T269419)

14 December 2020

13 December 2020

10 December 2020

  • 23:36, 10 December 2020 imported>Stashbot 71,670 bytes +334 bstorm: cleaned up the logs for haproxy on cloudcontrol1003 by deleting all the gzipped ones and truncating the .1 file

8 December 2020
