You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Revision history of "Nova Resource:Admin/SAL"

Legend: (cur) = difference with latest revision, (prev) = difference with preceding revision, m = minor edit.

  • curprev 18:56, 3 December 2021imported>Stashbot 151,474 bytes +176 andrewbogott: maintain-views and maintain-meta-p on clouddb1013-1020
  • curprev 01:17, 2 December 2021imported>Stashbot 151,298 bytes +1,832 wm-bot: Drained 'cloudvirt1028.eqiad.wmnet'. (T296790) - cookbook ran by andrew@buster
  • curprev 17:48, 28 November 2021imported>Stashbot 149,466 bytes +209 andrewbogott: moved cloudvirt1018 out of the 'localstorage' aggregate and into 'maintenance' for T296592. It will need to be moved back after the raid is rebuilt.
  • curprev 07:19, 21 November 2021imported>Stashbot 149,257 bytes +120 dcaro_away: restarting designate-sink with some extra logs in it (T296144)
  • curprev 15:48, 17 November 2021imported>Stashbot 149,137 bytes +280 andrewbogott: upgrading mariadb packages on eqiad1 cloudcontrols
  • curprev 13:31, 12 November 2021imported>Stashbot 148,857 bytes +142 arturo: restarting glance-api services to make sure they work with new ceph auth creds (T293752)
  • curprev 21:50, 8 November 2021imported>Stashbot 148,715 bytes +620 andrewbogott: returned clouddb pools back to normal after maintain_views run: https://gerrit.wikimedia.org/r/c/operations/puppet/+/737505 T216481
  • curprev 11:18, 5 November 2021imported>Stashbot 148,095 bytes +742 wm-bot: Added 1 new OSDs ['cloudcephosd1024.eqiad.wmnet'] (T295012) - cookbook ran by arturo@endurance
  • curprev 16:39, 4 November 2021imported>Stashbot 147,353 bytes +2,597 wm-bot: Added 1 new OSDs ['cloudcephosd1023.eqiad.wmnet'] (T295012) - cookbook ran by arturo@endurance
  • curprev 17:22, 3 November 2021imported>Stashbot 144,756 bytes +279 arturo: [codfw1dev] installing keepalived 2.1.5 from buster-backports on cloudgw2001-dev/2002-dev (T294956)
  • curprev 10:54, 2 November 2021imported>Stashbot 144,477 bytes +179 arturo: rebooting cloudnet1004/1003 for T291813
  • curprev 00:47, 24 October 2021imported>Stashbot 144,298 bytes +166 andrewbogott: deploying a change so that openstack clients use tls endpoints: https://gerrit.wikimedia.org/r/c/operations/puppet/+/732738
  • curprev 10:19, 21 October 2021imported>Stashbot 144,132 bytes +227 arturo: drop firewall exception on core routers for wiki replicas legacy setup (T293897)
  • curprev 21:06, 20 October 2021imported>Stashbot 143,905 bytes +99 andrewbogott: creating cloudinfra-nfs project T293936
  • curprev 19:21, 18 October 2021imported>Stashbot 143,806 bytes +252 andrewbogott: also ticked the 'admin' box on wikitech for majavah T292827
  • curprev 12:28, 14 October 2021imported>Stashbot 143,554 bytes +149 arturo: [codfw1dev] add DB grants for cloudbackup2002.codfw.wmnet IP address to the cinder DB (T292546)
  • curprev 10:46, 13 October 2021imported>Stashbot 143,405 bytes +105 arturo: updating python3-neutron across the fleet (T292936)
  • curprev 09:06, 12 October 2021imported>Stashbot 143,300 bytes +200 dcaro: upgrading eqiad cloudnet hosts neutron packages (T292936)
  • curprev 09:39, 5 October 2021imported>Stashbot 143,100 bytes +152 arturo: [codfw1dev] cleaning up manila stuff from openstack (db, endpoints, tenant, VMs, and such) T291257
  • curprev 14:50, 30 September 2021imported>Stashbot 142,948 bytes +391 andrewbogott: sudo cumin "cloud*" "ps -ef | grep nslcd && service nslcd restart" and sudo cumin "lab*" "ps -ef | grep nslcd && service nslcd restart" T292202
  • curprev 09:41, 29 September 2021imported>Stashbot 142,557 bytes +196 arturo: [codfw1dev] cleanup manila shares definitions for a clean start now that the manila-sharecontroller VM is apparently well configured (T291257)
  • curprev 16:23, 28 September 2021imported>Stashbot 142,361 bytes +531 bstorm: downtime for clouddb1020 to reduce re-pages in case this goes badly T291963
  • curprev 10:07, 27 September 2021imported>Stashbot 141,830 bytes +169 arturo: cloudcontrol1004 apparently healthy T291446
  • curprev 13:02, 24 September 2021imported>Stashbot 141,661 bytes +211 arturo: [codfw1dev] create VM manila-share-controller-01 on cloudinfra-codfw1dev
  • curprev 12:13, 21 September 2021imported>Stashbot 141,450 bytes +677 arturo: [codfw1dev] trying to create a manila service image (T291257)
  • curprev 23:08, 20 September 2021imported>Stashbot 140,773 bytes +408 bstorm: ran `echo check > /sys/block/md0/md/sync_action` on cloudcontrol1004 to check raid
  • curprev 11:35, 17 September 2021imported>Stashbot 140,365 bytes +114 arturo: [codfw1dev] install manila on cloudcontrol2001-dev (T291257)
  • curprev 15:56, 16 September 2021imported>Stashbot 140,251 bytes +134 bstorm: removing downtime for labstore1005 so we'll know if it has another issue T290318
  • curprev 22:03, 9 September 2021imported>Stashbot 140,117 bytes +315 bstorm: restarted the prometheus-mysqld-exporter@s1 service as it was not working T290630
  • curprev 15:34, 3 September 2021imported>Stashbot 139,802 bytes +365 bstorm: rebooting labstore1005 to disconnect the drives from labstore1004 T290318
  • curprev 16:16, 30 August 2021imported>Stashbot 139,437 bytes +825 wm-bot: Added 1 new OSDs ['cloudcephosd1018.eqiad.wmnet'] - cookbook ran by andrew@buster
  • curprev 18:57, 27 August 2021imported>Stashbot 138,612 bytes +126 andrewbogott: raising toolsbeta ram/core/instances quotas so majavah can experiment with bullseye
  • curprev 14:45, 25 August 2021imported>Stashbot 138,486 bytes +534 wm-bot: Finished rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster
  • curprev 17:39, 19 August 2021imported>Stashbot 137,952 bytes +93 bstorm: restarting glance image backup to try and clear the page
  • curprev 16:21, 18 August 2021imported>Stashbot 137,859 bytes +899 wm-bot: Rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by andrew@buster
  • curprev 15:11, 17 August 2021imported>Stashbot 136,960 bytes +119 andrewbogott: rebooting cloudcephosd1008 to force raid rebuild -- T287838
  • curprev 13:51, 11 August 2021imported>Stashbot 136,841 bytes +480 wm-bot: Finished rebooting node cloudcephosd1018.eqiad.wmnet - cookbook ran by dcaro@vulcanus
  • curprev 15:15, 10 August 2021imported>Stashbot 136,361 bytes +214 andrewbogott: restarting all designate services in eqiad1
  • curprev 09:37, 5 August 2021imported>Stashbot 136,147 bytes +106 dcaro: Taking one osd daemon down on codfw cluster (T288203)
  • curprev 19:20, 4 August 2021imported>Stashbot 136,041 bytes +126 bd808: Running deleteBatch.php on cloudweb2001-dev to remove legacy Hiera: pages from labtestwiki
  • curprev 17:40, 3 August 2021imported>Stashbot 135,915 bytes +85 bstorm: rerunning the glance backup script after failure
  • curprev 00:10, 31 July 2021imported>Stashbot 135,830 bytes +233 andrewbogott: "systemctl reset-failed cloud-init.service" on all VMs for T287309
  • curprev 21:32, 27 July 2021imported>Stashbot 135,597 bytes +313 andrewbogott: putting cloudvirt1012 back into service T286748
  • curprev 15:22, 23 July 2021imported>Stashbot 135,284 bytes +88 bstorm: update wikireplicas-dns for s7 fix for web replicas
  • curprev 17:07, 20 July 2021imported>Stashbot 135,196 bytes +215 andrewbogott: reloading haproxy on dbproxy1018 for T286598
  • curprev 00:10, 20 July 2021imported>Stashbot 134,981 bytes +465 bstorm: restarting nova-api on cloudcontrol1003 to try and recover whatever it's doing with designate_floating_ip_ptr_records_updater
  • curprev 09:55, 16 July 2021imported>Stashbot 134,516 bytes +103 dcaro: checking HP raid issues on cloudvirt1012 (T286766)
  • curprev 21:08, 14 July 2021imported>Stashbot 134,413 bytes +316 andrewbogott: restarting lots of openstack services while trying to resolve T286675
  • curprev 10:12, 2 July 2021imported>Stashbot 134,097 bytes +1,731 wm-bot: The cluster is not rebalance after adding the new OSDs ['cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] (T285858) - cookbook ran by dcaro@vulcanus
  • curprev 16:27, 1 July 2021imported>Stashbot 132,366 bytes +2,402 bstorm: failed over cloudstore1009 to cloudstore1008 T224747
  • curprev 21:48, 30 June 2021imported>Stashbot 129,964 bytes +115 bstorm: downtimed space alerts for scratch on cloudstore1008 until after the migration
  • curprev 15:28, 25 June 2021imported>Stashbot 129,849 bytes +238 andrewbogott: restarting openstack services on cloudcontrol1005
  • curprev 13:54, 21 June 2021imported>Stashbot 129,611 bytes +228 dcaro: puppet fix merged and deployed, servers are back to normal
  • curprev 22:21, 20 June 2021imported>Stashbot 129,383 bytes +144 andrewbogott: clearing admin-monitoring VMs; puppet has been failing lately due to a full drive on the puppetmaster
  • curprev 01:18, 15 June 2021imported>Stashbot 129,239 bytes +130 bstorm: running a modified version of the prometheus dir size cron in screen T284964
  • curprev 10:13, 14 June 2021imported>Stashbot 129,109 bytes +110 dcaro: setting ssd to debug mode on tools-sgeexec-0917 (T284130)
  • curprev 10:58, 10 June 2021imported>Stashbot 128,999 bytes +3,910 wm-bot: Finished rebooting the nodes ['cloudcephmon2002-dev', 'cloudcephmon2003-dev', 'cloudcephmon2004-dev'] (T281248) - cookbook ran by dcaro@vulcanus
  • curprev 17:33, 9 June 2021imported>Stashbot 125,089 bytes +1,815 arturo: removed icinga downtime for cloudmetrics1002 -- to see if hardware is healthy (T281881)
  • curprev 23:19, 8 June 2021imported>Stashbot 123,274 bytes +2,253 bd808: Downtimed cloudmetrics1002 in icinga until 2021-06-30 23:59:01 (T281881)
  • curprev 14:27, 7 June 2021imported>Stashbot 121,021 bytes +138 andrewbogott: moving cloudvirt1040 from 'maintenance' aggregate to 'ceph' aggregate T281399
  • curprev 13:12, 1 June 2021imported>Stashbot 120,883 bytes +293 dcaro: Changed the ceph osd_memory_target on eqiad pool to 6Gi (we were reaching the limit, swapping at some points)
  • curprev 14:58, 27 May 2021imported>Stashbot 120,590 bytes +77 wm-bot: Testing - cookbook ran by dcaro@vulcanus
  • curprev 19:10, 26 May 2021imported>Stashbot 120,513 bytes +688 andrewbogott: reimaging cloudvirt1018 to support local VM storage
  • curprev 16:14, 25 May 2021imported>Stashbot 119,825 bytes +412 bd808: Closed #wikimedia-cloud-admin on f***node
  • curprev 22:32, 24 May 2021imported>Stashbot 119,413 bytes +302 andrewbogott: changing the default ttl for eqiad1.wikimedia.cloud. from 3600 to 60; this should help us avoid madness when re-using hostnames.
  • curprev 02:14, 22 May 2021imported>Stashbot 119,111 bytes +159 bstorm: downtiming SMART alerts on dumps server labstore1007 for the weekend because it has been flapping T281045
  • curprev 21:25, 13 May 2021imported>Stashbot 118,952 bytes +245 bstorm: converted the maps and scratch volumes on cloudstore1008 (standby) to drbd T224747
  • curprev 14:23, 12 May 2021imported>Stashbot 118,707 bytes +189 arturo: [codfw1dev] cleanup old unused agents (bgp, ovs)
  • curprev 18:00, 11 May 2021imported>Stashbot 118,518 bytes +198 andrewbogott: adding 'trove' service project in advance of deploying trove in eqiad1
  • curprev 10:53, 9 May 2021imported>Stashbot 118,320 bytes +109 arturo: icinga-downtime cloudmetrics1002 for 3 months (T275605)
  • curprev 13:51, 7 May 2021imported>Stashbot 118,211 bytes +252 andrewbogott: add inherited 'admin' right to novaadmin user throughout eqiad1. I was trying to narrow down the rights here but lack of admin breaks some workflows, e.g. T281894 and T282235
  • curprev 15:31, 6 May 2021imported>Stashbot 117,959 bytes +249 arturo: about to migrate CloudVPS network to the cloudgw architecture T270704
  • curprev 16:07, 5 May 2021imported>Stashbot 117,710 bytes +4,552 dcaro: disallowing insecure global ids on the eqiad ceph cluster (T280641)
  • curprev 16:05, 4 May 2021imported>Stashbot 113,158 bytes +1,656 wm-bot: Safe reboot of 'cloudvirt1028.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus
  • curprev 23:53, 3 May 2021imported>Stashbot 111,502 bytes +1,153 bstorm: running `maintain-dbusers harvest-replicas` on labstore1004 T281287
  • curprev 11:16, 30 April 2021imported>Stashbot 110,349 bytes +267 dcaro: draining and rebooting cloudvirt1017, last one today (T280641)
  • curprev 15:11, 29 April 2021imported>Stashbot 110,082 bytes +404 dcaro: hard rebooting cloudmetrics1002, got hung again (T275605)
  • curprev 21:11, 28 April 2021imported>Stashbot 109,678 bytes +2,619 andrewbogott: cleaning up more references to deleted hypervisors with delete from services where topic='compute' and version != 53;
  • curprev 14:10, 27 April 2021imported>Stashbot 107,059 bytes +1,057 dcaro: codfw.openstack upgraded ceph libraries to 15.2.11 (T280641)
  • curprev 20:56, 26 April 2021imported>Stashbot 106,002 bytes +265 andrewbogott: deleting spurious 'codfw1dev' and 'codw1dev-4' regions in the dallas deployment; regions without endpoints break a bunch of things
  • curprev 13:49, 23 April 2021imported>Stashbot 105,737 bytes +569 dcaro: testing the drain_cloudvirt cookbook on codfw1 openstack cluster, draining cloudvirt2001 (T280641)
  • curprev 17:59, 21 April 2021imported>Stashbot 105,168 bytes +439 dcaro: all monitors upgraded on codfw1 with one cookbook `cookbook --verbose -c ~/.config/spicerack/cookbook.yaml wmcs.ceph.upgrade_mons --monitor-node-fqdn cloudcephmon2002-dev.codfw.wmnet` (T280641)
  • curprev 20:21, 20 April 2021imported>Stashbot 104,729 bytes +114 andrewbogott: reboot cloudservices1003
  • curprev 08:40, 19 April 2021imported>Stashbot 104,615 bytes +218 dcaro: enabling puppet on labstore1004 after mysql restart (T279657)
  • curprev 10:48, 14 April 2021imported>Stashbot 104,397 bytes +588 dcaro: Upgrade of codfw ceph to octopus 15.2.20 done, will run some performance tests now (T274566)
  • curprev 16:42, 13 April 2021imported>Stashbot 103,809 bytes +989 dcaro: Ceph balancer got the cluster to eval 0.014916, that is 88-77% usage for compute pool, and 28-19% usage for the cinder one \o/ (T274573)
  • curprev 21:33, 7 April 2021imported>Stashbot 102,820 bytes +84 andrewbogott: upgrading codfw1dev designate to Victoria
  • curprev 17:36, 4 April 2021imported>Stashbot 102,736 bytes +79 andrewbogott: upgrading eqiad1 designate to Ussuri
  • curprev 14:12, 2 April 2021imported>Stashbot 102,657 bytes +90 andrewbogott: upgrading codfw1dev to OpenStack version Ussuri
  • curprev 12:15, 1 April 2021imported>Stashbot 102,567 bytes +431 dcaro: Restoring the 4.9 kernel on cloudcephosd2003-dev and upgrading (T274565)
  • curprev 08:47, 31 March 2021imported>Stashbot 102,136 bytes +109 dcaro: upgrading cinder on codfw cloudcontrol2* nodes (T278845)
  • curprev 09:53, 30 March 2021imported>Stashbot 102,027 bytes +119 arturo: rebooting cloudnet1003 to cleanup conntrack table, it wouldn't cleanup by hand ...
  • curprev 15:42, 28 March 2021imported>Stashbot 101,908 bytes +80 andrewbogott: updated debian-10.0-buster base image
  • curprev 09:54, 27 March 2021imported>Stashbot 101,828 bytes +102 arturo: cleanup conntrack table in qrouter netns in cloudnet1003 (backup)
  • curprev 19:03, 25 March 2021imported>Stashbot 101,726 bytes +576 andrewbogott: deleting all unused (per wmcs-imageusage) Jessie base images from Glance
  • curprev 09:19, 24 March 2021imported>Stashbot 101,150 bytes +158 dcaro: restarted wmcs-backup on cloudvirt1024 as it failed due to an image being removed while running (T276892)
  • curprev 11:33, 23 March 2021imported>Stashbot 100,992 bytes +94 arturo: root@cloudcontrol1005:~# wmcs-novastats-dnsleaks --delete
  • curprev 10:10, 22 March 2021imported>Stashbot 100,898 bytes +191 arturo: cleanup conntrack table in standby node: aborrero@cloudnet1003:~ $ sudo ip netns exec qrouter-d93771ba-2711-4f88-804a-8df6fd03978a conntrack -F
  • curprev 17:18, 19 March 2021imported>Stashbot 100,707 bytes +293 bstorm: running `ALTER TABLE account MODIFY COLUMN type ENUM('user','tool','paws');` against the labsdbaccounts database on m5 T276284
  • curprev 00:30, 19 March 2021imported>Stashbot 100,414 bytes +94 bstorm: downtimed labstore1004 to check some things in debug mode
  • curprev 17:28, 17 March 2021imported>Stashbot 100,320 bytes +432 bstorm: restarted the backup-glance-images job to clear errors in systemd T271782
  • curprev 16:51, 10 March 2021imported>Stashbot 99,888 bytes +1,162 arturo: rebooting cloudvirt1030 for T275753
  • curprev 16:27, 9 March 2021imported>Stashbot 98,726 bytes +871 arturo: rebooting cloudvirt1027 (T275753)
  • curprev 21:40, 5 March 2021imported>Stashbot 97,855 bytes +748 andrewbogott: replacing 'observer' role with 'reader' role in eqiad1 T276018
  • curprev 18:36, 4 March 2021imported>Stashbot 97,107 bytes +1,044 andrewbogott: rebooting cloudmetrics1002; the console is hanging
  • curprev 17:16, 3 March 2021imported>Stashbot 96,063 bytes +1,804 andrewbogott: restarting rabbitmq-server on cloudcontrol1003,1004,1005; trying to explain amqp errors in scheduler logs
  • curprev 17:16, 2 March 2021imported>Stashbot 94,259 bytes +717 andrewbogott: rebooting cloudvirt1039 to see if I can trigger T276208
  • curprev 20:12, 1 March 2021imported>Stashbot 93,542 bytes +347 andrewbogott: removing novaadmin from all projects save 'admin' for T274385
  • curprev 04:54, 28 February 2021imported>Stashbot 93,195 bytes +162 andrewbogott: restarted redis-server on tools-redis-1003 and tools-redis-1004 in an attempt to reduce replag, no real change detected
  • curprev 00:33, 27 February 2021imported>Stashbot 93,033 bytes +4,713 andrewbogott: sudo cumin --timeout 500 "A:all and not O{project:clouddb-services}" 'lsb_release -c | grep -i buster && uname -r | grep -v 4.19.0-14-amd64 && reboot'
  • curprev 14:56, 25 February 2021imported>Stashbot 88,320 bytes +121 arturo: deployed wmcs-netns-events daemon to all cloudnet servers (T275483)
  • curprev 11:07, 24 February 2021imported>Stashbot 88,199 bytes +112 arturo: force-reboot cloudmetrics1002, add icinga downtime for 2 hours. Investigating some server issue
  • curprev 00:17, 24 February 2021imported>Stashbot 88,087 bytes +717 bstorm: set --property hw_scsi_model=virtio-scsi and --property hw_disk_bus=scsi on the main stretch image in glance on eqiad1 T275430
  • curprev 17:15, 22 February 2021imported>Stashbot 87,370 bytes +368 bstorm: restarting nova-compute on cloudvirt1016 and cloudvirt1036 in case it helps T275411
  • curprev 14:50, 18 February 2021imported>Stashbot 87,002 bytes +426 arturo: rebooting cloudnet1004 for T271058
  • curprev 15:58, 17 February 2021imported>Stashbot 86,576 bytes +153 arturo: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/664845 to cloudnet servers (T268335)
  • curprev 16:25, 15 February 2021imported>Stashbot 86,423 bytes +395 arturo: [codfw1dev] rebooting all cloudgw200x-dev / cloudnet200x-dev servers (T272963)
  • curprev 12:01, 11 February 2021imported>Stashbot 86,028 bytes +692 arturo: [codfw1dev] drop instance `tools-codfw1dev-bastion-1` in `tools-codfw1dev` (was buster, cannot use it yet)
  • curprev 15:23, 9 February 2021imported>Stashbot 85,336 bytes +224 arturo: icinga-downtime for 2h everything *labs *cloud for openstack upgrades
  • curprev 18:50, 8 February 2021imported>Stashbot 85,112 bytes +253 bstorm: enabled puppet on cloudvirt1023 for now T274144
  • curprev 10:59, 5 February 2021imported>Stashbot 84,859 bytes +334 arturo: icinga-downtime labstore1004 tools share space check for 1 week (T272247)
  • curprev 10:12, 4 February 2021imported>Stashbot 84,525 bytes +147 dcaro: Increasing the memory limit of osds in eqiad from 8589934592(8G) to 12884901888(12G) (T273851)
  • curprev 09:59, 3 February 2021imported>Stashbot 84,378 bytes +203 dcaro: Doing a full vm backup on cloudvirt1024 with the new script (T260692)
  • curprev 17:14, 2 February 2021imported>Stashbot 84,175 bytes +346 dcaro: Changed osd memory limit from 4G to 8G (T273649)
  • curprev 15:36, 29 January 2021imported>Stashbot 83,829 bytes +155 andrewbogott: disabling puppet and some services on eqiad1 cloudcontrol nodes; replacing nova-placement-api with placement-api
  • curprev 19:44, 28 January 2021imported>Stashbot 83,674 bytes +158 andrewbogott: shutting down cloudcontrol2001-dev because it's in a partially upgraded state; will revive when it's time for Train
  • curprev 00:50, 27 January 2021imported>Stashbot 83,516 bytes +101 bstorm: icinga-downtime cloudnet1004 for a week T271058
  • curprev 16:44, 22 January 2021imported>Stashbot 83,415 bytes +191 andrewbogott: upgrading designate on cloudvirt1003/1004 to OpenStack 'train'
  • curprev 11:35, 21 January 2021imported>Stashbot 83,224 bytes +338 arturo: merging core router firewall changes https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657439 (T209082)
  • curprev 10:49, 20 January 2021imported>Stashbot 82,886 bytes +1,118 arturo: merging core router firewall change https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657302 (T209082)
  • curprev 10:17, 19 January 2021imported>Stashbot 81,768 bytes +103 arturo: icinga-downtime cloudnet1004 for 1 week (T271058)
  • curprev 16:00, 18 January 2021imported>Stashbot 81,665 bytes +865 dcaro: Codfw1 ceph cluster upgraded, will wait until tomorrow to see if there's any instability, but everything looks fine (T272303)
  • curprev 16:53, 17 January 2021imported>Stashbot 80,800 bytes +126 arturo: icinga downtime labstore1004 /srv/tools space check for 3 days (T272247)
  • curprev 13:41, 15 January 2021imported>Stashbot 80,674 bytes +405 arturo: icinga downtime labstore1004 maintain-dbuser alert until 2021-01-19 (T272125)
  • curprev 17:03, 13 January 2021imported>Stashbot 80,269 bytes +927 arturo: remove cloudvirt1013 cloudvirt1032 cloudvirt1037 to the 'toobusy' host aggregate to prevent further CPU oversubscribing
  • curprev 10:33, 12 January 2021imported>Stashbot 79,342 bytes +175 arturo: reboot cloudnet1004
  • curprev 10:22, 11 January 2021imported>Stashbot 79,167 bytes +573 arturo: doubling size of conntrack table in cloudnet servers https://gerrit.wikimedia.org/r/c/operations/puppet/+/655407 (T271058)
  • curprev 16:02, 10 January 2021imported>Stashbot 78,594 bytes +198 andrewbogott: restarting rabbitmq-server on all eqiad1 cloudcontrols
  • curprev 11:25, 8 January 2021imported>Stashbot 78,396 bytes +1,559 arturo: rebooting both cloudnet2002-dev/cloudnet2003-dev to make sure interfaces are set up correctly (T271517)
  • curprev 15:19, 7 January 2021imported>Stashbot 76,837 bytes +447 dcaro: Finished speed tests on cloudcephosd2001-dev, reprovisioning the osd.0 sdc (T271417)
  • curprev 10:40, 5 January 2021imported>Stashbot 76,390 bytes +134 dcaro: removing dumps-[1..*] backups from cloudvirt1024 as they are not needed (T271094)
  • curprev 07:06, 3 January 2021imported>Stashbot 76,256 bytes +117 dcaro: Got a network hiccup on cloudnet1004, keeping track here T271058
  • curprev 12:32, 28 December 2020imported>Stashbot 76,139 bytes +567 arturo: stop doing backups for the dumps project https://gerrit.wikimedia.org/r/c/operations/puppet/+/652182 (T260692)
  • curprev 15:38, 23 December 2020imported>Stashbot 75,572 bytes +333 andrewbogott: restarting rabbitmq on cloudcontrol1004; suspected leaks
  • curprev 15:30, 22 December 2020imported>Stashbot 75,239 bytes +231 dcaro: cleaning up 6778 dangling snapshots for glance images in eqiad (T270478)
  • curprev 16:18, 19 December 2020imported>Stashbot 75,008 bytes +84 dcaro: gzipped a bunch of logs on cloudvirt1004 due to / being out of space
  • curprev 00:14, 19 December 2020imported>Stashbot 74,924 bytes +2,096 bstorm: truncated /var/log/debug.1 on cloudcontrol1003 which appears to be the exact same content as the user.log files anyway
  • curprev 22:17, 17 December 2020imported>Stashbot 72,828 bytes +570 andrewbogott: correction to above, set the pg and pgp to 1024 for eqiad1-glance-images
  • curprev 09:31, 16 December 2020imported>Stashbot 72,258 bytes +121 dcaro: removing invalid backups from cloudvirt1024 (196 in total) (T269419)
  • curprev 17:42, 14 December 2020imported>Stashbot 72,137 bytes +359 dcaro: The removal freed ~12GB (still 100% usage :S) (T269419)
  • curprev 09:11, 13 December 2020imported>Stashbot 71,778 bytes +108 _dcaro: running backup purge script on cloudvirt1024 (T269419)
  • curprev 23:36, 10 December 2020imported>Stashbot 71,670 bytes +334 bstorm: cleaned up the logs for haproxy on cloudcontrol1003 by deleting all the gzipped ones and truncating the .1 file
  • curprev 18:01, 8 December 2020imported>Stashbot 71,336 bytes +387 dcaro: Host cloudvirt1030 up and running (T216195)
  • curprev 18:33, 7 December 2020imported>Stashbot 70,949 bytes +249 andrewbogott: putting cloudvirt1023 back into service T269467
  • curprev 00:35, 5 December 2020imported>Stashbot 70,700 bytes +1,279 andrewbogott: moving cloudvirt1023 back into maintenance because T269467 continues to puzzle
  • curprev 23:21, 3 December 2020imported>Stashbot 69,421 bytes +765 andrewbogott: removing all osds on cloudcephosd1004 for rebuild, T268746
  • curprev 20:04, 2 December 2020imported>Stashbot 68,656 bytes +909 andrewbogott: removing all osds on cloudcephosd1010 for rebuild, T268746
  • curprev 20:06, 1 December 2020imported>Stashbot 67,747 bytes +322 andrewbogott: removing all osds on cloudcephosd1014 for rebuild, T268746
  • curprev 18:12, 30 November 2020imported>Stashbot 67,425 bytes +131 andrewbogott: removing all osds from cloudcephosd1015 in order to investigate T268746
  • curprev 17:18, 29 November 2020imported>Stashbot 67,294 bytes +106 andrewbogott: cleaning up some logfiles in tools-sgecron-01 — drive is full
  • curprev 22:58, 26 November 2020imported>Stashbot 67,188 bytes +257 andrewbogott: deleting /var/log/haproxy logs older than 7 days in cloudcontrol100x. We need log rotation here it seems.
  • curprev 19:35, 25 November 2020imported>Stashbot 66,931 bytes +1,218 bstorm: repairing ceph pg `instructing pg 6.91 on osd.117 to repair`
  • curprev 17:40, 22 November 2020imported>Stashbot 65,713 bytes +161 andrewbogott: apt-get upgrade on cloudservices1003/1004
  • curprev 12:44, 20 November 2020imported>Stashbot 65,552 bytes +237 arturo: [codfw1dev] install conntrackd in cloudnet2003-dev/cloudnet2002-dev to research l3 agent HA reliability
  • curprev 19:21, 17 November 2020imported>Stashbot 65,315 bytes +103 andrewbogott: draining cloudvirt1012 to experiment with libvirt/cpu things
  • curprev 11:21, 15 November 2020imported>Stashbot 65,212 bytes +103 arturo: icinga downtime cloudbackup2002 for 48h (T267865)
  • curprev 16:38, 10 November 2020imported>Stashbot 65,109 bytes +243 arturo: icinga downtime toolschecker for 2h because of toolsdb maintenance (T266587)
  • curprev 12:42, 9 November 2020imported>Stashbot 64,866 bytes +944 arturo: restarted neutron l3 agent in cloudnet1003 bc it still had the old default route (T265288)
  • curprev 13:36, 2 November 2020imported>Stashbot 63,922 bytes +127 arturo: (typo: dcaro)
  • curprev 16:57, 29 October 2020imported>Stashbot 63,795 bytes +167 bstorm: silenced deployment-prep project alerts for 60 days since the downtime expired
  • curprev 16:20, 25 October 2020imported>Stashbot 63,628 bytes +230 andrewbogott: adding cloudvirt1038 to the 'ceph' aggregate and removing from the 'spare' aggregate. We need this space while waiting on network upgrades for empty cloudvirts (T216195)
  • curprev 11:30, 23 October 2020imported>Stashbot 63,398 bytes +467 arturo: [codfw1dev] openstack --os-project-id cloudinfra-codfw1dev recordset create --type PTR --record nat.cloudgw.codfw1dev.wikimediacloud.org. --description "created by hand" 0-29.57.15.185.in-addr.arpa. 1.0-29.57.15.185.in-addr.arpa. (T261724)
  • curprev 10:46, 22 October 2020imported>Stashbot 62,931 bytes +285 arturo: [codfw1dev] rebooting cloudinfra-internal-puppetmaster-01.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud to try fixing some DNS weirdness
  • curprev 14:36, 21 October 2020imported>Stashbot 62,646 bytes +343 andrewbogott: running apt-get update && apt-get install -y facter on all cloud-vps instances
  • curprev 15:47, 20 October 2020imported>Stashbot 62,303 bytes +315 arturo: changing DNS recursor ACLs (https://gerrit.wikimedia.org/r/c/operations/puppet/+/635314) this can be reverted any time if it causes problems (T261724)
  • curprev 01:41, 19 October 2020imported>Stashbot 61,988 bytes +280 andrewbogott: deleting all Precise base images
  • curprev 09:29, 16 October 2020imported>Stashbot 61,708 bytes +678 arturo: [codfw1dev] still some DNS weirdness, investigating
  • curprev 15:17, 15 October 2020imported>Stashbot 61,030 bytes +258 arturo: [codfw1dev] try cleaning up anything related to address scopes in the neutron database (T261724)
  • curprev 17:54, 13 October 2020imported>Stashbot 60,772 bytes +373 andrewbogott: rebuilding cloudvirt1021 for backy support
  • curprev 10:15, 9 October 2020imported>Stashbot 60,399 bytes +1,133 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack router set --disable-snat cloudinstances2b-gw --external-gateway wan-transport-codfw (T261724)
  • curprev 16:17, 8 October 2020imported>Stashbot 59,266 bytes +546 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# openstack subnet create --network wan-transport-codfw --gateway 185.15.57.8 --no-dhcp --subnet-range 185.15.57.8/31 cloud-gw-transport-codfw` (with a hack -- see task) (T263622)
  • curprev 21:30, 6 October 2020imported>Stashbot 58,720 bytes +326 andrewbogott: moved cloudvirt1013 out of the 'ceph' aggregate and into the 'maintenance' aggregate for T243414
  • curprev 17:40, 5 October 2020imported>Stashbot 58,394 bytes +129 bd808: `service uwsgi-labspuppetbackend restart` on cloud-puppetmaster-03 (T264649)
  • curprev 11:05, 2 October 2020imported>Stashbot 58,265 bytes +234 arturo: [codfw1dev] restarting rabbitmq-server in all 3 control nodes, the l3 agent was misbehaving
  • curprev 16:06, 1 October 2020imported>Stashbot 58,031 bytes +166 arturo: rebooting cloudvirt1024 to validate changes to /etc/network/interfaces file
  • curprev 16:47, 30 September 2020imported>Stashbot 57,865 bytes +1,958 andrewbogott: rebooting cloudvirt1032, 1033, 1034 for T262979
  • curprev 14:55, 28 September 2020imported>Stashbot 55,907 bytes +256 arturo: [jbond42] upgraded facter to v3 across the VM fleet
  • curprev 15:47, 24 September 2020imported>Stashbot 55,651 bytes +294 arturo: stopping/restarting rabbitmq-server in all cloudcontrol servers
  • curprev 10:16, 18 September 2020imported>Stashbot 55,357 bytes +593 arturo: cloudvirt1039 libvirtd service issues were fixed with a reboot
  • curprev 20:32, 15 September 2020imported>Stashbot 54,764 bytes +181 andrewbogott: rebooting cloudvirt1038 to see if it resolves T262979
  • curprev 14:21, 14 September 2020imported>Stashbot 54,583 bytes +421 andrewbogott: draining cloudvirt1001, migrating all VMs with wmcs-ceph-migrate
  • curprev 18:13, 9 September 2020imported>Stashbot 54,162 bytes +433 andrewbogott: restarting ceph-mon@cloudcephmon1003 in hopes that the slow ops reported are phantoms
  • curprev 00:05, 9 September 2020imported>Stashbot 53,729 bytes +517 bd808: Running wmcs-novastats-dnsleaks (T262359)
  • curprev 09:32, 3 September 2020imported>Stashbot 53,212 bytes +106 arturo: icinga downtime cloud* servers for 30 mins (T261866)
  • curprev 08:46, 2 September 2020imported>Stashbot 53,106 bytes +131 arturo: [codfw1dev] reimaging spare server labtestvirt2003 as debian buster (T261724)
  • curprev 18:18, 1 September 2020imported>Stashbot 52,975 bytes +640 andrewbogott: adding drives on cloudcephosd100[3-5] to ceph osd pool
  • curprev 23:26, 31 August 2020imported>Stashbot 52,335 bytes +303 bd808: Removed stale lockfile at cloud-puppetmaster-03.cloudinfra.eqiad.wmflabs:/var/lib/puppet/volatile/GeoIP/.geoipupdate.lock
  • curprev 20:12, 28 August 2020imported>Stashbot 52,032 bytes +100 bd808: Running `wmcs-novastats-dnsleaks --delete` from cloudcontrol1003
  • curprev 17:12, 26 August 2020imported>Stashbot 51,932 bytes +198 bstorm: Running 'ionice -c 3 nice -19 find /srv/tools -type f -size +100M -printf "%k KB %p\n" > tools_large_files_20200826.txt' on labstore1004 T261336
  • curprev 21:34, 21 August 2020imported>Stashbot 51,734 bytes +99 andrewbogott: restarting nova-compute on cloudvirt1033; it seems stuck
  • curprev 14:21, 19 August 2020imported>Stashbot 51,635 bytes +130 andrewbogott: rebooting cloudweb2001-dev, labweb1001, labweb1002 to address mediawiki-induced memleak
  • curprev 21:02, 6 August 2020imported>Stashbot 51,505 bytes +241 andrewbogott: removing cloudvirt1004/1006 from nova's list of hypervisors; rebuilding them to use as backup test hosts
  • curprev 18:54, 4 August 2020imported>Stashbot 51,264 bytes +105 bstorm: restarting mariadb on cloudcontrol1004 to setup parallel replication
  • curprev 17:02, 3 August 2020imported>Stashbot 51,159 bytes +137 bstorm: increased db connection limit to 800 across galera cluster because we were clearly hovering at limit
  • curprev 19:28, 31 July 2020imported>Stashbot 51,022 bytes +126 bd808: wmcs-novastats-dnsleaks --delete (lots of leaked fullstack-monitoring records to clean up)
  • curprev 22:17, 27 July 2020imported>Stashbot 50,896 bytes +150 andrewbogott: ceph osd pool set compute pg_num 2048
  • curprev 19:15, 24 July 2020imported>Stashbot 50,746 bytes +148 andrewbogott: ceph mgr module enable pg_autoscaler
  • curprev 08:55, 22 July 2020imported>Stashbot 50,598 bytes +291 jbond42: [codfw1dev] upgrading hiera to version5
  • curprev 10:48, 16 July 2020imported>Stashbot 50,307 bytes +158 arturo: merging change to neutron dmz_cidr https://gerrit.wikimedia.org/r/c/operations/puppet/+/613123 (T257534)
  • curprev 23:15, 15 July 2020imported>Stashbot 50,149 bytes +545 bd808: Removed Merlijn van Deen from toollabs-trusted Gerrit group (T255697)
  • curprev 15:19, 14 July 2020imported>Stashbot 49,604 bytes +504 arturo: briefly set root@cloudnet1003:~ # sysctl net.ipv4.conf.all.accept_local=1 (in neutron qrouter netns) (T257534)
  • curprev 16:17, 13 July 2020imported>Stashbot 49,100 bytes +127 arturo: icinga downtime cloudcontrol[1003-1005].wikimedia.org for 1h for galera database movements
  • curprev 17:39, 12 July 2020imported>Stashbot 48,973 bytes +98 andrewbogott: switched eqiad1 keystone from m5 to cloudcontrol galera
  • curprev 20:26, 10 July 2020imported>Stashbot 48,875 bytes +88 andrewbogott: disabling nova api to move database to galera
  • curprev 11:23, 9 July 2020imported>Stashbot 48,787 bytes +378 arturo: [codfw1dev] rebooting cloudnet2003-dev again for testing sysctl/puppet behavior (T257552)
  • curprev 15:16, 6 July 2020imported>Stashbot 48,409 bytes +76 arturo: installing 'aptitude' in all cloudvirts
  • curprev 12:51, 3 July 2020imported>Stashbot 48,333 bytes +455 arturo: [codfw1dev] galera cluster should be up and running, openstack happy (T256283)
  • curprev 15:41, 2 July 2020imported>Stashbot 47,878 bytes +273 arturo: `sudo wmcs-openstack --os-compute-api-version 2.55 flavor create --private --vcpus 8 --disk 300 --ram 16384 --property aggregate_instance_extra_specs:ceph=true --description "for packaging envoy" bigdisk-ceph` (T256983)
  • curprev 14:24, 29 June 2020imported>Stashbot 47,605 bytes +162 arturo: starting rabbitmq-server in all 3 cloudcontrol servers
  • curprev 20:38, 18 June 2020imported>Stashbot 47,443 bytes +130 andrewbogott: rebooting cloudservices2003-dev due to a mysterious 'host down' alert on a secondary ip
  • curprev 15:38, 16 June 2020imported>Stashbot 47,313 bytes +159 arturo: created by hand neutron port 9c0a9a13-e409-49de-9ba3-bc8ec4801dbf `paws-haproxy-vip` (T195217)
  • curprev 13:23, 12 June 2020imported>Stashbot 47,154 bytes +202 arturo: DNS zone `paws.wmcloud.org` transferred to the PAWS project (T195217)
  • curprev 19:19, 11 June 2020imported>Stashbot 46,952 bytes +428 bstorm_: proceeding with failback to labstore1004 now that DRBD devices are consistent T224582
  • curprev 16:09, 10 June 2020imported>Stashbot 46,524 bytes +169 andrewbogott: deleting all old cloud-ns0.wikimedia.org and cloud-ns1.wikimedia.org ns records in designate database T254496
  • curprev 15:25, 9 June 2020imported>Stashbot 46,355 bytes +337 arturo: icinga downtime everything cloud* lab* for 2h more (T253780)
  • curprev 15:08, 5 June 2020imported>Stashbot 46,018 bytes +148 andrewbogott: trying to re-enable puppet without losing cumin contact, as per https://phabricator.wikimedia.org/T254589
  • curprev 14:24, 4 June 2020imported>Stashbot 45,870 bytes +180 andrewbogott: disabling puppet on all instances for /labs/private recovery
  • curprev 23:02, 28 May 2020imported>Stashbot 45,690 bytes +169 bd808: `/usr/local/sbin/maintain-dbusers --debug harvest-replicas` (T253930)
  • curprev 00:33, 28 May 2020imported>Stashbot 45,521 bytes +610 andrewbogott: shutting down cloudservices2002-dev to see if we can live without it. This is in anticipation of rebuilding it entirely for T253780
  • curprev 16:36, 25 May 2020imported>Stashbot 44,911 bytes +119 arturo: [codfw1dev] created zone `0-29.57.15.185.in-addr.arpa.` (T247972)
  • curprev 19:23, 21 May 2020imported>Stashbot 44,792 bytes +336 andrewbogott: disabling puppet on cloudbackup2001 to prevent the backup job from starting during maintenance
  • curprev 22:59, 19 May 2020imported>Stashbot 44,456 bytes +181 bd808: `apt-get install mariadb-client` on cloudcontrol1003
  • curprev 21:37, 18 May 2020imported>Stashbot 44,275 bytes +82 andrewbogott: rebuilding cloudnet2003-dev with Buster
  • curprev 22:10, 15 May 2020imported>Stashbot 44,193 bytes +375 bd808: Added reedy as projectadmin in cloudinfra project (T249774)
  • curprev 23:28, 14 May 2020imported>Stashbot 43,818 bytes +724 bstorm_: downtimed cloudvirt1004/6 and cloudvirt-wdqs1003 until tomorrow around this time T252831
  • curprev 20:33, 12 May 2020imported>Stashbot 43,094 bytes +747 andrewbogott: moving cloudvirt1023 to the 'standard' pool and out of the 'spare' pool
  • curprev 16:53, 9 May 2020imported>Stashbot 42,347 bytes +128 andrewbogott: rebuilding cloudcontrol2001-dev and 2003-dev with buster for T252121
  • curprev 19:02, 8 May 2020imported>Stashbot 42,219 bytes +118 bstorm_: moving tools-k8s-haproxy-2 from cloudvirt1021 to cloudvirt1017 to improve spread
  • curprev 13:58, 5 May 2020imported>Stashbot 42,101 bytes +101 andrewbogott: rebuilding cloudcontrol2004-dev to test new puppet changes
  • curprev 09:04, 4 May 2020imported>Stashbot 42,000 bytes +194 arturo: [codfw1dev] manually modify iptables ruleset to only allow SSH from WMF bastions on cloudservices2003-dev and cloudcontrol2004-dev (T251604)
  • curprev 22:12, 21 April 2020imported>Stashbot 41,806 bytes +205 andrewbogott: moving cloudvirt1004 out of the 'standard' aggregate and into the 'maintenance' aggregate
  • curprev 18:44, 15 April 2020imported>Stashbot 41,601 bytes +99 jeh: create indexes and views for grwikimedia T245912
  • curprev 15:07, 13 April 2020imported>Stashbot 41,502 bytes +110 jeh: restart memcached on labwebs to increase cache size T145703
  • curprev 19:57, 9 April 2020imported>Stashbot 41,392 bytes +156 andrewbogott: upgrading eqiad1 designate to rocky
  • curprev 19:20, 8 April 2020imported>Stashbot 41,236 bytes +239 andrewbogott: rotated password and api token for pdns servers on cloudservices1003 and cloudservices1004
  • curprev 20:57, 7 April 2020imported>Stashbot 40,997 bytes +128 andrewbogott: service sssd stop; rm -rf /var/lib/sss/db*; service sssd start on tools-sgebastion-08
  • curprev 22:39, 6 April 2020imported>Stashbot 40,869 bytes +634 andrewbogott: deleting bogus groups cn=b'project-bastion',ou=groups,dc=wikimedia,dc=org and cn=b'project-tools',ou=groups,dc=wikimedia,dc=org from ldap
  • curprev 20:59, 2 April 2020imported>Stashbot 40,235 bytes +112 jeh: codfw1dev clear VM error states and start bastions, puppet master and database
  • curprev 16:27, 1 April 2020imported>Stashbot 40,123 bytes +126 arturo: [codfw1dev] enable puppet across the fleet clean vxlan changes (T248881)
  • curprev 12:35, 31 March 2020imported>Stashbot 39,997 bytes +801 arturo: [codfw1dev] restarting VMs: designaterockytest14, bastion-codfw1dev-0[1,2] (T248881)
  • curprev 23:42, 30 March 2020imported>Stashbot 39,196 bytes +399 bstorm_: deleted "Kubernetes Cluster" and "Kubernetes Performance" dashboards T246689
  • curprev 21:28, 27 March 2020imported>Stashbot 38,797 bytes +181 bd808: Created huggle.wmcloud.org Designate zone and allocated it to the huggle project
  • curprev 15:01, 26 March 2020imported>Stashbot 38,616 bytes +325 arturo: icinga downtime cloudvirt* cloudcontrol* cloudnet* lab* cloudstore*
  • curprev 19:29, 25 March 2020imported>Stashbot 38,291 bytes +285 andrewbogott: dumping a bunch of VMs on cloudvirt1015 to see if it still crashes
  • curprev 19:41, 24 March 2020imported>Stashbot 38,006 bytes +225 jeh: switch cloudvirt1016 from maintenance to standard host aggregate T243327
  • curprev 21:41, 23 March 2020imported>Stashbot 37,781 bytes +305 jeh: restart neutron-l3-agent on cloudnet100[3,4] to pickup policy.yaml changes
  • curprev 14:23, 21 March 2020imported>Stashbot 37,476 bytes +84 andrewbogott: restarting apache2 on labweb1001 and 1002
  • curprev 19:17, 18 March 2020imported>Stashbot 37,392 bytes +334 andrewbogott: deleted a bunch of records from the pdns database on cloudservices1003/1004 which had a record name but the content (where an IP address should be) was NULL, e.g. m.wikidata.beta.wmflabs.org.
  • curprev 17:40, 14 March 2020imported>Stashbot 37,058 bytes +99 jeh: restart maintain-dbusers on labstore1004 T247654
  • curprev 12:39, 13 March 2020imported>Stashbot 36,959 bytes +302 arturo: [codfw1dev] reintroduce address scopes for another round of testing T244851
  • curprev 22:29, 12 March 2020imported>Stashbot 36,657 bytes +130 bstorm_: running puppet across all dumps mounts to make sure active links are shifted to labstore1006
  • curprev 18:38, 11 March 2020imported>Stashbot 36,527 bytes +579 jeh: set icinga downtime until 2020-03-23 on CODFW cloud[control,net,virt] hosts during openstack upgrades
  • curprev 17:02, 10 March 2020imported>Stashbot 35,948 bytes +272 arturo: [codfw1dev] deleting address scopes, bad interaction with our custom NAT setup T247135
  • curprev 18:09, 9 March 2020imported>Stashbot 35,676 bytes +343 arturo: enabling puppet in cloudvirt1006, all services have been restored
  • curprev 14:54, 6 March 2020imported>Stashbot 35,333 bytes +115 andrewbogott: draining all instances off of cloudvirt1006 for T246908
  • curprev 14:24, 5 March 2020imported>Stashbot 35,218 bytes +475 arturo: [codfw1dev] we just enabled BGP session between cloudnet2xxx-dev and cr1-codfw (T245606)
  • curprev 22:22, 4 March 2020imported>Stashbot 34,743 bytes +776 andrewbogott: upgrading designate on cloudservices1003/1004 to Queens
  • curprev 16:54, 2 March 2020imported>Stashbot 33,967 bytes +159 arturo: [codfw1dev] deleted python3-os-ken debian package in cloudnet2003-dev which was installed by hand and had dependency issues
  • curprev 16:32, 29 February 2020imported>Stashbot 33,808 bytes +160 bstorm_: downtimed the smart alert on cloudvirt1009 until Monday since apparently predictive failures flap T244986
  • curprev 22:03, 26 February 2020imported>Stashbot 33,648 bytes +86 jeh: powering down cloudvirt1014 for hardware maintenance
  • curprev 16:08, 25 February 2020imported>Stashbot 33,562 bytes +458 andrewbogott: changing neutron's rabbitmq password because oslo is having trouble parsing some of the characters in the password
  • curprev 12:16, 24 February 2020imported>Stashbot 33,104 bytes +1,060 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker cr2-codfw` (T245606)
  • curprev 12:48, 21 February 2020imported>Stashbot 32,044 bytes +771 arturo: [codfw1dev] running `root@cloudcontrol2001-dev:~# neutron bgp-speaker-network-add bgpspeaker wan-transport-codfw` (T245606)
  • curprev 19:22, 20 February 2020imported>Stashbot 31,273 bytes +478 andrewbogott: updating designate pool config for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/572213/
  • curprev 22:19, 18 February 2020imported>Stashbot 30,795 bytes +429 andrewbogott: transferred the tools.wmcloud.org. to the tools project
  • curprev 10:35, 14 February 2020imported>Stashbot 30,366 bytes +477 arturo: running `root@cloudcontrol2001-dev:~# designate server-create --name ns1.openstack.codfw1dev.wikimediacloud.org.` (T243766)
  • curprev 13:38, 12 February 2020imported>Stashbot 29,889 bytes +270 arturo: [codfw1dev] add reference to subnetpool to the instance subnet `MariaDB [neutron]> update subnets set subnetpool_id='d129650d-d4be-4fe1-b13e-6edb5565cb4a' where id = '7adfcebe-b3d0-4315-92fe-e8365cc80668';` (T244851)
  • curprev 13:46, 11 February 2020imported>Stashbot 29,619 bytes +570 arturo: [codfw1dev] creating some neutron objects to investigate T244851 (subnets, subnet pools, address scopes, ...)
  • curprev 18:11, 7 February 2020imported>Stashbot 29,049 bytes +106 jeh: shutdown cloudvirt1016 for hardware maintenance T241882
  • curprev 14:44, 6 February 2020imported>Stashbot 28,943 bytes +169 jeh: update apt packages on cloudvirt1015 T220853
  • curprev 17:24, 28 January 2020imported>Stashbot 28,774 bytes +1,040 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# designate server-create --name ns0.openstack.codfw1dev.wikimediacloud.org. (T243766)
  • curprev 12:45, 27 January 2020imported>Stashbot 27,734 bytes +495 arturo: [codfw1dev] manually move the new domain to the `cloudinfra-codfw1dev` project clouddb2001-dev: `[designate]> update zones set tenant_id='cloudinfra-codfw1dev' where id = '4c75410017904858a5839de93c9e8b3d';` T243556
  • curprev 15:10, 24 January 2020imported>Stashbot 27,239 bytes +185 jeh: remove icinga downtime for cloudvirt1013 T241313
  • curprev 17:43, 21 January 2020imported>Stashbot 27,054 bytes +253 bstorm_: remounting /mnt/nfs/dumps-labstore1007.wikimedia.org/ on all dumps-mounting projects
  • curprev 16:59, 15 January 2020imported>Stashbot 26,801 bytes +144 bd808: Changed the config for cloud-announce mailing list so that list admins do not get bounce unsubscribe notices
  • curprev 14:03, 14 January 2020imported>Stashbot 26,657 bytes +395 arturo: icinga downtime all cloudvirts for another 2h for fixing some icinga checks
  • curprev 13:34, 13 January 2020imported>Stashbot 26,262 bytes +269 arturo: [codfw1dev] prevent neutron from allocating floating IPs from the wrong subnet by doing `neutron subnet-update --allocation-pool start=208.80.153.190,end=208.80.153.190 cloud-instances-transport1-b-codfw` (T242594)
  • curprev 13:27, 10 January 2020imported>Stashbot 25,993 bytes +167 arturo: cloudvirt1009: virsh undefine i-000069b6. This is tools-elastic-01 which is running on cloudvirt1008 (so, leaked on cloudvirt1009)
  • curprev 11:12, 9 January 2020imported>Stashbot 25,826 bytes +397 arturo: running `MariaDB [nova_eqiad1]> update quota_usages set in_use='0' where project_id='etytree';` (T242332)
  • curprev 10:53, 8 January 2020imported>Stashbot 25,429 bytes +111 arturo: icinga downtime all cloudvirts for 30 minutes to re-create all canary VMs
  • curprev 11:12, 7 January 2020imported>Stashbot 25,318 bytes +228 arturo: icinga-downtime everything cloud* for 30 minutes to merge nova scheduler changes
  • curprev 13:45, 6 January 2020imported>Stashbot 25,090 bytes +110 andrewbogott: restarting nova-api and nova-conductor on cloudcontrol1003 and 1004
  • curprev 16:34, 4 January 2020imported>Stashbot 24,980 bytes +130 arturo: icinga downtime cloudvirt1024 for 2 months because hardware errors (T241884)
  • curprev 11:46, 31 December 2019imported>Stashbot 24,850 bytes +158 andrewbogott: I couldn't!
  • curprev 10:13, 25 December 2019imported>Stashbot 24,692 bytes +205 arturo: icinga downtime for 30 minutes the whole cloud* lab* fleet to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/560575 (will restart some openstack components)
  • curprev 15:13, 24 December 2019imported>Stashbot 24,487 bytes +188 arturo: icinga downtime all the lab* fleet for nova password change for 1h
  • curprev 11:13, 23 December 2019imported>Stashbot 24,299 bytes +174 arturo: enable puppet in cloudcontrol1003/1004
  • curprev 23:48, 22 December 2019imported>Stashbot 24,125 bytes +290 andrewbogott: restarting nova-conductor and nova-api on cloudcontrol1003 and 1004
  • curprev 12:43, 20 December 2019imported>Stashbot 23,835 bytes +83 arturo: icinga downtime cloudmetrics1001 for 128 hours
  • curprev 12:55, 18 December 2019imported>Stashbot 23,752 bytes +191 arturo: [codfw1dev] created a new subnet neutron object to hold the new CIDR for floating IPs (cloud-codfw1dev-floating - 185.15.57.0/29) T239347
  • curprev 07:21, 17 December 2019imported>Stashbot 23,561 bytes +85 andrewbogott: deploying horizon/train to labweb1001/1002
  • curprev 06:11, 12 December 2019imported>Stashbot 23,476 bytes +173 arturo: schedule 4h downtime for labstores
  • curprev 06:28, 2 December 2019imported>Stashbot 23,303 bytes +611 andrewbogott: running nova-manage db sync on eqiad1
  • curprev 19:27, 18 November 2019imported>Stashbot 22,692 bytes +897 andrewbogott: repooling labsdb1011
  • curprev 20:04, 15 November 2019imported>Stashbot 21,795 bytes +681 andrewbogott: repool labsdb1011 (T237509)
  • curprev 13:10, 11 November 2019imported>Stashbot 21,114 bytes +217 arturo: cloudweb2001-dev: disable puppet and redirect stderr in the loadExitNodes.php cron script to prevent cronspam while we investigate the cause of the issue (T237971)
  • curprev 11:59, 5 November 2019imported>Stashbot 20,897 bytes +171 arturo: icinga downtime for 1h cloudcontrol1004, cloudnet1003, cloudvirt1017/1020/1022 for PDU operations in the rack T227542
  • curprev 21:55, 4 November 2019imported>Stashbot 20,726 bytes +145 andrewbogott: deleting a ton of wikitech hiera pages that were either no-ops or refer to nonexistent VMs or prefixes
  • curprev 11:01, 31 October 2019imported>Stashbot 20,581 bytes +151 arturo: icinga-downtimed cloudvirt1030 and cloudservices1003 for 1h due to PDU upgrade operations T227543
  • curprev 22:43, 30 October 2019imported>Stashbot 20,430 bytes +232 jeh: reboot cloud-bootstrapvz-stretch to resolve bad bootstrapvz build
  • curprev 10:45, 25 October 2019imported>Stashbot 20,198 bytes +176 arturo: icinga downtime toolschecker for 1 to upgrade clouddb1002 mariadb (toolsdb secondary) (T236384, T236420)
  • curprev 12:30, 24 October 2019imported>Stashbot 20,022 bytes +517 arturo: starting cloudvirt1019, PDU operations ended (T227540)
  • curprev 09:23, 23 October 2019imported>Stashbot 19,505 bytes +293 arturo: cloudvirt1026 reboot ended OK
  • curprev 16:01, 18 October 2019imported>Stashbot 19,212 bytes +420 arturo: created the `eqiad1.wikimedia.cloud` DNS zone (T235846)
  • curprev 21:59, 16 October 2019imported>Stashbot 18,792 bytes +512 jeh: resync wiki replica tool and user accounts T235697
  • curprev 13:30, 15 October 2019imported>Stashbot 18,280 bytes +97 jeh: creating indexes and views for banwiki T234770
  • curprev 18:55, 10 October 2019imported>Stashbot 18,183 bytes +250 bd808: Created indexes and views for nqowiki (T230543)
  • curprev 10:44, 9 October 2019imported>Stashbot 17,933 bytes +531 arturo: cloudvirt1013 rebooted well
  • curprev 14:07, 7 October 2019imported>Stashbot 17,402 bytes +196 arturo: horizon is disabled for maintenance (T212302)
  • curprev 15:23, 2 October 2019imported>Stashbot 17,206 bytes +779 arturo: codfw1dev renaming net/subnet objects to a more modern naming scheme T233665
  • curprev 10:21, 30 September 2019imported>Stashbot 16,427 bytes +290 arturo: we installed ferm in every VM by mistake. Deleting it and forcing a puppet agent run to try to go back to a clean state.
  • curprev 10:39, 18 August 2019imported>Stashbot 16,137 bytes +149 arturo: rebooting cloudvirt1023 for new interface names configuration
  • curprev 17:17, 5 August 2019imported>Stashbot 15,988 bytes +136 bd808: Set downtime on gridengine and kubernetes webservice checks in icinga until 2019-09-02 (flaky tests)
  • curprev 20:14, 29 July 2019imported>Stashbot 15,852 bytes +114 bd808: Restarted maintain-kubeusers on tools-k8s-master-01 (T194859)
  • curprev 12:32, 25 July 2019imported>Stashbot 15,738 bytes +688 arturo: eqiad1/glance: debian-9.9-stretch image deprecates debian-9.8-stretch (T228983)
  • curprev 19:43, 23 July 2019imported>Stashbot 15,050 bytes +98 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 and 1004
  • curprev 23:44, 22 July 2019imported>Stashbot 14,952 bytes +114 bd808: Restarted maintain-kubeusers on tools-k8s-master-01 (T228529)
  • curprev 22:07, 11 July 2019imported>Stashbot 14,838 bytes +343 bd808: Ran `sudo systemctl stop designate_floating_ip_ptr_records_updater.service` on cloudcontrol1003
  • curprev 16:05, 25 June 2019imported>Stashbot 14,495 bytes +213 bstorm_: updated python3.4 to update4 wherever it was installed on Jessie VMs to prevent issues with broken update3.
  • curprev 15:56, 3 June 2019imported>Stashbot 14,282 bytes +7,003 arturo: T221769 rebooting cloudservices1003 after bootstrapping is apparently completed
  • curprev 20:00, 13 January 2019imported>Stashbot 7,279 bytes +441 andrewbogott: VPS proxies are now running in eqiad1 on proxy-01. Old VMs will wait a bit for deletion. T213540
  • curprev 22:21, 9 January 2019imported>Stashbot 6,838 bytes +85 bd808: neutron quota-update --tenant-id tools --port 256
  • curprev 18:59, 8 January 2019imported>Stashbot 6,753 bytes +257 bd808: Definitely did NOT delete uid=novaadmin,ou=people,dc=wikimedia,dc=org
  • curprev 22:03, 6 January 2019imported>Stashbot 6,496 bytes +126 bd808: Set floatingip quota of 60 for tools project in eqiad1-r region (T212360)
  • curprev 17:10, 20 December 2018imported>Stashbot 6,370 bytes +100 arturo: T207663 renumbered transport network in eqiad1
  • curprev 17:59, 5 December 2018imported>Stashbot 6,270 bytes +130 arturo: T207663 changed labtestn transport network addressing from private to public
  • curprev 13:25, 3 December 2018imported>Stashbot 6,140 bytes +107 arturo: T202886 create again PTR records after dnsleak.py fix
  • curprev 14:08, 30 November 2018imported>Stashbot 6,033 bytes +126 arturo: running dns leaks cleanup `root@cloudcontrol1003:~# /root/novastats/dnsleaks.py --delete`
  • curprev 17:33, 28 November 2018imported>Stashbot 5,907 bytes +94 gtirloni: deleted contintcloud project (T209644)
  • curprev 13:32, 27 November 2018imported>Stashbot 5,813 bytes +113 gtirloni: enabled DRBD stats collection on labstore100[4-5] T208446
  • curprev 07:12, 22 November 2018imported>Stashbot 5,700 bytes +76 gtirloni: deployed new debian-9.6-stretch image
  • curprev 10:48, 21 November 2018imported>Stashbot 5,624 bytes +134 arturo: re-created compat-net as not shared in labtestn to test stuff related to T209954
  • curprev 12:43, 16 November 2018imported>Stashbot 5,490 bytes +244 gtirloni: armed keyholder on labpuppetmaster1001/1002 after reboots
  • curprev 17:19, 14 November 2018imported>Stashbot 5,246 bytes +869 gtirloni: added cloudvirt1016 to scheduler pool (T209426)
  • curprev 18:17, 9 November 2018imported>Stashbot 4,377 bytes +96 gtirloni: restarted neutron-linuxbridge-agent on cloudvirt1018/1023
  • curprev 11:00, 8 November 2018imported>Stashbot 4,281 bytes +123 gtirloni: Added novaproxy-02 to $CACHES
  • curprev 13:49, 7 November 2018imported>Stashbot 4,158 bytes +144 arturo: T208733 moving labvirt1017 from main deployment to eqiad1 and renaming it to cloudvirt1017
  • curprev 16:24, 22 October 2018imported>Stashbot 4,014 bytes +248 arturo: T206261 another update to dmz_cidr in eqiad1
  • curprev 12:02, 19 October 2018imported>Stashbot 3,766 bytes +392 arturo: revert change in dmz_cidr in eqiad1 for now (T206261)
  • curprev 10:40, 26 September 2018imported>Stashbot 3,374 bytes +753 arturo: T205524 all sorts of restarts in all neutron daemons
  • curprev 22:08, 17 September 2018imported>Stashbot 2,621 bytes +2,384 bd808: Granted gtirloni project roles of admin, projectadmin, and user
  • curprev 00:55, 9 February 2018imported>Stashbot 237 bytes +211 bd808: Added Arturo Borrero Gonzalez and Bstorm as project members
  • curprev 21:02, 8 October 2015imported>Andrew Bogott 26 bytes +26