Nova Resource:Admin/SAL: Revision history


7 December 2020

5 December 2020

  • 00:35, 5 December 2020 imported>Stashbot 70,700 bytes +1,279 andrewbogott: moving cloudvirt1023 back into maintenance because T269467 continues to puzzle

3 December 2020

  • 23:21, 3 December 2020 imported>Stashbot 69,421 bytes +765 andrewbogott: removing all osds on cloudcephosd1004 for rebuild, T268746

2 December 2020

  • 20:04, 2 December 2020 imported>Stashbot 68,656 bytes +909 andrewbogott: removing all osds on cloudcephosd1010 for rebuild, T268746

1 December 2020

  • 20:06, 1 December 2020 imported>Stashbot 67,747 bytes +322 andrewbogott: removing all osds on cloudcephosd1014 for rebuild, T268746

30 November 2020

  • 18:12, 30 November 2020 imported>Stashbot 67,425 bytes +131 andrewbogott: removing all osds from cloudcephosd1015 in order to investigate T268746

29 November 2020

  • 17:18, 29 November 2020 imported>Stashbot 67,294 bytes +106 andrewbogott: cleaning up some logfiles in tools-sgecron-01; drive is full

26 November 2020

  • 22:58, 26 November 2020 imported>Stashbot 67,188 bytes +257 andrewbogott: deleting /var/log/haproxy logs older than 7 days in cloudcontrol100x. We need log rotation here it seems.
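
The age-based cleanup described in this entry can be sketched in Python. This is only an illustration under assumptions: the entry does not say which command was actually run, and the helper name `stale_logs` and its parameters are hypothetical. The sketch only lists candidates and deletes nothing.

```python
import time
from pathlib import Path


def stale_logs(log_dir, max_age_days=7, now=None):
    """Return the regular files in log_dir last modified more than max_age_days ago.

    Hypothetical helper for illustration; it reports candidates only.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(log_dir).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Reviewing the returned list before removing anything keeps the destructive step separate and explicit.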

25 November 2020

  • 19:35, 25 November 2020 imported>Stashbot 66,931 bytes +1,218 bstorm: repairing ceph pg `instructing pg 6.91 on osd.117 to repair`

22 November 2020

20 November 2020

  • 12:44, 20 November 2020 imported>Stashbot 65,552 bytes +237 arturo: [codfw1dev] install conntrackd in cloudnet2003-dev/cloudnet2002-dev to research l3 agent HA reliability

17 November 2020

  • 19:21, 17 November 2020 imported>Stashbot 65,315 bytes +103 andrewbogott: draining cloudvirt1012 to experiment with libvirt/cpu things

15 November 2020

10 November 2020

  • 16:38, 10 November 2020 imported>Stashbot 65,109 bytes +243 arturo: icinga downtime toolschecker for 2h because of toolsdb maintenance (T266587)

9 November 2020

  • 12:42, 9 November 2020 imported>Stashbot 64,866 bytes +944 arturo: restarted neutron l3 agent in cloudnet1003 bc it still had the old default route (T265288)

2 November 2020

29 October 2020

  • 16:57, 29 October 2020 imported>Stashbot 63,795 bytes +167 bstorm: silenced deployment-prep project alerts for 60 days since the downtime expired

25 October 2020

  • 16:20, 25 October 2020 imported>Stashbot 63,628 bytes +230 andrewbogott: adding cloudvirt1038 to the 'ceph' aggregate and removing from the 'spare' aggregate. We need this space while waiting on network upgrades for empty cloudvirts (T216195)

23 October 2020

  • 11:30, 23 October 2020 imported>Stashbot 63,398 bytes +467 arturo: [codfw1dev] openstack --os-project-id cloudinfra-codfw1dev recordset create --type PTR --record nat.cloudgw.codfw1dev.wikimediacloud.org. --description "created by hand" 0-29.57.15.185.in-addr.arpa. 1.0-29.57.15.185.in-addr.arpa. (T261724)
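
The record name in this entry follows the shape of a classless reverse delegation (as described in RFC 2317): the last octet of the address is prepended to a zone that covers only part of a /24. A small sketch of how such a name is composed, assuming the zone `0-29.57.15.185.in-addr.arpa.` sits under the usual reverse tree for 185.15.57.0/24; the function name `classless_ptr_name` is hypothetical.

```python
import ipaddress


def classless_ptr_name(address, zone):
    """Build the PTR owner name for an IPv4 address inside a classless reverse zone."""
    # Validate the address, then take its final octet as the leftmost label.
    last_octet = str(ipaddress.IPv4Address(address)).split(".")[-1]
    return f"{last_octet}.{zone}"


# The name used in the log entry above.
name = classless_ptr_name("185.15.57.1", "0-29.57.15.185.in-addr.arpa.")

# For comparison, the conventional (classful) reverse name of the same address.
plain = ipaddress.ip_address("185.15.57.1").reverse_pointer
```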

22 October 2020

  • 10:46, 22 October 2020 imported>Stashbot 62,931 bytes +285 arturo: [codfw1dev] rebooting cloudinfra-internal-puppetmaster-01.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud to try fixing some DNS weirdness

21 October 2020

  • 14:36, 21 October 2020 imported>Stashbot 62,646 bytes +343 andrewbogott: running apt-get update && apt-get install -y facter on all cloud-vps instances

20 October 2020

  • 15:47, 20 October 2020 imported>Stashbot 62,303 bytes +315 arturo: changing DNS recursor ACLs (https://gerrit.wikimedia.org/r/c/operations/puppet/+/635314) this can be reverted any time if it causes problems (T261724)

19 October 2020

16 October 2020

15 October 2020

  • 15:17, 15 October 2020 imported>Stashbot 61,030 bytes +258 arturo: [codfw1dev] try cleaning up anything related to address scopes in the neutron database (T261724)

13 October 2020

9 October 2020

  • 10:15, 9 October 2020 imported>Stashbot 60,399 bytes +1,133 arturo: [codfw1dev] root@cloudcontrol2001-dev:~# openstack router set --disable-snat cloudinstances2b-gw --external-gateway wan-transport-codfw (T261724)

8 October 2020

  • 16:17, 8 October 2020 imported>Stashbot 59,266 bytes +546 arturo: [codfw1dev] `root@cloudcontrol2001-dev:~# openstack subnet create --network wan-transport-codfw --gateway 185.15.57.8 --no-dhcp --subnet-range 185.15.57.8/31 cloud-gw-transport-codfw` (with a hack -- see task) (T263622)
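
A /31 range such as the one in this entry contains exactly two addresses, and the configured gateway is one of them. The arithmetic can be checked with Python's standard `ipaddress` module; this is only a sketch of the address math and says nothing about the hack mentioned in the task.

```python
import ipaddress

# Subnet range and gateway taken from the log entry above.
net = ipaddress.ip_network("185.15.57.8/31")
gateway = ipaddress.ip_address("185.15.57.8")

# Iterating over a network yields every address it contains.
addresses = [str(a) for a in net]

# The single address left over once the gateway is excluded.
peer = [a for a in addresses if a != str(gateway)]
```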

6 October 2020

  • 21:30, 6 October 2020 imported>Stashbot 58,720 bytes +326 andrewbogott: moved cloudvirt1013 out of the 'ceph' aggregate and into the 'maintenance' aggregate for T243414

5 October 2020

  • 17:40, 5 October 2020 imported>Stashbot 58,394 bytes +129 bd808: `service uwsgi-labspuppetbackend restart` on cloud-puppetmaster-03 (T264649)

2 October 2020

  • 11:05, 2 October 2020 imported>Stashbot 58,265 bytes +234 arturo: [codfw1dev] restarting rabbitmq-server in all 3 control nodes, the l3 agent was misbehaving

1 October 2020

  • 16:06, 1 October 2020 imported>Stashbot 58,031 bytes +166 arturo: rebooting cloudvirt1024 to validate changes to /etc/network/interfaces file

30 September 2020

28 September 2020

24 September 2020

  • 15:47, 24 September 2020 imported>Stashbot 55,651 bytes +294 arturo: stopping/restarting rabbitmq-server in all cloudcontrol servers

18 September 2020

  • 10:16, 18 September 2020 imported>Stashbot 55,357 bytes +593 arturo: cloudvirt1039 libvirtd service issues were fixed with a reboot

15 September 2020

14 September 2020

  • 14:21, 14 September 2020 imported>Stashbot 54,583 bytes +421 andrewbogott: draining cloudvirt1001, migrating all VMs with wmcs-ceph-migrate

9 September 2020

  • 18:13, 9 September 2020 imported>Stashbot 54,162 bytes +433 andrewbogott: restarting ceph-mon@cloudcephmon1003 in hopes that the slow ops reported are phantoms
  • 00:05, 9 September 2020 imported>Stashbot 53,729 bytes +517 bd808: Running wmcs-novastats-dnsleaks (T262359)

3 September 2020

2 September 2020

  • 08:46, 2 September 2020 imported>Stashbot 53,106 bytes +131 arturo: [codfw1dev] reimaging spare server labtestvirt2003 as debian buster (T261724)

1 September 2020

  • 18:18, 1 September 2020 imported>Stashbot 52,975 bytes +640 andrewbogott: adding drives on cloudcephosd100[3-5] to ceph osd pool

31 August 2020

  • 23:26, 31 August 2020 imported>Stashbot 52,335 bytes +303 bd808: Removed stale lockfile at cloud-puppetmaster-03.cloudinfra.eqiad.wmflabs:/var/lib/puppet/volatile/GeoIP/.geoipupdate.lock

28 August 2020

  • 20:12, 28 August 2020 imported>Stashbot 52,032 bytes +100 bd808: Running `wmcs-novastats-dnsleaks --delete` from cloudcontrol1003

26 August 2020

  • 17:12, 26 August 2020 imported>Stashbot 51,932 bytes +198 bstorm: Running 'ionice -c 3 nice -19 find /srv/tools -type f -size +100M -printf "%k KB %p\n" > tools_large_files_20200826.txt' on labstore1004 T261336

21 August 2020

  • 21:34, 21 August 2020 imported>Stashbot 51,734 bytes +99 andrewbogott: restarting nova-compute on cloudvirt1033; it seems stuck

19 August 2020

  • 14:21, 19 August 2020 imported>Stashbot 51,635 bytes +130 andrewbogott: rebooting cloudweb2001-dev, labweb1001, labweb1002 to address mediawiki-induced memleak

6 August 2020

  • 21:02, 6 August 2020 imported>Stashbot 51,505 bytes +241 andrewbogott: removing cloudvirt1004/1006 from nova's list of hypervisors; rebuilding them to use as backup test hosts

4 August 2020

  • 18:54, 4 August 2020 imported>Stashbot 51,264 bytes +105 bstorm: restarting mariadb on cloudcontrol1004 to set up parallel replication

3 August 2020

  • 17:02, 3 August 2020 imported>Stashbot 51,159 bytes +137 bstorm: increased db connection limit to 800 across galera cluster because we were clearly hovering at limit

31 July 2020

  • 19:28, 31 July 2020 imported>Stashbot 51,022 bytes +126 bd808: wmcs-novastats-dnsleaks --delete (lots of leaked fullstack-monitoring records to clean up)

27 July 2020

  • 22:17, 27 July 2020 imported>Stashbot 50,896 bytes +150 andrewbogott: ceph osd pool set compute pg_num 2048

24 July 2020

  • 19:15, 24 July 2020 imported>Stashbot 50,746 bytes +148 andrewbogott: ceph mgr module enable pg_autoscaler

22 July 2020

16 July 2020

  • 10:48, 16 July 2020 imported>Stashbot 50,307 bytes +158 arturo: merging change to neutron dmz_cidr https://gerrit.wikimedia.org/r/c/operations/puppet/+/613123 (T257534)

15 July 2020

  • 23:15, 15 July 2020 imported>Stashbot 50,149 bytes +545 bd808: Removed Merlijn van Deen from toollabs-trusted Gerrit group (T255697)

14 July 2020

  • 15:19, 14 July 2020 imported>Stashbot 49,604 bytes +504 arturo: briefly set root@cloudnet1003:~ # sysctl net.ipv4.conf.all.accept_local=1 (in neutron qrouter netns) (T257534)

13 July 2020

  • 16:17, 13 July 2020 imported>Stashbot 49,100 bytes +127 arturo: icinga downtime cloudcontrol[1003-1005].wikimedia.org for 1h for galera database movements

12 July 2020

  • 17:39, 12 July 2020 imported>Stashbot 48,973 bytes +98 andrewbogott: switched eqiad1 keystone from m5 to cloudcontrol galera

10 July 2020

  • 20:26, 10 July 2020 imported>Stashbot 48,875 bytes +88 andrewbogott: disabling nova api to move database to galera

9 July 2020

  • 11:23, 9 July 2020 imported>Stashbot 48,787 bytes +378 arturo: [codfw1dev] rebooting cloudnet2003-dev again for testing sysctl/puppet behavior (T257552)

6 July 2020

  • 15:16, 6 July 2020 imported>Stashbot 48,409 bytes +76 arturo: installing 'aptitude' in all cloudvirts

3 July 2020

  • 12:51, 3 July 2020 imported>Stashbot 48,333 bytes +455 arturo: [codfw1dev] galera cluster should be up and running, openstack happy (T256283)

2 July 2020

  • 15:41, 2 July 2020 imported>Stashbot 47,878 bytes +273 arturo: `sudo wmcs-openstack --os-compute-api-version 2.55 flavor create --private --vcpus 8 --disk 300 --ram 16384 --property aggregate_instance_extra_specs:ceph=true --description "for packaging envoy" bigdisk-ceph` (T256983)

29 June 2020

  • 14:24, 29 June 2020 imported>Stashbot 47,605 bytes +162 arturo: starting rabbitmq-server in all 3 cloudcontrol servers

18 June 2020

  • 20:38, 18 June 2020 imported>Stashbot 47,443 bytes +130 andrewbogott: rebooting cloudservices2003-dev due to a mysterious 'host down' alert on a secondary ip

16 June 2020

  • 15:38, 16 June 2020 imported>Stashbot 47,313 bytes +159 arturo: created by hand neutron port 9c0a9a13-e409-49de-9ba3-bc8ec4801dbf `paws-haproxy-vip` (T195217)

12 June 2020

  • 13:23, 12 June 2020 imported>Stashbot 47,154 bytes +202 arturo: DNS zone `paws.wmcloud.org` transferred to the PAWS project (T195217)

11 June 2020

  • 19:19, 11 June 2020 imported>Stashbot 46,952 bytes +428 bstorm_: proceeding with failback to labstore1004 now that DRBD devices are consistent T224582

10 June 2020

  • 16:09, 10 June 2020 imported>Stashbot 46,524 bytes +169 andrewbogott: deleting all old cloud-ns0.wikimedia.org and cloud-ns1.wikimedia.org ns records in designate database T254496

9 June 2020

  • 15:25, 9 June 2020 imported>Stashbot 46,355 bytes +337 arturo: icinga downtime everything cloud* lab* for 2h more (T253780)

5 June 2020

  • 15:08, 5 June 2020 imported>Stashbot 46,018 bytes +148 andrewbogott: trying to re-enable puppet without losing cumin contact, as per https://phabricator.wikimedia.org/T254589

4 June 2020

  • 14:24, 4 June 2020 imported>Stashbot 45,870 bytes +180 andrewbogott: disabling puppet on all instances for /labs/private recovery

28 May 2020

  • 23:02, 28 May 2020 imported>Stashbot 45,690 bytes +169 bd808: `/usr/local/sbin/maintain-dbusers --debug harvest-replicas` (T253930)
  • 00:33, 28 May 2020 imported>Stashbot 45,521 bytes +610 andrewbogott: shutting down cloudservices2002-dev to see if we can live without it. This is in anticipation of rebuilding it entirely for T253780

25 May 2020

  • 16:36, 25 May 2020 imported>Stashbot 44,911 bytes +119 arturo: [codfw1dev] created zone `0-29.57.15.185.in-addr.arpa.` (T247972)

21 May 2020

  • 19:23, 21 May 2020 imported>Stashbot 44,792 bytes +336 andrewbogott: disabling puppet on cloudbackup2001 to prevent the backup job from starting during maintenance

19 May 2020

  • 22:59, 19 May 2020 imported>Stashbot 44,456 bytes +181 bd808: `apt-get install mariadb-client` on cloudcontrol1003

18 May 2020

  • 21:37, 18 May 2020 imported>Stashbot 44,275 bytes +82 andrewbogott: rebuilding cloudnet2003-dev with Buster

15 May 2020

  • 22:10, 15 May 2020 imported>Stashbot 44,193 bytes +375 bd808: Added reedy as projectadmin in cloudinfra project (T249774)

14 May 2020

  • 23:28, 14 May 2020 imported>Stashbot 43,818 bytes +724 bstorm_: downtimed cloudvirt1004/6 and cloudvirt-wdqs1003 until tomorrow around this time T252831

12 May 2020

  • 20:33, 12 May 2020 imported>Stashbot 43,094 bytes +747 andrewbogott: moving cloudvirt1023 to the 'standard' pool and out of the 'spare' pool

9 May 2020

  • 16:53, 9 May 2020 imported>Stashbot 42,347 bytes +128 andrewbogott: rebuilding cloudcontrol2001-dev and 2003-dev with buster for T252121

8 May 2020

  • 19:02, 8 May 2020 imported>Stashbot 42,219 bytes +118 bstorm_: moving tools-k8s-haproxy-2 from cloudvirt1021 to cloudvirt1017 to improve spread

5 May 2020

  • 13:58, 5 May 2020 imported>Stashbot 42,101 bytes +101 andrewbogott: rebuilding cloudcontrol2004-dev to test new puppet changes

4 May 2020

  • 09:04, 4 May 2020 imported>Stashbot 42,000 bytes +194 arturo: [codfw1dev] manually modify iptables ruleset to only allow SSH from WMF bastions on cloudservices2003-dev and cloudcontrol2004-dev (T251604)

21 April 2020

  • 22:12, 21 April 2020 imported>Stashbot 41,806 bytes +205 andrewbogott: moving cloudvirt1004 out of the 'standard' aggregate and into the 'maintenance' aggregate

15 April 2020

13 April 2020

  • 15:07, 13 April 2020 imported>Stashbot 41,502 bytes +110 jeh: restart memcached on labwebs to increase cache size T145703

9 April 2020

8 April 2020

  • 19:20, 8 April 2020 imported>Stashbot 41,236 bytes +239 andrewbogott: rotated password and api token for pdns servers on cloudservices1003 and cloudservices1004

7 April 2020

  • 20:57, 7 April 2020 imported>Stashbot 40,997 bytes +128 andrewbogott: service sssd stop; rm -rf /var/lib/sss/db*; service sssd start on tools-sgebastion-08

6 April 2020

  • 22:39, 6 April 2020 imported>Stashbot 40,869 bytes +634 andrewbogott: deleting bogus groups cn=b'project-bastion',ou=groups,dc=wikimedia,dc=org and cn=b'project-tools',ou=groups,dc=wikimedia,dc=org from ldap

2 April 2020

  • 20:59, 2 April 2020 imported>Stashbot 40,235 bytes +112 jeh: codfw1dev clear VM error states and start bastions, puppet master and database

1 April 2020

  • 16:27, 1 April 2020 imported>Stashbot 40,123 bytes +126 arturo: [codfw1dev] enable puppet across the fleet clean vxlan changes (T248881)

31 March 2020

  • 12:35, 31 March 2020 imported>Stashbot 39,997 bytes +801 arturo: [codfw1dev] restarting VMs: designaterockytest14, bastion-codfw1dev-0[1,2] (T248881)

30 March 2020

  • 23:42, 30 March 2020 imported>Stashbot 39,196 bytes +399 bstorm_: deleted "Kubernetes Cluster" and "Kubernetes Performance" dashboards T246689

27 March 2020

  • 21:28, 27 March 2020 imported>Stashbot 38,797 bytes +181 bd808: Created huggle.wmcloud.org Designate zone and allocated it to the huggle project

26 March 2020

  • 15:01, 26 March 2020 imported>Stashbot 38,616 bytes +325 arturo: icinga downtime cloudvirt* cloudcontrol* cloudnet* lab* cloudstore*