Nova Resource:Admin/SAL: Difference between revisions
Revision as of 17:17, 5 August 2019
2019-08-05
- 17:17 bd808: Set downtime on gridengine and kubernetes webservice checks in icinga until 2019-09-02 (flaky tests)
2019-07-29
- 20:14 bd808: Restarted maintain-kubeusers on tools-k8s-master-01 (T194859)
2019-07-25
- 12:32 arturo: eqiad1/glance: debian-9.9-stretch image deprecates debian-9.8-stretch (T228983)
- 09:59 arturo: (codfw1dev) drop missing glance images (T228972)
- 09:32 arturo: (codfw1dev) deleting a bunch of VMs that were running in now missing hypervisors
- 09:31 arturo: (codfw1dev) deleting a bunch of VMs in ERROR and SHUTDOWN state
- 09:27 arturo: last log entry refers to the codfw1dev deployment
- 09:27 arturo: cleanup `nova service-list` from old hypervisors (labtest*)
- 09:23 arturo: refreshed nova DB grants in clouddb2001-dev for the codfw1dev deployment
- 08:47 arturo: cleanup the cloud-announce pending emails (spam)
2019-07-23
- 19:43 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 and 1004
2019-07-22
- 23:44 bd808: Restarted maintain-kubeusers on tools-k8s-master-01 (T228529)
2019-07-11
- 22:07 bd808: Ran `sudo systemctl stop designate_floating_ip_ptr_records_updater.service` on cloudcontrol1003
- 22:01 bd808: `sudo apt-get install python2.7-dbg` on cloudcontrol1003 to debug hung python process
- 21:48 bd808: Ran `sudo systemctl stop designate_floating_ip_ptr_records_updater.service` on cloudcontrol1004
2019-06-25
- 16:05 bstorm_: updated python3.4 to update4 wherever it was installed on Jessie VMs to prevent issues with broken update3.
- 14:56 bstorm_: Updated python 3.4 on the labs-puppetmaster server
2019-06-03
- 15:55 arturo: T221769 rebooting cloudservices1003 after bootstrapping is apparently completed
2019-05-28
- 21:42 bstorm_: unmounting labstore1003-scratch on all cloud clients
- 18:14 bstorm_: T209527 switched mounts from labstore1003 to cloudstore1008 for scratch
2019-05-20
- 17:25 arturo: T223923 dropped compat-network config from /etc/network/interfaces in eqiad1/codfw1dev neutron nodes
- 17:22 arturo: T223923 dropped br-compat bridges and vlan interfaces (1102 and 2102) in eqiad1/codfw1dev neutron nodes
- 17:07 arturo: T223923 dropped compat-network configuration from the neutron database in eqiad1
- 16:55 arturo: T223923 dropped compat-network configuration from the neutron database in codfw1dev
2019-05-15
- 17:00 andrewbogott: touching /root/firstboot_done on all VMs that cumin can reach. This will prevent firstboot.sh from running a second time if/when any of these are rebooted. T223370
2019-04-26
- 15:51 arturo: andrew updated dns servers for the cloud-instances2-b-eqiad subnet in neutron: 208.80.154.143 and 208.80.154.24
2019-04-25
- 11:14 arturo: T221760 increased size of conntrack table
2019-04-24
- 12:54 arturo: T220051 puppet broken in every VM in Cloud VPS, fixing right now
2019-04-22
- 11:14 arturo: create by hand /var/cache/labsaliaser/labs-ip-aliases.json in cloudservices2002-dev (T218575)
2019-04-16
- 22:55 bd808: cloudcontrol2003-dev: added `exit 0` to /etc/cron.hourly/keystone to stop cron spam on partially configured cluster
- 12:08 arturo: rebooting cloudvirt200[123]-dev because of deep changes in config
- 11:27 arturo: T219626 add DB grants for neutron and glance to clouddb2001-dev (codfw1dev)
- 10:37 arturo: T219626 replace 208.80.153.75 with 208.80.153.59 in the clouddb2001-dev database (codfw1dev deployment)
- 10:30 arturo: T219626 replace labtestcontrol2003 with cloudcontrol2001-dev in the clouddb2001-dev database (codfw1dev deployment)
2019-04-15
- 13:08 arturo: T219626 add DB grants for keystone/nova/nova_api to clouddb2001-dev (codfw1dev)
2019-04-13
- 18:25 bd808: Restarted nova-compute service on cloudvirt1015 (T220853)
2019-04-11
- 12:00 arturo: T151704 deploying oidentd to cloudnet1xxx servers
2019-04-02
- 19:52 andrewbogott: installed new base Stretch image. Updated packages, and runs apt-get dist-upgrade on first boot.
2019-03-29
- 14:34 andrewbogott: moving tools-static.wmflabs.org to point to tools-static-13 in eqiad1-r
- 00:00 bstorm_: T193264 Added osm.db.svc.eqiad.wmflabs to cloud DNS
2019-03-25
- 00:40 bd808: Restarted maintain-dbusers on labstore1004. Process hung up on failed LDAP connection.
2019-03-21
- 19:32 andrewbogott: restarting keystone on cloudcontrol1003
2019-03-15
- 16:00 gtirloni: increased nscd cache size (T217280)
2019-03-14
- 19:04 gtirloni: bstorm started nfsd on labstore1006 (T218341)
- 16:42 gtirloni: published new debian-9.8 image (T218314)
2019-03-04
- 19:37 bstorm_: umounted /mnt/nfs/dumps-labstore1006.wikimedia.org across all VPS projects for T217473
2019-02-26
- 12:46 gtirloni: shutdown toolsbeta-sgegrid-master (cronspam)
2019-02-25
- 10:32 gtirloni: restarted nfsd on labstore1004
2019-02-21
- 09:09 gtirloni: restarted uwsgi-labspuppetbackend.service on labpuppetmaster1001
- 07:42 gtirloni: created project cloudstore
- 07:36 gtirloni: deleted wmcs-nfs project
2019-02-20
- 21:58 andrewbogott: silencing shinken and disabling puppet on shinken-02 for now
2019-02-19
- 12:00 gtirloni: added nagios@icinga2001.wikimedia.org to cloud-admin-feed@ allowed senders
2019-02-18
- 20:21 gtirloni: downtimed cloudvirt1020
- 20:12 gtirloni: ran `labs-ip-alias-dump.py` on cloudservices/labservices servers
2019-02-15
- 13:10 arturo: T216239 labvirt1019 has been drained
- 12:22 arturo: T216239 draining labvirt1009 with a command like this: `root@cloudcontrol1004:~# wmcs-cold-migrate --region eqiad --nova-db nova 2c0cf363-c7c3-42ad-94bd-e586f2492321 labvirt1001`
- 12:02 arturo: more nova service cleanups in the database (labvirts that were reallocated to eqiad1)
- 11:34 arturo: T216190 cleanup from nova database `nova service-delete 35`
- 03:50 andrewbogott: updated VPS base images for Jessie and Stretch, now featuring Stretch 9.7
2019-02-11
- 18:13 gtirloni: cleaned old metrics data in labmon1001 T215417
- 15:28 gtirloni: running `maintain-views --all-databases --replace-all` on labsdb1011
- 14:18 gtirloni: running `maintain-views --all-databases --replace-all` on labsdb1010
2019-02-08
- 14:56 gtirloni: running `maintain-views --all-databases --replace-all` on labsdb1009
2019-02-06
- 11:47 gtirloni: downtimed labmon100{1,2} T215399
- 00:17 bstorm_: T214106 deleted bstorm-test2 project to clean up
2019-02-05
- 10:48 arturo: labmon1001 is now part of the 'eqiad1-r' region
2019-02-01
- 09:54 arturo: moving canary1015-01 VM instance from cloudvirt1024 back to cloudvirt1015
2019-01-31
- 12:44 arturo: T215012 depooling cloudvirt1015 and migrating all VMs to cloudvirt1024
2019-01-25
2019-01-24
- 11:50 arturo: T213925 modify subnet cloud-instances-transport1-b-eqiad1 to avoid floating IP allocations from here
- 11:07 arturo: T214299 failover cloudnet1003 to cloudnet1004
- 10:03 arturo: T214299 reimage cloudnet1004 to debian stretch
- 09:51 arturo: T214299 failover cloudnet1004 to cloudnet1003
2019-01-22
- 19:19 arturo: T214299 stretch cloudnet1003 is apparently all set
- 18:40 arturo: T214299 manually delete from neutron agents from cloudnet1003 (must be added again after reimage, with new uuids)
- 18:37 arturo: T214299 reimaging cloudnet1003 as debian stretch
- 17:35 jbond42: starting roll out of apt package updates to
- 14:41 gtirloni: T214369 deployed new jessie and stretch VM images
2019-01-21
- 18:29 gtirloni: installed libguestfs-tools on cloudvirt1021
2019-01-16
- 14:21 andrewbogott: stopping old VPS proxies in eqiad — T213540
2019-01-15
- 14:20 andrewbogott: changing tools.wmflabs.org to point to tools-proxy-03 in eqiad1
2019-01-13
- 20:00 andrewbogott: VPS proxies are now running in eqiad1 on proxy-01. Old VMs will wait a bit for deletion. T213540
- 19:12 andrewbogott: moving the VPS proxy API backend to proxy-01.project-proxy.eqiad.wmflabs, as per T213540
- 17:11 andrewbogott: moving all VPS dynamic proxies to proxy-eqiad1.wmflabs.org aka proxy-01.project-proxy.eqiad.wmflabs, as per T213540
2019-01-09
- 22:21 bd808: neutron quota-update --tenant-id tools --port 256
2019-01-08
- 18:59 bd808: Definitely did NOT delete uid=novaadmin,ou=people,dc=wikimedia,dc=org
- 18:59 bd808: Deleted LDAP user uid=neutron,ou=people,dc=wikimedia,dc=org
- 18:58 bd808: Deleted LDAP user uid=novaadmin,ou=people,dc=wikimedia,dc=org
2019-01-06
- 22:03 bd808: Set floatingip quota of 60 for tools project in eqiad1-r region (T212360)
2018-12-20
- 17:10 arturo: T207663 renumbered transport network in eqiad1
2018-12-05
- 17:59 arturo: T207663 changed labtestn transport network addressing from private to public
2018-12-03
- 13:25 arturo: T202886 create again PTR records after dnsleak.py fix
2018-11-30
- 14:08 arturo: running dns leaks cleanup `root@cloudcontrol1003:~# /root/novastats/dnsleaks.py --delete`
2018-11-28
- 17:33 gtirloni: deleted contintcloud project (T209644)
2018-11-27
- 13:32 gtirloni: enabled DRBD stats collection on labstore100[4-5] T208446
2018-11-22
- 07:12 gtirloni: deployed new debian-9.6-stretch image
2018-11-21
- 10:48 arturo: re-created compat-net as not shared in labtestn to test stuff related to T209954
2018-11-16
- 12:43 gtirloni: armed keyholder on labpuppetmaster1001/1002 after reboots
- 12:08 gtirloni: rebooted labpuppetmaster1001 (T207377)
- 11:57 gtirloni: rebooted labpuppetmaster1002 (T207377)
2018-11-14
- 17:19 gtirloni: added cloudvirt1016 to scheduler pool (T209426)
- 15:41 gtirloni: reimaging labvirt1016 as cloudvirt1016
- 15:14 gtirloni: reset-failed systemd unit nova-scheduler on cloudcontrol1004
- 13:52 gtirloni: rebooted labservices1002 after package upgrades (T207377)
- 13:23 gtirloni: rebooted labstore2004 after package upgrades (T207377)
- 13:20 gtirloni: rebooted labstore2003 after package upgrades (T207377)
- 13:20 gtirloni: rebooted labstore2001/labstore2003 after package upgrades (T207377)
- 12:08 gtirloni: rebooted labnet1002 after package upgrades
- 12:01 gtirloni: rebooted labmon1002 after package upgrades
- 11:41 gtirloni: rebooted labcontrol1002 after package upgrades
- 11:15 gtirloni: rebooted cloudcontrol1004 after package upgrades
2018-11-09
- 18:17 gtirloni: restarted neutron-linuxbridge-agent on cloudvirt1018/1023
2018-11-08
- 11:00 gtirloni: Added novaproxy-02 to $CACHES
- 10:50 gtirloni: Added cloudvirt1017 to eqiad1 region
2018-11-07
- 13:49 arturo: T208733 moving labvirt1017 from main deployment to eqiad1 and renaming it to cloudvirt1017
2018-10-22
- 16:24 arturo: T206261 another update to dmz_cidr in eqiad1
- 10:26 arturo: change again in dmz_cidr in eqiad1: VMs will connect between them without NAT even when using floating IPs (T206261)
2018-10-19
- 12:02 arturo: revert change in dmz_cidr in eqiad1 for now (T206261)
- 11:16 arturo: change in dmz_cidr in eqiad1: VMs will connect between them without NAT even when using floating IPs (T206261)
- 10:14 arturo: we have new virt servers in the eqiad1 deployment since past week and this week: cloudvirt1018, cloudvirt1023, cloudvirt1024
2018-09-26
- 10:40 arturo: T205524 all sorts of restarts in all neutron daemons
- 10:20 arturo: T205524 stop/start all neutron agents in cloudnet1003.eqiad.wmnet
- 10:13 arturo: T205524 restart all agents in cloudnet1004.eqiad.wmnet
- 10:10 arturo: restart neutron-server in cloudcontrol1003, investigating T205524
2018-09-24
- 10:57 arturo: try to increase floating ip allocation pool in eqiad1. Of 185.15.56.0/25 we are using only 185.15.56.10-185.15.56.31, I don't know why. Let's use 185.15.56.2-185.15.56.126
2018-09-21
- 17:18 bd808: Running `sudo maintain-meta_p --all-databases --purge` across labsdb10(09|10|11) for T201890
2018-09-17
- 22:08 bd808: Granted gtirloni project roles of admin, projectadmin, and user
2018-09-12
- 11:20 arturo: T202636 distributing default routes using classless-static-route for all VMs in main/labtest (dnsmasq/nova-network)
2018-09-11
- 16:52 arturo: again, restarted nova-network after killing all dnsmasq procs in labnet1001 for T202636
- 16:08 arturo: restarted nova-network after killing all dnsmasq procs in labnet1001 for T202636
- 10:53 arturo: T202636 creating all the compat-network configuration in neutron
- 10:36 arturo: T202636 creating br-compat bridge in eqiad1 for the compat network
- 10:33 arturo: T202636 manually reserve 10.68.23.253 (in nova-network)
2018-09-10
- 22:46 andrewbogott: deleting all VMs on labvirt1019 and 1020 as prep for T204003
2018-08-30
- 15:46 andrewbogott: restarting rabbitmq-server on cloudcontrol1003
- 13:07 arturo: T202636 internal network routing now exists in labtest/labtestn for VM to communicate with each other
2018-08-28
- 11:04 arturo: T202549 eqiad1 databases are all now running in m5-master. Mysql has been cleaned from cloudcontrol100[3,4]
2018-08-23
- 16:17 arturo: T188589 bstorm_ merged patch to reduce nova DB connection usage
- 13:15 arturo: T202115 `root@cloudcontrol1003:~# neutron subnet-update --allocation-pool start=10.64.22.4,end=10.64.22.4 e4fb2771-a361-4add-ac4e-280cc300c59f`
- 13:10 arturo: T202115 (was `{"start": "10.64.22.2", "end": "10.64.22.254"}` )
- 13:08 arturo: T202115 `root@cloudcontrol1003:~# neutron subnet-update --allocation-pool start=10.64.22.254,end=10.64.22.254 e4fb2771-a361-4add-ac4e-280cc300c59f`
2018-08-22
- 15:28 arturo: cleanup local glance,keystone databases in cloudcontrol1003.wikimedia.org (already in m5-master)
- 15:27 arturo: cleanup local keystone database in cloudcontrol1003.wikimedia.org (already in m5-master)
2018-08-21
- 15:39 andrewbogott: initial test message
- 10:31 arturo: eqiad1 remove leftover port for HA on labnet1004
- 10:15 arturo: test
2018-05-07
- 18:07 bstorm_: stopped the toolhistory job because it is totally broken and fills /tmp.
2018-02-09
- 00:55 bd808: Added Arturo Borrero Gonzalez and Bstorm as project members
- 00:54 bd808: Removed Yuvipanda at user request (T186289)