Wikimedia Cloud Services team/EnhancementProposals/Network refresh

Revision as of 12:52, 14 February 2020

This page contains an enhancement proposal for the CloudVPS service, specifically for its network. Several goals are involved:

  • getting rid of some technical debt (neutron source code customizations for dmz_cidr)
  • enabling IPv6
  • enabling additional features to improve the robustness and scalability of the service (multi-row networking, BGP, etc)
  • enabling additional use cases inside CloudVPS (per-project self-service networks)

Working with OpenStack Neutron is complex for several reasons. Mainly, our current usage of Neutron does not map cleanly onto any model that is well defined upstream, and the OpenStack documentation is often unclear about which pieces apply to which model or use case.

Constraints

Some context on the constraints we have.

VM instance identification vs NAT

One of the main constraints of our current model is that we want to preserve the source address of VM instances running in CloudVPS when they contact WMF prod services (such as wikis and APIs) and also Cloud supporting services (such as NFS, LDAP, etc).

In the general case, all network traffic from VMs running in CloudVPS that leaves the deployment towards the internet is translated to a single IPv4 address (called routing_source_ip). Traditionally, though, we have benefited from knowing exactly which VM instance is communicating with physical prod services, so there is a way to exclude that VM traffic from this NAT. This is currently implemented by means of the dmz_cidr mechanism, which instructs Neutron not to apply NAT to connections between VM instances and physical internal prod networks. Services running in those networks therefore see the original source IP address of the VM instance (typically in the 172.16.0.0/16 range).
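
For illustration only, the effect of dmz_cidr is roughly equivalent to the following iptables rules in the Neutron router (a minimal sketch using example ranges mentioned on this page, not the actual rules our patches generate):

# sketch only: skip SNAT for VM traffic towards an excluded prod network,
# then NAT everything else to the single routing_source_ip address
iptables -t nat -A POSTROUTING -s 172.16.0.0/16 -d 10.0.0.0/8 -j ACCEPT
iptables -t nat -A POSTROUTING -s 172.16.0.0/16 -j SNAT --to-source <routing_source_ip>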

This mechanism, however, is not native to OpenStack Neutron; it is something we added to the Neutron source code by means of patches. With every new OpenStack release we upgrade to, the patches have to be rebased and reapplied, which is a manual, tedious, hard-to-test and error-prone process. We consider this technical debt and want to get rid of it.

But we still want a way to identify VM instances in the physical internal prod network.

Statements:

  • we would like to get rid of the NAT exclusion implemented by our internal neutron customizations (dmz_cidr).
  • we would like to be able to uniquely identify VM instances in the physical internal prod network.

Datacenter row confinement

Traditionally, all of our CloudVPS hardware servers have been deployed in a single datacenter row (row B). This served mainly 2 purposes:

  • isolation: all cloud things are in a single row, a single physical prod network, with limited 'blast radius'.
  • topology: the CloudVPS neutron setup we use benefits from this single row, single physical prod network model. It made it relatively simple for us to implement and manage all the networking bits for CloudVPS.

There is a physical VLAN defined in row B (cloud-instances2-b-eqiad or vlan 1105) which is what our Neutron setup uses to provide network connectivity to VM instances. All VM instances have addressing from this subnet and have direct access to this VLAN.

However, we identified that this confinement to a single row and a single physical prod network has some consequences. First and foremost, racking space and physical capacity are limited in a single row: every day we have less racking space left in row B and less capacity left in the physical network switches (and possibly other physical facilities).

Also, in the past we have had availability/reliability problems, such as losing a key component of the physical setup (a router), which meant severe downtime for the whole CloudVPS service.

Statements:

  • we would like a network model that allows us to cross datacenter row boundaries, meaning we could rack our CloudVPS servers in at least 2 different datacenter rows.

Clear separation and no special trust

Traffic that reaches services in the physical internal prod network from CloudVPS VM instances should be treated as coming from the internet, in the sense that no special trust is given to it. Network flows should traverse the perimeter core routers and have firewalling applied to them, among other measures.

Statement:

  • CloudVPS traffic is untrusted in the physical internal prod network.

Proposals

Introduce IPv6 as a replacement for dmz_cidr

The proposal is to introduce dual stack IPv4/IPv6 networking inside CloudVPS.

This would require several things to be done as well:

  • design and introduce an IPv6 addressing plan for CloudVPS.
  • introduce backbone/transport IPv6 support in the link between the prod core routers and our neutron virtual routers.
  • update the DNS setup (generate AAAA and PTR records for the new IPv6 addresses).
  • review any other changes inside openstack and/or neutron required to support IPv6 (a sketch of the subnet creation is shown after this list).
  • review and update our cloud supporting services (NFS, LDAP, etc) to promote IPv6 as the preferred networking mechanism to communicate with VM instances.
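
As an illustration of the Neutron side, adding an IPv6 subnet to the existing instances network could look roughly like this (the prefix, network and router names are placeholders, and SLAAC is picked only as an example; this is not a decided addressing plan):

# create an IPv6 subnet on the instances network (SLAAC chosen just as an example)
openstack subnet create \
    --network <instances-network> \
    --ip-version 6 \
    --ipv6-ra-mode slaac --ipv6-address-mode slaac \
    --subnet-range <ipv6-prefix>/64 \
    cloud-instances2-b-codfw-v6

# attach the new subnet to the Neutron virtual router
openstack router add subnet <neutron-router> cloud-instances2-b-codfw-v6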

Introducing IPv6 could have several benefits at the same time:

  • native VM instance identification mechanism, with no dependency on our NAT-based setup.
  • if IPv6 is promoted to the preferred networking protocol inside CloudVPS, we could re-think our IPv4 NAT-based setup and reduce it to the bare minimum.
  • modern networking technology for our users, a step in the right direction from the technological point of view.

If we have a way to identify VM instances in the physical internal prod network by means of the IPv6 address, we no longer need our internal neutron customizations (dmz_cidr). Basically, we could leverage IPv6 to address the 2 constraint statements defined above.

Toolforge Kubernetes

One of the main challenges of this proposal is getting Kubernetes to work with IPv6, specifically the Toolforge Kubernetes cluster.

According to the upstream Kubernetes documentation we need at least v1.16 to run Kubernetes in dual-stack IPv4/IPv6 mode (at the time of this writing we are using v1.15.6).

Some additional arguments are required for several Kubernetes components, like the apiserver and the controller manager. Some of them can be specified when bootstrapping the cluster with kubeadm.
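
As a rough sketch based on the upstream dual-stack documentation for the v1.16 alpha (the CIDRs are placeholders, not a decided addressing plan), the extra arguments look like this:

# kube-apiserver
--feature-gates=IPv6DualStack=true

# kube-controller-manager: dual-stack pod CIDRs (IPv4 and IPv6)
--feature-gates=IPv6DualStack=true
--cluster-cidr=<ipv4-pod-cidr>,<ipv6-pod-cidr>

# kubelet
--feature-gates=IPv6DualStack=true

# kube-proxy
--feature-gates=IPv6DualStack=true
--cluster-cidr=<ipv4-pod-cidr>,<ipv6-pod-cidr>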

Another required change is to get kube-proxy running in ipvs mode (at the time of this writing we are using iptables mode).
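
On a kubeadm-managed cluster like ours, this is essentially a change to the kube-proxy configuration; a minimal sketch, assuming the standard kubeadm layout:

# set mode: "ipvs" in the kube-proxy configuration
kubectl -n kube-system edit configmap kube-proxy

# restart the kube-proxy pods so they pick up the new mode
# (the ip_vs kernel modules must be available on the worker nodes)
kubectl -n kube-system rollout restart daemonset kube-proxy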

The documentation also notes that Service objects can be either IPv4 or IPv6 but not both at the same time. This means the webservice mechanism would need to create 2 services per tool, setting the .spec.ipFamily field accordingly in the definition (although IPv6 can be set as the default).
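
For example, the IPv6 variant of a per-tool Service could look like this (the tool name, namespace, selector and port are made up for illustration; .spec.ipFamily is the alpha dual-stack field mentioned above):

# hypothetical example of an IPv6-only Service for a tool
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: example-tool-6          # hypothetical name for the IPv6 twin Service
  namespace: tool-example       # hypothetical tool namespace
spec:
  ipFamily: IPv6                # one address family per Service (alpha field)
  selector:
    name: example-tool
  ports:
    - port: 8000
      targetPort: 8000
EOF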

Per the upstream docs, no special configuration is required to get nginx-ingress working with IPv6. No special changes (other than enabling IPv6) should be required in either the tools front proxy or the Kubernetes haproxy.

The calico docs for IPv6 contain detailed information on how to enable IPv6 for calico, which seems straightforward. No special changes seem to be required to get coredns working with IPv6.
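
Following the calico documentation linked above, enabling IPv6 is mostly a matter of setting a few environment variables on the calico-node daemonset; a hedged sketch (the namespace, object name and pool prefix may differ in our actual deployment):

# enable IPv6 autodetection, felix IPv6 support and an IPv6 pod address pool
kubectl -n kube-system set env daemonset/calico-node \
    IP6=autodetect \
    FELIX_IPV6SUPPORT=true \
    CALICO_IPV6POOL_CIDR=<ipv6-pod-prefix>/64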

Summary:

  • kubernetes v1.16 is required.
  • kube-proxy ipvs mode is required.
  • activate IPv6 support in kubeadm, the apiserver, the controller manager, etc.
  • activate calico IPv6 support.
  • enable IPv6 in webservice-created Service objects.

Timeline

Proposed timeline:

  • 2020-xx-xx: design the IPv6 addressing plan for CloudVPS.
  • 2020-xx-xx: introduce backbone/transport IPv6 support in the transport networks. Configure Neutron with basic IPv6 support. Do early testing of the basic setup.
  • 2020-xx-xx: give the DNS setup support for IPv6.
  • 2020-xx-xx: additional review of networking policies, firewalling ACLs and other security aspects of the IPv6 setup.
  • 2020-xx-xx: work out IPv6 support for Toolforge Kubernetes and Toolforge in general.
  • 2020-xx-xx: work out IPv6 support in cloud supporting services, like NFS, LDAP, Wiki replicas, etc.
  • 2020-xx-xx: plan and introduce IPv6 general availability.
  • 2020-xx-xx: plan the removal of the IPv4 NAT dmz_cidr mechanism.

Other options

Other options that were considered (and discarded).

neutron address scopes

(Figure: Neutron NAT.png)

Neutron has a mechanism called address scopes which at first sight seems like the right way to replace our dmz_cidr mechanism. With address scopes you can instruct Neutron to apply (or not apply) NAT between certain subnets.

Using this mechanism, we would need to create an address scope (let's call it no-nat-address-scope) and then associate the internal instance virtual network subnet with a subnet-pool for it. The database can be hacked to associate an existing subnet with a new subnet pool.
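
For reference, a sketch of the commands involved in creating these objects (the names match the codfw1dev test output shown below):

# create the address scope the subnet pools will share
openstack address scope create --share --ip-version 4 no-nat

# pool covering the external prod networks we don't want NAT for
openstack subnet pool create --address-scope no-nat \
    --pool-prefix 10.0.0.0/8 --pool-prefix 208.80.152.0/22 \
    external-subnet-pools

# pool covering the internal instances subnet
openstack subnet pool create --address-scope no-nat --share --default \
    --pool-prefix 172.16.128.0/24 \
    cloud-instances2b-codfw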

This option was evaluated but several blockers were found:

  • the external networks for which we want to exclude NAT are actually external to neutron in the sense that neutron is not aware of them. We would need to declare them somehow inside neutron so they can be in the same scope (no-nat-address-scope) as the internal instance virtual network and no NAT is effectively applied. TODO: do more tests to see what happens here.
  • the scope mechanism works per neutron router interface and has nothing to do with addressing. The result is that we don't get the functionality we are looking for.

The dmz_cidr behaviour cannot be reproduced correctly using address scopes: after configuring Neutron as shown below, it doesn't work as expected.

  • neutron configuration:
root@cloudcontrol2001-dev:~# openstack address scope list
+--------------------------------------+--------+------------+--------+---------+
| ID                                   | Name   | IP Version | Shared | Project |
+--------------------------------------+--------+------------+--------+---------+
| b8e9b95f-150e-4236-afba-8b1f3105e81c | no-nat |          4 | True   | admin   |
+--------------------------------------+--------+------------+--------+---------+
root@cloudcontrol2001-dev:~# openstack subnet pool list
+--------------------------------------+-------------------------+-----------------------------+
| ID                                   | Name                    | Prefixes                    |
+--------------------------------------+-------------------------+-----------------------------+
| 01476a5e-f23c-4bf3-9b16-4c2da858b59d | external-subnet-pools   | 10.0.0.0/8, 208.80.152.0/22 |
| d129650d-d4be-4fe1-b13e-6edb5565cb4a | cloud-instances2b-codfw | 172.16.128.0/24             |
+--------------------------------------+-------------------------+-----------------------------+
root@cloudcontrol2001-dev:~# openstack subnet pool show 01476a5e-f23c-4bf3-9b16-4c2da858b59d
+-------------------+--------------------------------------+
| Field             | Value                                |
+-------------------+--------------------------------------+
| address_scope_id  | b8e9b95f-150e-4236-afba-8b1f3105e81c | <---
| created_at        | 2020-02-11T13:45:20Z                 |
| default_prefixlen | 8                                    |
| default_quota     | 0                                    |
| description       | external networks with no NATs       |
| id                | 01476a5e-f23c-4bf3-9b16-4c2da858b59d |
| ip_version        | 4                                    |
| is_default        | False                                |
| max_prefixlen     | 32                                   |
| min_prefixlen     | 8                                    |
| name              | external-subnet-pools                |
| prefixes          | 10.0.0.0/8, 208.80.152.0/22          |
| project_id        | admin                                |
| revision_number   | 0                                    |
| shared            | False                                |
| tags              |                                      |
| updated_at        | 2020-02-11T13:45:20Z                 |
+-------------------+--------------------------------------+
root@cloudcontrol2001-dev:~# openstack subnet pool show d129650d-d4be-4fe1-b13e-6edb5565cb4a
+-------------------+--------------------------------------+
| Field             | Value                                |
+-------------------+--------------------------------------+
| address_scope_id  | b8e9b95f-150e-4236-afba-8b1f3105e81c | <---
| created_at        | 2020-02-11T16:59:02Z                 |
| default_prefixlen | 24                                   |
| default_quota     | None                                 |
| description       | main subnet pool                     |
| id                | d129650d-d4be-4fe1-b13e-6edb5565cb4a |
| ip_version        | 4                                    |
| is_default        | True                                 |
| max_prefixlen     | 32                                   |
| min_prefixlen     | 8                                    |
| name              | cloud-instances2b-codfw              |
| prefixes          | 172.16.128.0/24                      |
| project_id        | admin                                |
| revision_number   | 0                                    |
| shared            | True                                 |
| tags              |                                      |
| updated_at        | 2020-02-11T16:59:02Z                 |
+-------------------+--------------------------------------+
root@cloudcontrol2001-dev:~# openstack subnet list
+--------------------------------------+------------------------------------+--------------------------------------+-------------------+
| ID                                   | Name                               | Network                              | Subnet            |
+--------------------------------------+------------------------------------+--------------------------------------+-------------------+
| 31214392-9ca5-4256-bff5-1e19a35661de | cloud-instances-transport1-b-codfw | 57017d7c-3817-429a-8aa3-b028de82cdcc | 208.80.153.184/29 |
| 651250de-53ca-4487-97ce-e6f65dc4b8ec | HA subnet tenant admin             | d967e056-efc3-46f2-b75b-c906bb5322dc | 169.254.192.0/18  |
| 7adfcebe-b3d0-4315-92fe-e8365cc80668 | cloud-instances2-b-codfw           | 05a5494a-184f-4d5c-9e98-77ae61c56daa | 172.16.128.0/24   |
| b0a91a7b-2e0a-4e82-b0f0-7644f2cfa654 | cloud-codfw1dev-floating           | 57017d7c-3817-429a-8aa3-b028de82cdcc | 185.15.57.0/29    |
+--------------------------------------+------------------------------------+--------------------------------------+-------------------+
root@cloudcontrol2001-dev:~# openstack subnet show 7adfcebe-b3d0-4315-92fe-e8365cc80668
+-------------------+--------------------------------------+
| Field             | Value                                |
+-------------------+--------------------------------------+
| allocation_pools  | 172.16.128.10-172.16.128.250         |
| cidr              | 172.16.128.0/24                      |
| created_at        | 2018-03-16T21:41:08Z                 |
| description       |                                      |
| dns_nameservers   | 208.80.153.78                        |
| enable_dhcp       | True                                 |
| gateway_ip        | 172.16.128.1                         |
| host_routes       |                                      |
| id                | 7adfcebe-b3d0-4315-92fe-e8365cc80668 |
| ip_version        | 4                                    |
| ipv6_address_mode | None                                 |
| ipv6_ra_mode      | None                                 |
| name              | cloud-instances2-b-codfw             |
| network_id        | 05a5494a-184f-4d5c-9e98-77ae61c56daa |
| project_id        | admin                                |
| revision_number   | 1                                    |
| service_types     |                                      |
| subnetpool_id     | d129650d-d4be-4fe1-b13e-6edb5565cb4a | <---
| tags              |                                      |
| updated_at        | 2019-10-02T15:27:33Z                 |
+-------------------+--------------------------------------+
  • There should be no NAT between subnets in the same address scope, but the ping test shows that NAT is still applied:
13:59:32.452843 IP cloudinstances2b-gw.openstack.codfw1dev.wikimediacloud.org > codfw1dev-recursor0.wikimedia.org: ICMP echo request, id 21081, seq 1, length 64
13:59:32.452883 IP codfw1dev-recursor0.wikimedia.org > cloudinstances2b-gw.openstack.codfw1dev.wikimediacloud.org: ICMP echo reply, id 21081, seq 1, length 64
  • ping test using the current dmz_cidr mechanism (the expected behaviour):
14:05:32.173816 IP 172.16.128.20 > codfw1dev-recursor0.wikimedia.org: ICMP echo request, id 21607, seq 6, length 64
14:05:32.173848 IP codfw1dev-recursor0.wikimedia.org > 172.16.128.20: ICMP echo reply, id 21607, seq 6, length 64

neutron direct connection to physical internal prod networks

(Figure: Neutron NAT direct connection.png)

Another option would be to give the neutron router a direct connection to the affected physical internal prod networks. This way, Neutron is fully aware of those networks (it has addressing in each VLAN/subnet) and since there is a direct route, no NAT needs to happen and VMs can connect directly preserving the source IP address.

This option has been discarded because it violates the clear separation constraint (see above).

See also

  • Neutron - documentation about our current Neutron setup