Network design - Eqiad WMCS Network Infra

This page details the configuration of the network devices managed by SRE Infrastructure Foundations (netops) to support cloud services in the Eqiad (Equinix, Ashburn) datacenter. Further information on the overall WMCS networking setup, including elements managed by the WMCS team themselves, is on the Portal:Cloud VPS/Admin/Network page.

Physical Network

File:WMCS network-L1.png

The dedicated physical network currently consists of 4 racks of equipment, with 6 Juniper QFX-series switches deployed across them. Additionally, rack C8 is connected to the virtual-chassis switches in row B, to provide connectivity for legacy servers installed there.

Racks C8 and D5 each have 2 switches: a main switch, which connects servers and also has an uplink to one of the core routers, and a second switch, which provides additional ports for servers. Most cloud hosts consume 2 switch ports, so a single 48-port switch is not sufficient to connect all hosts in these racks, hence the second switch in each.

Racks E4 and F4 currently have only a single top-of-rack switch each, and it is hoped that in time WMCS can adjust the server configurations to use 802.1Q VLAN tagging, so that separate physical ports are not required to connect to two or more networks.
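
As an illustrative sketch only (the interface name and VLAN names below are hypothetical, not taken from the production device configs), a server-facing tagged port on a Juniper QFX would look something like this in JunOS:

    interfaces {
        xe-0/0/10 {
            description "example cloud host, 802.1Q tagged port";
            unit 0 {
                family ethernet-switching {
                    /* trunk mode carries multiple tagged VLANs on one physical port */
                    interface-mode trunk;
                    vlan {
                        members [ cloud-hosts-example cloud-instances-example ];
                    }
                }
            }
        }
    }

With tagging in place a host can reach two or more networks over a single physical link, rather than consuming one switch port per network.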

The network is configured in a basic Spine/Leaf structure, with the switches in C8 and D5 acting as Spines, aggregating traffic from E4 and F4 and connecting to the outside world via the CR routers. Links between racks E4/F4 and C8/D5 are 40G Ethernet (40GBase-LR) over single-mode fiber. The topology is not a perfect Spine/Leaf, however, as there is also a direct connection between cloudsw1-c8 and cloudsw1-d5. This is required for several reasons, principally that there is only a single uplink from each cloudsw to the CR routers, so an alternate path is needed in case of a link or CR failure.

Logical Network

Several networks are configured on the switches described in the previous section. At a high level, networks are divided into the "cloud" and "production" realms, which are logically isolated from each other. This isolation is used to support the agreed Cross-Realm traffic guidelines.

Isolation is achieved through the use of VLANs and VRFs (routing-instances in JunOS) on the cloudsw devices. The default routing-instance on the cloudsw devices is used for production-realm traffic, and a named routing-instance, 'cloud', is used for the cloud realm.
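
As a minimal sketch of what such a named routing-instance could look like in JunOS (the interface, route-distinguisher and vrf-target values are placeholders, not the real production configuration):

    routing-instances {
        cloud {
            /* instance-type vrf gives the cloud realm its own routing table */
            instance-type vrf;
            /* hypothetical layer-3 interface placed inside the VRF */
            interface irb.1100;
            /* placeholder values, required when using instance-type vrf */
            route-distinguisher 192.0.2.1:1100;
            vrf-target target:65000:1100;
        }
    }

Routes held in this instance are kept separate from those in the default routing-instance, so cloud-realm and production-realm traffic cannot mix on the switch itself.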

Some networks exist purely at layer 2: the switches forward traffic between servers based only on destination MAC address, and remain unaware of the IP addressing used on those layer-2 segments. These networks carry only traffic internal to the cloud realm, and specific hosts, such as cloudnet and cloudgw, act as the layer-3 routers for them. They are not technically part of the cloud VRF, as no IP interfaces on the switches belong to them, but they are considered part of the cloud realm.
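
As a sketch of that arrangement (the VLAN name and ID are hypothetical), a layer-2-only network is simply a VLAN definition with no l3-interface attached, so the switch has no IP presence on the segment:

    vlans {
        cloud-instances-example {
            /* no l3-interface statement: the switch only bridges frames here,
               and cloudnet/cloudgw hosts provide the layer-3 gateway */
            vlan-id 1105;
        }
    }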

Production Realm

File:WMCS network-L3 - Prod Realm.drawio.png

Cloud VRF

The WMCS cloudnet hosts, implementing the OpenStack Neutron router, act as the gateway for these networks.

File:WMCS network-L3 - Cloud VRF Realm.drawio.png

Notes