Dell Enterprise Sonic Evaluation: Difference between revisions

Revision as of 16:19, 5 August 2022

Background

For many years Wikimedia have used Juniper equipment for all networking requirements (currently edge/WAN routers, datacenter switches, management firewalls). While we are broadly happy with Juniper, it is also imperative to assess alternatives, ensuring the foundation gets value for money and the best performance possible.

Foundation Costs

Recent years have seen the cost of datacenter switches in particular increasing. This has partially been driven by a gradual move from 1G to faster connections to end-hosts, with the newer equipment supporting 10G+ speeds being pricier. But there have also been increased costs for software licenses, which in the past were part of the 'base' system, pushing up overall costs. The supply-chain / chip shortage problems that emerged from 2020 onwards have only accelerated this trend.

Open Source

JunOS, Juniper's operating system, stands out in the foundation as one of the largest closed-source / proprietary software systems in use. In many respects this is standard for network devices. These typically use custom ASICs for packet forwarding, and are not based on the largely open x86/amd64 architecture which server operating systems target. The specialized and proprietary nature of such hardware has seen vendors typically offering "vertically integrated" software/hardware stacks since the dawn of the industry.

White Box

In more recent years there has been some movement away from this. Driven initially by the large web-scalers, disaggregated or white box switching has risen to prominence. In this model the switching hardware is provided by one company, and the operating system is sourced elsewhere (much like one buys a Dell server and runs Debian or Windows on it without consulting Dell). Such an approach offers many advantages, such as being able to change vendors but keep the same operating system, or to change the OS in use on existing hardware. "White box" switch hardware is typically available for a substantially lower cost than brand-name alternatives. There can be drawbacks, however, such as not having a "one stop shop" for support.

Ecosystem

Another caveat is that a small number of ASIC vendors, notably Broadcom, have the switching market carved up. These vendors often gate access to their designs and SDKs, limiting the scope for independent parties to create software for them. In one famous case Broadcom ceased licensing its SDK to Cumulus Networks after Cumulus was acquired by rival hardware manufacturer Nvidia. This left some customers forced to choose another hardware supplier or move to another OS when they had to upgrade. So the reality right now is that it is not possible to produce an operating system for switching hardware without permission from the ASIC vendors.

Nevertheless the space has opened up and there are several "white box" NOSes available, even if things won't ever be as open as for server hardware. Options include commercial offerings such as PicOS, ArcOS and OcNOS, as well as open-source projects such as DANOS and OpenSwitch.

SONiC

Of the various open-source options SONiC has become one of the most popular, with significant industry support. Initially developed by Microsoft to power their Azure cloud service, it has since been open-sourced and become part of the OCP Networking Project, with software development stewarded by the Linux Foundation.

It leverages the Switch Abstraction Interface (SAI), also defined by OCP, to communicate with switching silicon. Significantly, Broadcom has contributed a lot to its development, providing an SAI implementation for their ASICs and also committing to continued support for future silicon they develop.

Architecture

SONiC is based on Debian Linux, with the SAI added to provide an interface to the switch hardware. This makes it very easy to get to grips with for SREs who are already familiar with Debian. It is a modular distro in which networking applications (e.g. FRR, LLDP, LACP, NAT) run independently in dedicated Docker containers, each of which uses Redis as an information source to share configuration and state info.
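The shared-database pattern described above can be sketched in a few lines of Python. This is a toy, in-process stand-in for the central Redis instance, not SONiC code: the class `SharedStateStore`, the table name `ROUTE_TABLE` and the field names are all assumptions chosen for illustration. It only shows how one independent application can publish state and another can react to it without the two talking to each other directly.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class SharedStateStore:
    """Toy stand-in for a central database shared by independent apps."""

    def __init__(self) -> None:
        # table name -> key -> field dict
        self._tables: Dict[str, Dict[str, dict]] = defaultdict(dict)
        # table name -> callbacks to run whenever an entry is written
        self._subscribers: Dict[str, List[Callable[[str, dict], None]]] = defaultdict(list)

    def subscribe(self, table: str, callback: Callable[[str, dict], None]) -> None:
        """Register interest in every write to one table."""
        self._subscribers[table].append(callback)

    def set(self, table: str, key: str, fields: dict) -> None:
        """Store an entry and notify the subscribers of that table."""
        self._tables[table][key] = dict(fields)
        for callback in self._subscribers[table]:
            callback(key, dict(fields))

    def get(self, table: str, key: str) -> dict:
        """Return a copy of an entry, or an empty dict if it is absent."""
        return dict(self._tables[table].get(key, {}))


store = SharedStateStore()
programmed = []  # what the hypothetical "hardware sync" app has acted on


def hardware_sync(key: str, fields: dict) -> None:
    # A consumer app: it only sees the store, never the producer.
    programmed.append((key, fields["nexthop"]))


store.subscribe("ROUTE_TABLE", hardware_sync)

# A producer app (for example a routing daemon) publishes a route.
store.set("ROUTE_TABLE", "10.0.0.0/24", {"nexthop": "192.0.2.1", "ifname": "Ethernet0"})

print(programmed)  # [('10.0.0.0/24', '192.0.2.1')]
```

The point of the design is decoupling: a container can be restarted or replaced without the others needing to know, as long as it keeps reading and writing the same tables.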

File:Sonic architecture.png

The modular, Linux-based nature makes it easy for new applications to be developed or added to the platform, as well as for common Linux automation tooling to be leveraged. It can, for instance, run a standard puppet agent installed from upstream Debian repos, or a Prometheus Node Exporter. It ships with various containerized daemons to provide functionality, most notably employing FRRouting for routing protocols such as OSPF/BGP. While each of these sub-components has its own configuration files and syntax, and various YANG models are defined for specific configuration elements, there is inconsistent coverage between the various ways to configure devices. More recently the Management Framework has been introduced to provide a unified way to configure all these elements. It offers an "industry standard" (i.e. Cisco-like) CLI, as well as REST and gRPC endpoints for the current set of supported YANG models.
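As a rough illustration of what driving a YANG-modelled REST endpoint could look like, the sketch below builds (but does not send) a RESTCONF-style PATCH request that sets an interface description. Everything specific here is an assumption rather than something confirmed by the evaluation: the hostname `switch.example.net`, the use of the OpenConfig interfaces model, the exact resource path, and the helper name `build_description_patch`. Authentication and TLS handling are deliberately left out.

```python
import json
import urllib.request
from urllib.parse import quote


def build_description_patch(host: str, ifname: str, description: str) -> urllib.request.Request:
    """Build a RESTCONF-style PATCH request; nothing is sent over the network."""
    # Assumed path layout, following RFC 8040 conventions for list keys.
    path = "/restconf/data/openconfig-interfaces:interfaces/interface={}/config".format(
        quote(ifname, safe="")
    )
    # Assumed payload shape for the OpenConfig interface "config" container.
    body = {"openconfig-interfaces:config": {"description": description}}
    return urllib.request.Request(
        url="https://{}{}".format(host, path),
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/yang-data+json"},
    )


req = build_description_patch("switch.example.net", "Ethernet0", "uplink to spine")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the example runnable without a switch, and makes the URL and payload easy to inspect before pointing it at real hardware.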

It supports a dedicated management VRF for connecting a device's management-only network interface. SSH is supported as one would expect, and it supports the standard SNMP MIBs any other Debian system would. Redis is the ultimate store of the full configuration for all elements, and the DB is written to /etc/sonic/config_db.json for persistence. Network state is synced to the Linux kernel, so standard Linux command line interfaces such as iproute2 can be used to view state. Using such tools to modify state is highly discouraged.
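Because the persisted configuration is plain JSON, it can be inspected with ordinary tooling. The sketch below parses a small, made-up sample in the general shape of such a file (top-level tables, each holding keyed entries with string fields) and flattens it into `TABLE|key` names similar to how the entries might appear as database keys. The sample contents, the field names, the `|` separator and both helper functions are assumptions for illustration, not a description of any particular SONiC release.

```python
import json

# Hypothetical sample; a real file would be read from /etc/sonic/config_db.json.
SAMPLE_CONFIG_DB = """
{
  "PORT": {
    "Ethernet0": {"admin_status": "up", "mtu": "9100", "speed": "25000"},
    "Ethernet4": {"admin_status": "down", "mtu": "9100", "speed": "100000"}
  },
  "VLAN": {
    "Vlan100": {"vlanid": "100"}
  }
}
"""


def flatten_config(config: dict, separator: str = "|") -> dict:
    """Map each (table, key) pair to a single 'TABLE<sep>key' name."""
    flat = {}
    for table, entries in config.items():
        for key, fields in entries.items():
            flat[f"{table}{separator}{key}"] = fields
    return flat


def ports_admin_up(config: dict) -> list:
    """Return the sorted names of ports whose admin_status field is 'up'."""
    ports = config.get("PORT", {})
    return sorted(name for name, fields in ports.items() if fields.get("admin_status") == "up")


config = json.loads(SAMPLE_CONFIG_DB)
flat = flatten_config(config)

print(sorted(flat))            # ['PORT|Ethernet0', 'PORT|Ethernet4', 'VLAN|Vlan100']
print(ports_admin_up(config))  # ['Ethernet0']
```

Reading the file this way is strictly a view of saved configuration; in keeping with the advice above, changes are better made through the supported configuration interfaces than by editing state directly.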

SONiC Support

SONiC's open source nature is in stark contrast with the more traditional network operating systems, which are provided with hardware and software support from the vendor. If you are running SONiC there is no TAC to contact when something does not work as expected. Certainly if there is a hardware fault with a device you can go back to the hardware vendor for a replacement, but beyond that users are on their own.

Unlike perhaps the situation with server/x86 based platforms, there is a fairly small install-base of SONiC users. This means community support is limited. Many SONiC users, like Microsoft, LinkedIn or Alibaba, operate at massive scale, contribute to SONiC themselves, and have staff internally who can provide support, bug fixes, diagnostics etc. For smaller enterprises, however, the lack of any support or sufficient internal resources to deal with problems is a big issue. Smaller outfits often also require more or different features than the web-scalers, which SONiC lacked in the early days.

This situation has made some smaller enterprises wary of moving to SONiC, forgoing the support they're accustomed to from their existing vendors. While Juniper support has not been stellar in recent years, WMF netops are broadly of the opinion that moving to a completely unsupported new platform would represent an unacceptable risk.

Dell Enterprise SONiC

Dell have been producing network devices for several years now. Anecdotally it is common to hear less than favorable opinions from network engineers about the Force-10 OS they ship with these. So perhaps it is not surprising that Dell have decided to offer SONiC as an OS option for some of their switches, and bridge the support and feature gap to make it more attractive to small and medium sized enterprises.

Dell Enterprise SONiC is the result. This initiative has seen them become one of the largest contributors to the SONiC project over the past few years. They offer two variants of the OS, standard and premium. Standard is their build of the upstream open-source project, built and released on a regular schedule. It may contain Dell contributions not yet merged into the upstream project, but does not contain any closed source elements. The premium variant offers richer analytics and features, such as Mirror on Drop and Inband Flow Analysis. It may also contain closed source features that won't be upstreamed to the open source release.

The standard build covers WMF's requirements and longer-term direction. Each variant is available in either a "cloud bundle" or an "enterprise bundle". WMF requires the enterprise bundle, as it supports VXLAN/EVPN, which is not available in the cloud offering.

Dell Network Switches

Dell Enterprise SONiC runs on only a small subset of the network devices Dell produces, namely those based on the Broadcom Trident 3 ASIC (similar to the Juniper QFX5120 series).

Initially spurred by a desire to explore more open networking platforms, and then by concerns about cost and lead-time for Juniper equipment, SRE Netops arranged with Dell to get some network devices on test. Specifically they delivered 2 of each of these models:

Model            Description                                      Juniper Equivalent
Dell S5248F-ON   48xSFP28 + 6xQSFP28 Top-of-Rack / Leaf Switch    QFX5120-48Y
Dell S5232F-ON   32xQSFP28 Aggregation / Spine Switch             QFX5120-32C

Test Criteria

Test Results

Conclusions

Pros

Cons

Costs

Verdict