
We have multiple Kubernetes clusters deployed in "production".

main/services (eqiad & codfw)

eqiad and codfw are the primary Kubernetes clusters and serve real traffic. Most services are expected to be deployed identically in both, in an active/active fashion. These are our older Kubernetes clusters and have the historical benefit of using the DC names in short form. They are also known as main/services.


staging, also known as staging-eqiad, allows developers to deploy and test new versions of their project without affecting user traffic. Deployments typically have only 1 replica in staging, since it has fewer resources.

In addition, TLS is automatically configured for all services deployed here.

staging-codfw is intended for SREs to adjust and test the configuration of Kubernetes itself. While developers can deploy there, doing so is strongly discouraged, because the cluster is in a constant state of change.

ml-serve-eqiad & ml-serve-codfw

The ml-serve clusters run the Kubeflow KFServing stack. Their first goal is to replace the ORES infrastructure that serves revision scores. They are mostly managed by the ML team, but they share much of the infrastructure that the main/services clusters have.

Creating a new cluster

Creating a new cluster is supported, although it is a substantial amount of work. SREs should consult with the Service Operations team before proceeding with the instantiation of a new cluster. Docs are at Kubernetes/Cluster/New.