
Difference between revisions of "Kubernetes/Clusters"

From Wikitech-static
imported>Legoktm
(→‎staging: TLS is auto-setup)
 
imported>Alexandros Kosiaris
(Add the New cluster guide)
(One intermediate revision by one other user not shown)
Line 3: Line 3:
We have multiple Kubernetes clusters deployed in "production".
We have multiple Kubernetes clusters deployed in "production".


== eqiad & codfw ==
== main/services (eqiad & codfw) ==
<code>eqiad</code> and <code>codfw</code> are the primary Kubernetes clusters and serve real traffic. It is expected that most services are deployed identically in both in an active/active fashion.
<code>eqiad</code> and <code>codfw</code> are the primary Kubernetes clusters and serve real traffic. It is expected that most services are deployed identically in both in an active/active fashion. These are our older kubernetes clusters and have the historical benefit of using the DC names in short form. They are also known as '''main/services'''


== staging ==
== staging ==
Line 11: Line 11:
In addition, TLS is automatically configured for all services deployed here.
In addition, TLS is automatically configured for all services deployed here.


== staging-codfw ==
<code>staging-codfw</code> is intended for SREs to adjust and test the configuration of Kubernetes itself. While developers can deploy there, it's strongly discouraged. The cluster is in a constant rate of change.
<code>staging-codfw</code> is intended for SREs to adjust and test the configuration of Kubernetes itself.
 
== ml-serve-eqiad & ml-serve-codfw ==
ml-serve clusters run the Kubeflow Kfserving stack and they are aimed (as first goal) to replace the ORES infrastructure that serves revision scores. These are mostly managed by the ML team, they are sharing however greatly the infrastructure the main/services clusters have.
 
== Creating a new cluster ==
Creating a new cluster is supported, albeit is a substantial amount of work. SREs should definitely consult with the Service Operations team before proceeding further with the instantiation of a new cluster. Docs are at [[Kubernetes/Clusters/New|Kubernetes/Cluster/New]]

Revision as of 14:31, 29 September 2021

We have multiple Kubernetes clusters deployed in "production".

main/services (eqiad & codfw)

eqiad and codfw are the primary Kubernetes clusters and serve real traffic. Most services are expected to be deployed identically in both, in an active/active fashion. These are our older Kubernetes clusters and, for historical reasons, are named simply after the short-form DC names. They are also known as main/services.

staging

staging, also known as staging-eqiad, allows developers to deploy and test new versions of their project without affecting user traffic. Deployments in staging typically have only 1 replica, since the cluster has fewer resources.

In addition, TLS is automatically configured for all services deployed here.

staging-codfw is intended for SREs to adjust and test the configuration of Kubernetes itself. While developers can deploy there, it's strongly discouraged. The cluster is in a constant state of change.

ml-serve-eqiad & ml-serve-codfw

The ml-serve clusters run the Kubeflow KFServing stack; their first goal is to replace the ORES infrastructure that serves revision scores. They are mostly managed by the ML team, but share much of their infrastructure with the main/services clusters.
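
The cluster names described above can be summarized as a small lookup table. This is only an illustrative sketch: the `Cluster` type, the `group` labels, and `find_cluster` are hypothetical names, not part of any Wikimedia tooling, and the datacenter of staging-codfw and of the ml-serve clusters is an assumption inferred from the name suffix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical record describing one Kubernetes cluster."""
    name: str
    datacenter: str  # assumed from the name suffix where the page does not say
    group: str       # illustrative label, not an official identifier
    aliases: tuple = ()


# Clusters listed on this page.
CLUSTERS = (
    Cluster("eqiad", "eqiad", "main/services"),
    Cluster("codfw", "codfw", "main/services"),
    Cluster("staging", "eqiad", "staging", aliases=("staging-eqiad",)),
    Cluster("staging-codfw", "codfw", "staging"),
    Cluster("ml-serve-eqiad", "eqiad", "ml-serve"),
    Cluster("ml-serve-codfw", "codfw", "ml-serve"),
)


def find_cluster(name: str) -> Cluster:
    """Return the cluster whose name or alias matches `name`."""
    for cluster in CLUSTERS:
        if name == cluster.name or name in cluster.aliases:
            return cluster
    raise KeyError(name)
```

With this table, `find_cluster("staging-eqiad")` resolves to the same record as `find_cluster("staging")`, reflecting that the two names refer to one cluster.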

Creating a new cluster

Creating a new cluster is supported, albeit a substantial amount of work. SREs should consult with the Service Operations team before proceeding with the instantiation of a new cluster. Docs are at Kubernetes/Clusters/New.