You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Difference between revisions of "Kubernetes/Troubleshooting"

From Wikitech-static
imported>Quiddity
(c/e and tweaks)
 
imported>BryanDavis
(s/WMCS/Toolforge/)
 
Latest revision as of 20:19, 24 September 2021

Kubernetes has many moving parts and a wide range of configuration options, so misconfiguration can happen. This page is meant to help troubleshoot some common error cases.


Most of the time, errors in deployments are caught by helmfile. If a new deployment is unable to become ready, helmfile rolls back automatically after a timeout (currently 300 seconds). If your deployment takes a long time and then fails at the timeout, it helps to open another SSH session and troubleshoot the service components from there. Make sure you are familiar with kubectl before you start.
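As a rough illustration of what a first look from that second session might involve, the sketch below assembles a few standard kubectl invocations for one namespace. It only builds and prints the commands; it does not run them. The helper name `triage_commands`, the namespace `my-service`, and the pod name `my-service-abc123` are hypothetical placeholders, not names used on the production platform, and the selection of commands is an assumption rather than a prescribed procedure.

```python
import shlex


def triage_commands(namespace, pod=None):
    """Return kubectl invocations (as argv lists) for a first look at a stuck deployment.

    The namespace and pod values are supplied by the caller; nothing here is
    specific to any particular cluster.
    """
    commands = [
        # List the pods and their status (Pending, CrashLoopBackOff, ...).
        ["kubectl", "get", "pods", "-n", namespace],
        # Show recent events in the namespace, oldest first.
        ["kubectl", "get", "events", "-n", namespace,
         "--sort-by=.metadata.creationTimestamp"],
    ]
    if pod is not None:
        # Inspect a single pod: scheduling, probes, container states.
        commands.append(["kubectl", "describe", "pod", pod, "-n", namespace])
        # Fetch logs from every container in that pod.
        commands.append(["kubectl", "logs", pod, "-n", namespace, "--all-containers"])
    return commands


if __name__ == "__main__":
    # Hypothetical namespace and pod name, for illustration only.
    for argv in triage_commands("my-service", pod="my-service-abc123"):
        print(shlex.join(argv))
```

The printed lines can be pasted into the second SSH session while helmfile is still waiting for the deployment to become ready.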


The following troubleshooting flowchart gives a rough guide to finding errors in Kubernetes service deployments. It is intended for the production Kubernetes platform, not Toolforge.

Figure: Production Kubernetes troubleshooting flowchart (File:Kubernetes Troubleshooting WMF.png)

Additional Resources