Portal:Toolforge/Admin/Kubernetes/Pod tracing: Difference between revisions

From Wikitech-static
imported>Arturo Borrero Gonzalez
(→‎Case 1: Unknown Tool hammering an API: you know the node already!)
 
imported>Nintendofan885
(fix spacing)
 
<syntaxhighlight lang="shell-session">
aborrero@tools-k8s-master-01:~$ kubectl get pod -o wide --all-namespaces | grep tools-worker-1021
base-php-cli                      interactive                                        1/1      Running            0          25d      192.168.165.3    tools-worker-1021.tools.eqiad1.wikimedia.cloud
citations                          interactive                                        1/1      Running            0          53d      192.168.165.8    tools-worker-1021.tools.eqiad1.wikimedia.cloud
fireflytools                      fireflytools-3361574769-010jg                      1/1      Running            0          53d      192.168.165.2    tools-worker-1021.tools.eqiad1.wikimedia.cloud
intuition                          interactive                                        1/1      Running            0          23d      192.168.165.7    tools-worker-1021.tools.eqiad1.wikimedia.cloud
magog                              magog-3723068317-3i4x9                              1/1      Running            0          15d      192.168.165.14  tools-worker-1021.tools.eqiad1.wikimedia.cloud
ordia                              ordia-2331477560-stoeq                              1/1      Running            0          15d      192.168.165.10  tools-worker-1021.tools.eqiad1.wikimedia.cloud
ores-support-checklist            ores-support-checklist-1319410377-t54a7            1/1      Running            0          53d      192.168.165.9    tools-worker-1021.tools.eqiad1.wikimedia.cloud
phpinfo                            phpinfo-2217838168-rm5ui                            1/1      Running            0          21d      192.168.165.13  tools-worker-1021.tools.eqiad1.wikimedia.cloud
proxies                            proxies-1745127721-f9yjq                            1/1      Running            0          44d      192.168.165.11  tools-worker-1021.tools.eqiad1.wikimedia.cloud
strephit                          strephit-1006144150-x7q69                          1/1      Running            0          77d      192.168.165.4    tools-worker-1021.tools.eqiad1.wikimedia.cloud
topicmatcher                      topicmatcher-187403292-suxnb                        1/1      Running            0          73d      192.168.165.5    tools-worker-1021.tools.eqiad1.wikimedia.cloud
verification-pages                verification-pages-3591681152-nfeew                1/1      Running            0          22h      192.168.165.15  tools-worker-1021.tools.eqiad1.wikimedia.cloud
w-slackbot                        w-slackbot-3270543702-ljgri                        1/1      Running            0          14d      192.168.165.12  tools-worker-1021.tools.eqiad1.wikimedia.cloud
</syntaxhighlight>
{{Collapse bottom}}

Latest revision as of 17:55, 29 September 2020

This article describes some procedures for tracing a pod that is misbehaving. A typical case is an API being hammered by an unknown tool in Toolforge, and the need to shut down the corresponding pod.

In all cases, once you have identified the offending tool/pod, you can disable it as described at Help:Toolforge/Kubernetes#Monitoring_your_job (i.e., become the tool on a bastion, and then delete the deployment).

Case 1: Unknown Tool hammering an API

This has happened before; see T204267 for an example.

In this example, the Wikidata API was being hammered from the IP address of a k8s worker node in Toolforge. The tool was sending a non-meaningful User-Agent, so we had no quick way of identifying which tool was responsible.

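First, get an overview of the pods running on the offending k8s node (you should already know which worker node it is, because it appears in the API server logs). The session above does this by filtering the output of kubectl get pod -o wide --all-namespaces by the node name.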
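As a hedged sketch of this first step, the filter below keeps only the rows that belong to one worker node. It assumes the eight-column layout of the kubectl output in this page's example session (NODE in the eighth column); newer kubectl versions may print extra columns. The name pods_on_node is a hypothetical helper, not an existing Toolforge command.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pod -o wide --all-namespaces` output
# on stdin and keep the rows whose NODE column (assumed to be field 8) is the
# given worker, either as a bare name or as the first label of its FQDN.
pods_on_node() {
    awk -v node="$1" '$8 == node || index($8, node ".") == 1'
}

# Possible usage on a control-plane host (requires cluster access):
#   kubectl get pod -o wide --all-namespaces | pods_on_node tools-worker-1021
#
# kubectl can also filter server-side when the full node name is known:
#   kubectl get pod -o wide --all-namespaces \
#       --field-selector spec.nodeName=<full node name>
```

Unlike a plain grep, matching on the node column avoids picking up a worker whose name merely starts with the same digits (for example a node numbered 10210 when looking for 1021).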

Check first whether any tool in the list is an obvious suspect for the high traffic. If not, run tcpdump on the k8s node and look for a traffic pattern that lets you match the traffic to a given internal k8s IP address (i.e., 192.168.x.x). You can also inspect other resources, such as conntrack -L, which lists the current NAT connections, and iptables-save, whose rule comments map internal k8s IP addresses to tool names.
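The matching described above can be sketched roughly as follows. This is a minimal illustration, not an established procedure: top_sources and pod_for_ip are hypothetical helper names, the conntrack parsing assumes the usual src=/dst= key-value output of conntrack -L, and the pod lookup assumes the eight-column kubectl layout from this page's example session (IP in the seventh column).

```shell
#!/bin/sh
# Hypothetical helper: read `conntrack -L` output on stdin and count tracked
# connections per original source address towards one destination address.
# Only the first src=/dst= pair of each line (original direction) is used.
top_sources() {
    awk -v api="$1" '
    {
        src = ""; dst = ""
        for (i = 1; i <= NF; i++) {
            if (src == "" && index($i, "src=") == 1) src = substr($i, 5)
            if (dst == "" && index($i, "dst=") == 1) dst = substr($i, 5)
        }
        if (dst == api && src != "") count[src]++
    }
    END { for (s in count) print count[s], s }' | sort -rn
}

# Hypothetical helper: read `kubectl get pod -o wide --all-namespaces` output
# on stdin and print namespace/pod for the row whose IP column (assumed to be
# field 7) equals the given internal address.
pod_for_ip() {
    awk -v ip="$1" '$7 == ip { print $1 "/" $2 }'
}

# Possible usage (the first three need root on the worker node):
#   tcpdump -n -i any host <API IP>
#   conntrack -L 2>/dev/null | top_sources <API IP>
#   iptables-save | grep -Fw 192.168.165.10
#   kubectl get pod -o wide --all-namespaces | pod_for_ip 192.168.165.10
```

The helpers only parse text from standard input, so they can be tried on saved command output before running anything on a live node.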