User:Jbond/debuging: Difference between revisions

imported>Jbond
(Jbond moved page User:Jbond/debuging to User:Jbond/debugging: bad spelling)
 
= USE =
 
http://www.brendangregg.com/USEmethod/use-linux.html
 
= Logs =
https://wikitech.wikimedia.org/wiki/Logs
 
= Network =
https://wikitech.wikimedia.org/wiki/Network_cheat_sheet#Juniper
 
= Sampled-1000.json on centrallog1001 =
 
=== nice summary ===
tail -f sampled-1000.json | /home/legoktm/webreq-filter
 
=== Grep-able output ===
<syntaxhighlight lang=console>
$ jq  -r "[.uri_path,.hostname,.user_agent,.ip] | @csv" /srv/log/webrequest/sampled-1000.json
</syntaxhighlight>
 
=== Select all public_cloud nets with 429 ===
<syntaxhighlight lang=console>
$ tail -n10000 /srv/log/webrequest/sampled-1000.json | jq -r 'select(.http_status == "429") | select(.x_analytics | contains("public_cloud=1"))'
</syntaxhighlight>
 
=== Select all requests with a specific user_agent and referer ===
 
<syntaxhighlight lang=console>
$ jq -r 'if .user_agent == "-" and .referer == "-" then [.uri_path,.hostname,.user_agent,.ip] else empty end | @csv' /srv/log/webrequest/sampled-1000.json
</syntaxhighlight>
 
=== List of the top 10 IPs by response size ===
 
<syntaxhighlight lang=console>
$ head -n 2560000 /srv/log/webrequest/sampled-1000.json | jq -r '.ip + " " + (.response_size | tostring)' | awk '{ sum[$1] += $2 } END { for (ip in sum) print sum[ip],ip }' | sort -nr | head -10
</syntaxhighlight>
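
The aggregation stage of the pipeline above can be tried on its own. The following is a minimal sketch, assuming input lines of the form <code><ip> <bytes></code> as emitted by the jq stage; the function name <code>top_by_bytes</code> is hypothetical and only used for illustration.

```shell
# Sum the second column per key in the first column and print the
# largest totals first. Reads "<ip> <bytes>" lines on stdin.
# $1 is the number of rows to print (defaults to 10).
top_by_bytes() {
    limit="${1:-10}"
    awk '{ sum[$1] += $2 } END { for (ip in sum) print sum[ip], ip }' |
        sort -nr | head -n "$limit"
}
```

For example, <code>printf '%s\n' '192.0.2.1 100' '198.51.100.7 4000' '192.0.2.1 250' | top_by_bytes 2</code> prints the total for <code>198.51.100.7</code> first, followed by the combined total for <code>192.0.2.1</code>.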
 
= 5xx.json =
=== Grepable ===
<syntaxhighlight lang=console>
$ tail -f  /srv/log/webrequest/5xx.json | jq "[.uri_host, .uri_path, .uri_query, .http_method, .ip, .user_agent] | @csv"
</syntaxhighlight>
 
= mw server =
 
=== List all IPs which have made more than 100 large requests ===
 
<syntaxhighlight lang=console>
$ awk '$2>60000 {print $11}' /var/log/apache2/other_vhosts_access.log | sort | uniq -c | awk '$1>100 {print}'
</syntaxhighlight>
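
The counting half of that pipeline (<code>sort | uniq -c</code> followed by a threshold in awk) is reusable for any one-value-per-line input. A small sketch, assuming the values contain no whitespace; the function name <code>seen_more_than</code> is hypothetical.

```shell
# Print "<count> <value>" for every input value that occurs more
# than $1 times. Reads one value (for example an IP) per line.
seen_more_than() {
    sort | uniq -c | awk -v min="$1" '$1 > min { print $1, $2 }'
}
```

It could be fed from the first awk stage above, for example <code>awk '$2>60000 {print $11}' /var/log/apache2/other_vhosts_access.log | seen_more_than 100</code>.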
 
=== MediaWiki Shell ===
<syntaxhighlight lang=console>
$ ssh mwmaint1002
$ mwscript maintenance/shell.php --wiki=enwiki
</syntaxhighlight>
Then
<syntaxhighlight lang=php>
>>> var_dump($wgUpdateRowsPerQuery);
int(100)
=> null
>>>
</syntaxhighlight>
=== One-off purge ===
On mwmaint1002, run:
<syntaxhighlight lang=shell>
$ echo 'https://example.org/foo?x=y' | mwscript purgeList.php
</syntaxhighlight>
re: https://wikitech.wikimedia.org/wiki/Multicast_HTCP_purging#One-off_purge
 
= LVS Server =
 
=== Sample 100k pkts and list top talkers ===
 
<syntaxhighlight lang=console>
$ sudo tcpdump -i enp4s0f0 -pn -c 100000 | sed -r 's/.* IP6? //;s/\.[^\.]+ .*//' | sort | uniq -c | sort -nr | head -20
</syntaxhighlight>
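
To see what the two sed expressions do, they can be run against a single captured line. This is a sketch using the same expressions as above (GNU sed with <code>-r</code>); the function name <code>src_addr</code> and the sample line in the note are made up for illustration.

```shell
# Reduce tcpdump -n output lines to the source address only:
# the first expression drops everything up to "IP " or "IP6 ",
# the second drops the source port and the rest of the line.
src_addr() {
    sed -r 's/.* IP6? //;s/\.[^\.]+ .*//'
}
```

For example, <code>echo '12:00:00.000001 IP 192.0.2.10.443 > 198.51.100.5.51234: Flags [S], length 0' | src_addr</code> prints <code>192.0.2.10</code>. Lines without a port (such as ICMP) lose their last address octet instead, so the counts are most meaningful for TCP and UDP traffic.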
 
=== Testing a site against a specific LVS ===
 
<syntaxhighlight lang=console>
$ curl --connect-to "::text-lb.${site}.wikimedia.org" https://en.wikipedia.org/wiki/Main_Page?x=$RANDOM
</syntaxhighlight>
 
= CP Server =
 
=== Query for specific status code ===
 
<syntaxhighlight lang=console>
$ sudo varnishncsa -n frontend -g request -q 'RespStatus eq 429' 
</syntaxhighlight>
 
Custom format with client IP address
<syntaxhighlight lang=console>
$ sudo -i varnishncsa -n frontend -g request -q 'RespStatus eq 429' -F '%{X-Client-IP}i %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" \"%{X-Forwarded-Proto}i\"'
</syntaxhighlight>
 
Or the much more verbose version
<syntaxhighlight lang=console>
$ sudo varnishlog -n frontend -g request -q 'RespStatus eq 429'
</syntaxhighlight>
 
=== Check the connection tuples for the varnish ===
 
<syntaxhighlight lang=console>
$ sudo ss -tan 'sport = :3120' | awk '{print $(NF)" "$(NF-1)}' | sed 's/:[^ ]*//g' | sort | uniq -c
</syntaxhighlight>
 
The number of available local ports, which also bounds the number of connection tuples, can be read from <code>ip_local_port_range</code> as shown below. If a count from the command above equals or approaches the number of available ports, there could be an issue.
 
<syntaxhighlight lang=console>
$ cat /proc/sys/net/ipv4/ip_local_port_range
</syntaxhighlight>
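
The comparison can be scripted. The following is a rough sketch, assuming the range file holds two whitespace-separated numbers with both ends inclusive; the function names and the default threshold of 80 percent are arbitrary choices for illustration, not established limits.

```shell
# Print the number of ports in a "<low> <high>" range string, as
# read from /proc/sys/net/ipv4/ip_local_port_range.
port_range_size() {
    echo "$1" | awk '{ print $2 - $1 + 1 }'
}

# Succeed when the tuple count ($1) has reached at least $3 percent
# (default 80, an arbitrary choice) of the range size ($2).
near_port_limit() {
    [ "$(( $1 * 100 ))" -ge "$(( $2 * ${3:-80} ))" ]
}
```

For example, <code>port_range_size "$(cat /proc/sys/net/ipv4/ip_local_port_range)"</code> gives the size to compare against, and <code>near_port_limit 25000 28232 && echo "close to the limit"</code> checks one count from the <code>ss</code> output against it.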
 
=== Checking sites from CP server ===
You can use curl from the cp servers to make sure a request hits the frontend or backend cache of that host and fetches a specific site, using the following commands.
 
Using <code>$RANDOM</code> below prevents us from hitting the cache
 
'''frontend'''
<syntaxhighlight lang=console>
$ curl --connect-to "::$HOSTNAME" https://en.wikipedia.org/wiki/Main_Page?x=$RANDOM
</syntaxhighlight>
 
'''backend'''
<syntaxhighlight lang=console>
$ curl --connect-to "::$HOSTNAME:3128" -H "X-Forwarded-Proto: https" https://en.wikipedia.org/wiki/Main_Page?x=$RANDOM
</syntaxhighlight>
 
= Proxied web service =
 
=== Show all request and response headers on loopback ===
 
<syntaxhighlight lang=console>
$ sudo stdbuf -oL -eL /usr/sbin/tcpdump -Ai lo -s 10240 "tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)" | egrep -a --line-buffered ".+(GET |HTTP\/|POST )|^[A-Za-z0-9-]+: " | perl -nle 'BEGIN{$|=1} { s/.*?(GET |HTTP\/[0-9.]* |POST )/\n$1/g; print }'
</syntaxhighlight>
re: https://serverfault.com/a/633452/464916
 
==== show full body ====
 
<syntaxhighlight lang=console>
$ sudo stdbuf -oL -eL /usr/sbin/tcpdump -Ai lo -s 10240 "tcp port 8001 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)"
</syntaxhighlight>
 
= Pooling =
 
== Check the pooled state ==
 
'''Service'''
<syntaxhighlight lang=console>
$ confctl select service=thumbor get 
</syntaxhighlight>
'''host'''
<syntaxhighlight lang=console>
$ confctl select dc=eqiad,cluster=cache_text,service=varnish-be,name=cp1052.eqiad.wmnet get
</syntaxhighlight>
 
== Depooling ==
https://wikitech.wikimedia.org/wiki/Depooling_servers
 
== pybal ==
Check the log file <code>/var/log/pybal.log</code> on the LVS servers.
 
= Postgresql =
 
== display locks ==
<syntaxhighlight lang=sql>
SELECT a.datname,
        l.relation::regclass,
        l.transactionid,
        a.query,
        age(now(), a.query_start) AS "age",
        a.pid
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
ORDER BY a.query_start;
</syntaxhighlight>
 
== show queries blocked waiting on a lock ==
<syntaxhighlight lang=sql>
SELECT blocked_locks.pid    AS blocked_pid,
        blocked_activity.usename  AS blocked_user,
        blocking_locks.pid    AS blocking_pid,
        blocking_activity.usename AS blocking_user,
        blocked_activity.query    AS blocked_statement,
        blocking_activity.query  AS current_statement_in_blocking_process
  FROM  pg_catalog.pg_locks        blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks        blocking_locks
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
  WHERE NOT blocked_locks.granted;
</syntaxhighlight>
 
== get table sizes ==
<syntaxhighlight lang=sql>
SELECT nspname || '.' || relname AS "relation",
      pg_size_pretty(pg_relation_size(C.oid)) AS "disk size",
      pg_size_pretty( pg_total_relation_size(nspname || '.' || relname)) AS "size"
    FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname IN ('public')
    ORDER BY pg_relation_size(C.oid) DESC;
</syntaxhighlight>
 
= DHCPd =
 
Use the following to capture DHCP traffic for a specific client MAC address. In this example the MAC address is <code>aa:00:00:d9:81:8a</code>; only its last 4 bytes (<code>00:d9:81:8a</code>) are used in the filter below.
<syntaxhighlight lang=console>
$ sudo tcpdump -i ens5 -vvv -s 1500 '((port 67 or port 68) and (udp[38:4] = 0x00d9818a))'
</syntaxhighlight>
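
The offset works out as follows: the UDP header is 8 bytes and the client hardware address field starts 28 bytes into the BOOTP payload, so the MAC begins at <code>udp[36]</code>, and skipping its first two bytes gives <code>udp[38:4]</code>. A pcap filter can load at most 4 bytes in one comparison, which is why only the tail of the MAC is matched. Below is a small sketch that builds the filter from a MAC address; the function name <code>dhcp_mac_filter</code> is hypothetical and the input is assumed to be six colon-separated hex octets.

```shell
# Build the tcpdump filter for a client MAC given as six
# colon-separated hex octets; only the last four octets are matched.
dhcp_mac_filter() {
    tail4="$(echo "$1" | tr 'A-F' 'a-f' | awk -F: '{ print $3 $4 $5 $6 }')"
    echo "((port 67 or port 68) and (udp[38:4] = 0x${tail4}))"
}
```

It could then be used as <code>sudo tcpdump -i ens5 -vvv -s 1500 "$(dhcp_mac_filter aa:00:00:d9:81:8a)"</code>.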
 
== iPXE cli ==
While booting, press Ctrl+B to drop into the [https://ipxe.org/err/2d03e1 iPXE shell]. You may need to use the advanced [[Ganeti#Get_a_console_for_a_VM|console connection options]].

Latest revision as of 15:26, 10 June 2022