<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Debugging on Besterry — Linux &amp; DevOps Notes</title><link>https://besterry.com/tags/debugging/</link><description>Recent content in Debugging on Besterry — Linux &amp; DevOps Notes</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 08 Nov 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://besterry.com/tags/debugging/index.xml" rel="self" type="application/rss+xml"/><item><title>tcpdump Filters Cheatsheet for When the Network is On Fire</title><link>https://besterry.com/posts/tcpdump-filters-cheatsheet/</link><pubDate>Fri, 08 Nov 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/tcpdump-filters-cheatsheet/</guid><description>&lt;p&gt;tcpdump has a weird little filter language (BPF syntax) that I never remember under pressure. This page is my cheatsheet.&lt;/p&gt;
&lt;h2 id="basic-syntax"&gt;Basic syntax&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;tcpdump -i &amp;lt;interface&amp;gt; -n &amp;lt;filter&amp;gt;

-i IFACE  interface to capture on (eth0, any, lo)
-n        don't resolve addresses/ports to names
-v        verbose (-vv, -vvv for more)
-w FILE   write packets to a file (open later in Wireshark)
-r FILE   read packets from a file
-c N      stop after N packets
-s 0      capture full packets (not truncated)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="host-and-network-filters"&gt;Host and network filters&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;host 192.0.2.1 # to or from
src host 192.0.2.1 # from only
dst host 192.0.2.1 # to only
net 192.0.2.0/24 # subnet
src net 192.0.2.0/24 # subnet as source
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="port-filters"&gt;Port filters&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;port 443 # source or dest port 443
src port 443 # source only
dst port 443 # dest only
portrange 50000-60000 # range
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="protocol-filters"&gt;Protocol filters&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;tcp # TCP only
udp # UDP only
icmp # ICMP only
arp # ARP
tcp port 443 # combine
'tcp[tcpflags] &amp;amp; tcp-syn != 0' # TCP with SYN flag
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="tcp-flag-combinations"&gt;TCP flag combinations&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;# SYN only (connection attempts)
'tcp[tcpflags] == tcp-syn'
# SYN-ACK
'tcp[tcpflags] == (tcp-syn|tcp-ack)'
# RST (connection resets)
'tcp[tcpflags] &amp;amp; tcp-rst != 0'
# FIN (connection closes)
'tcp[tcpflags] &amp;amp; tcp-fin != 0'
&lt;/code&gt;&lt;/pre&gt;
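&lt;p&gt;The named flags are just bit values in byte 13 of the TCP header, so the SYN-ACK filter above can also be written numerically. A minimal sketch of the arithmetic in shell; the variable and function names are illustrative only:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Bit values of the flags in TCP header byte 13
tcp_fin=1; tcp_syn=2; tcp_rst=4; tcp_push=8; tcp_ack=16

# Print the numeric form of the SYN-ACK filter
synack_filter() {
  printf 'tcp[13] == %d\n' "$((tcp_syn | tcp_ack))"
}

synack_filter   # prints: tcp[13] == 18
&lt;/code&gt;&lt;/pre&gt;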
&lt;h2 id="combining-filters"&gt;Combining filters&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;host 192.0.2.1 and tcp port 443
'host 192.0.2.1 and (port 80 or port 443)'
'not arp and not port 22'
&lt;/code&gt;&lt;/pre&gt;
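&lt;p&gt;When the host and the port list come from variables, the grouped form above can be assembled by a small shell function. This is only a sketch: build_filter is a hypothetical helper, not part of tcpdump, and it assumes at least one port is passed.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Build "host H and (port A or port B ...)"
# $1 = host, remaining arguments = one or more ports
build_filter() {
  host=$1; shift
  ports=""
  for p in "$@"; do
    if [ -z "$ports" ]; then
      ports="port $p"
    else
      ports="$ports or port $p"
    fi
  done
  printf 'host %s and (%s)\n' "$host" "$ports"
}

# Usage (needs root; the interface name is an assumption):
# sudo tcpdump -i any -n "$(build_filter 192.0.2.1 80 443)"
&lt;/code&gt;&lt;/pre&gt;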
&lt;p&gt;Boolean operators: &lt;code&gt;and&lt;/code&gt;, &lt;code&gt;or&lt;/code&gt;, &lt;code&gt;not&lt;/code&gt; (or &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;, &lt;code&gt;||&lt;/code&gt;, &lt;code&gt;!&lt;/code&gt;).&lt;/p&gt;</description></item><item><title>Kubernetes Troubleshooting: The First 10 Minutes of an Outage</title><link>https://besterry.com/posts/k8s-troubleshooting/</link><pubDate>Mon, 22 Jul 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/k8s-troubleshooting/</guid><description>&lt;p&gt;When PagerDuty wakes you up about a Kubernetes cluster issue, the first 10 minutes matter. Here is the runbook I work through before anything else.&lt;/p&gt;
&lt;h2 id="get-your-bearings"&gt;Get your bearings&lt;/h2&gt;
&lt;p&gt;First, confirm what&amp;rsquo;s actually broken from the user side. Check the status page or synthetic monitor. Many &amp;ldquo;outages&amp;rdquo; are monitoring issues, not real problems.&lt;/p&gt;
&lt;h2 id="cluster-level-check"&gt;Cluster-level check&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;kubectl get nodes
kubectl top nodes
&lt;/code&gt;&lt;/pre&gt;
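&lt;p&gt;On a large cluster the node list gets long. A small awk filter over the first command keeps only the nodes whose STATUS column does not start with Ready. This is a sketch: not_ready is a hypothetical helper name, and it assumes the default column layout of kubectl get nodes (NAME, then STATUS).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Keep rows whose STATUS (column 2) does not start with "Ready";
# cordoned nodes show "Ready,SchedulingDisabled" and are skipped too
not_ready() {
  awk '$2 !~ /^Ready/ { print $1, $2 }'
}

# Usage:
# kubectl get nodes --no-headers | not_ready
&lt;/code&gt;&lt;/pre&gt;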
&lt;p&gt;Look for NotReady nodes and resource pressure. If multiple nodes are down, the problem is probably infrastructure — check the cloud provider console.&lt;/p&gt;</description></item><item><title>Useful bpftrace One-Liners for System Debugging</title><link>https://besterry.com/posts/bpftrace-oneliners/</link><pubDate>Thu, 02 May 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/bpftrace-oneliners/</guid><description>&lt;p&gt;bpftrace makes the kernel event space accessible from a bash one-liner. Here are the scripts I keep reaching for.&lt;/p&gt;
&lt;h2 id="count-syscalls-by-process"&gt;Count syscalls by process&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="distribution-of-file-read-sizes"&gt;Distribution of file read sizes&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;bpftrace -e 'tracepoint:syscalls:sys_enter_read { @ = hist(args-&amp;gt;count); }'
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="tcp-retransmissions-by-remote-address"&gt;TCP retransmissions by remote address&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;bpftrace -e '
kprobe:tcp_retransmit_skb {
  $sk = (struct sock *)arg0;
  $daddr = $sk-&amp;gt;__sk_common.skc_daddr;
  @[ntop($daddr)] = count();
}'
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="process-creation-stream"&gt;Process creation stream&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;bpftrace -e 'tracepoint:sched:sched_process_exec { printf(&amp;quot;%s\n&amp;quot;, str(args-&amp;gt;filename)); }'
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="when-to-use-bpftrace-vs-perf-vs-strace"&gt;When to use bpftrace vs perf vs strace&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;strace: simple, but adds significant overhead. Fine for debugging a single misbehaving process.&lt;/li&gt;
&lt;li&gt;perf: best for sampling-based profiling (CPU time, cache misses). Low overhead.&lt;/li&gt;
&lt;li&gt;bpftrace: best for event-driven tracing across the whole system. Overhead stays low as long as probes are not attached to very high-frequency events.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All three should be in your toolbox.&lt;/p&gt;</description></item><item><title>Docker Network Debugging: nsenter and tcpdump Patterns</title><link>https://besterry.com/posts/docker-networking/</link><pubDate>Wed, 20 Mar 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/docker-networking/</guid><description>&lt;p&gt;When a container cannot reach something, the instinct is often to exec into it and curl. But most slim containers lack curl, dig, tcpdump, or even ping. A better pattern: use nsenter from the host.&lt;/p&gt;
&lt;h2 id="enter-the-container-network-namespace"&gt;Enter the container network namespace&lt;/h2&gt;
&lt;p&gt;Get the container PID:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker inspect -f '{{.State.Pid}}' myapp
&lt;/code&gt;&lt;/pre&gt;
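&lt;p&gt;Then start a shell inside that process&amp;rsquo;s network namespace, substituting the PID printed above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo nsenter -t &amp;lt;PID&amp;gt; -n bash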
&lt;/code&gt;&lt;/pre&gt;
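&lt;p&gt;For one-off commands, the PID lookup and the nsenter call can be folded into a single function. A minimal sketch, assuming docker and nsenter are on the host PATH; netns_exec is a hypothetical name:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Run a host command inside a container's network namespace
# $1 = container name, remaining arguments = command to run
netns_exec() {
  ctr=$1; shift
  pid=$(docker inspect -f '{{.State.Pid}}' "$ctr") || return 1
  # State.Pid is 0 when the container is not running
  [ "$pid" -gt 0 ] || return 1
  sudo nsenter -t "$pid" -n "$@"
}

# Usage:
# netns_exec myapp ss -tn
&lt;/code&gt;&lt;/pre&gt;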
&lt;p&gt;You are now in the container network namespace, but with the host binaries. tcpdump, ip, ss, dig — all work.&lt;/p&gt;</description></item></channel></rss>