Debugging on Besterry — Linux & DevOps Notes

tcpdump Filters Cheatsheet for When the Network is On Fire

Fri, 08 Nov 2024 00:00:00 +0000

tcpdump has a weird little filter language (pcap-filter syntax, which gets compiled down to BPF) that I never remember under pressure. This page is my cheatsheet.

Basic syntax

tcpdump -i <interface> -n <filter>
-n don't resolve addresses/ports to names
-i interface (eth0, any, lo)
-v verbose (-vv, -vvv more)
-w write to file for later analysis in Wireshark
-r read from file
-c N stop after N packets
-s 0 capture full packet (not truncated)
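
To show how these flags fit together, here is a minimal sketch of the capture-now, analyze-later workflow. The function names, the eth0 interface, and the /tmp path are my own placeholders, not anything tcpdump requires.

```shell
# Hypothetical helpers; interface, packet count and file path are assumptions.
capture_to_file() {
    # $1 = interface, $2 = packet count, $3 = output pcap file
    sudo tcpdump -i "$1" -n -s 0 -c "$2" -w "$3"
}

read_capture() {
    # $1 = pcap file, remaining args = filter expression
    file=$1
    shift
    tcpdump -n -r "$file" "$@"
}

# Usage (commented out so that sourcing this file captures nothing):
# capture_to_file eth0 1000 /tmp/incident.pcap
# read_capture /tmp/incident.pcap 'tcp port 443'
```

Capturing unfiltered and filtering at read time means you can change your mind about the filter later without needing the incident to happen again.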

Host and network filters

host 192.0.2.1 # to or from
src host 192.0.2.1 # from only
dst host 192.0.2.1 # to only
net 192.0.2.0/24 # subnet
src net 192.0.2.0/24 # subnet as source

Port filters

port 443 # source or dest port 443
src port 443 # source only
dst port 443 # dest only
portrange 50000-60000 # range

Protocol filters

tcp # TCP only
udp # UDP only
icmp # ICMP only
arp # ARP
tcp port 443 # combine
'tcp[tcpflags] & tcp-syn != 0' # TCP with SYN flag

TCP flag combinations

# SYN only (connection attempts)
'tcp[tcpflags] == tcp-syn'
# SYN-ACK
'tcp[tcpflags] == (tcp-syn|tcp-ack)'
# RST (connection resets)
'tcp[tcpflags] & tcp-rst != 0'
# FIN (connection closes)
'tcp[tcpflags] & tcp-fin != 0'
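
A flag filter gets more useful once you aggregate its output. The sketch below counts matching packets per source address. It assumes tcpdump's usual one-line text format ("<time> IP <src>.<port> > <dst>.<port>: ..."), and the function name and capture path are made up for illustration.

```shell
# Reads tcpdump text output on stdin and prints "<count> <source address>",
# busiest sender first. Assumes the default one-line-per-packet format.
count_by_source() {
    awk '$2 == "IP" || $2 == "IP6" {
             src = $3
             sub(/\.[0-9]+$/, "", src)   # drop the trailing .port
             n[src]++
         }
         END { for (s in n) print n[s], s }' | sort -rn
}

# Usage (needs an existing capture file; the path is an example):
# tcpdump -n -r /tmp/incident.pcap 'tcp[tcpflags] == tcp-syn' | count_by_source
```

Fed with the SYN-only filter, this gives a quick view of who is opening the most connections.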

Combining filters

host 192.0.2.1 and tcp port 443
'host 192.0.2.1 and (port 80 or port 443)'
'not arp and not port 22'

Boolean operators: and, or, not (or &&, ||, !).
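
The quotes matter because parentheses (and !, &&, ||) are shell metacharacters: unquoted, the shell tries to interpret them before tcpdump ever sees the filter. If you build filters in a script, a small helper keeps the grouping right. This is only a sketch, and host_ports_filter is a name I made up.

```shell
# Build "host H and (port A or port B ...)" from a host and one or more ports.
host_ports_filter() {
    if [ "$#" -lt 2 ]; then
        echo "usage: host_ports_filter HOST PORT [PORT...]" >&2
        return 1
    fi
    host=$1
    shift
    ports=""
    for p in "$@"; do
        if [ -z "$ports" ]; then
            ports="port $p"
        else
            ports="$ports or port $p"
        fi
    done
    printf 'host %s and (%s)\n' "$host" "$ports"
}

# Usage: pass the result as ONE quoted argument so the parentheses survive.
# sudo tcpdump -i any -n "$(host_ports_filter 192.0.2.1 80 443)"
```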

Kubernetes Troubleshooting: The First 10 Minutes of an Outage

Mon, 22 Jul 2024 00:00:00 +0000

When PagerDuty wakes you up about a Kubernetes cluster issue, the first 10 minutes matter. Here is the runbook I work through before anything else.

Get your bearings

First, confirm what’s actually broken from the user side. Check the status page or synthetic monitor. Many “outages” are monitoring issues, not real problems.

Cluster-level check

kubectl get nodes
kubectl top nodes

Look for NotReady nodes and for CPU or memory pressure (kubectl top only works if metrics-server is installed). If multiple nodes are down at once, the cause is probably infrastructure: check the cloud provider console.
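
On a large cluster, scanning the node list by eye is slow. A rough sketch of a filter that prints only the nodes whose STATUS does not start with Ready follows. It assumes the default column layout of kubectl get nodes --no-headers (name first, status second), and not_ready_nodes is a hypothetical name.

```shell
# stdin: output of `kubectl get nodes --no-headers`
# Prints the names of nodes whose STATUS column does not start with "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are deliberately not flagged.
not_ready_nodes() {
    awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage:
# kubectl get nodes --no-headers | not_ready_nodes
# kubectl describe node <name>    # then read the Conditions section
```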

Useful bpftrace One-Liners for System Debugging

Thu, 02 May 2024 00:00:00 +0000

bpftrace makes kernel events (tracepoints, kprobes, and more) accessible from a shell one-liner. Here are the scripts I keep reaching for.

Count syscalls by process

bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
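
As written, the one-liner runs until you press Ctrl-C. During an incident I often want a fixed sampling window instead. A sketch of that wraps the same probe with an interval probe that calls exit(). The function name and the 10-second default are my own choices.

```shell
# Count syscalls by process name for a fixed number of seconds, then exit.
# bpftrace prints the @ map automatically when the program ends.
syscalls_by_process() {
    secs=${1:-10}
    case $secs in
        ''|*[!0-9]*)
            echo "usage: syscalls_by_process [SECONDS]" >&2
            return 1
            ;;
    esac
    sudo bpftrace -e "tracepoint:raw_syscalls:sys_enter { @[comm] = count(); } interval:s:${secs} { exit(); }"
}

# Usage:
# syscalls_by_process 5
```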

Distribution of file read sizes

bpftrace -e 'tracepoint:syscalls:sys_enter_read { @ = hist(args->count); }'

TCP retransmissions by remote address

bpftrace -e '
kprobe:tcp_retransmit_skb {
    $sk = (struct sock *)arg0;
    $daddr = $sk->__sk_common.skc_daddr;
    @[ntop($daddr)] = count();
}'

Process creation stream

bpftrace -e 'tracepoint:sched:sched_process_exec { printf("%s\n", str(args->filename)); }'

When to use bpftrace vs perf vs strace

strace: simple, but adds significant overhead. Fine for debugging a single misbehaving process.
perf: best for sampling-based profiling (CPU time, cache misses). Low overhead.
bpftrace: best for event-driven tracing across the whole system. Low overhead as long as you avoid attaching to very high-frequency events.

All three should be in your toolbox.

Docker Network Debugging: nsenter and tcpdump Patterns

Wed, 20 Mar 2024 00:00:00 +0000

When a container cannot reach something, the instinct is often to exec into it and curl. But most slim containers lack curl, dig, tcpdump, or even ping. A better pattern: use nsenter from the host.

Enter the container network namespace

Get the container PID:

docker inspect -f '{{.State.Pid}}' myapp

Then, with that number in place of PID:

sudo nsenter -t PID -n bash

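You are now in the container's network namespace, but with the host's binaries: tcpdump, ip, ss, dig all work. One caveat: only the network namespace changes, so dig and other resolvers still read the host's /etc/resolv.conf, not the container's.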
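
The two steps combine into one helper that runs the host's tcpdump inside a container's network namespace. Treat this as a sketch: container_tcpdump is a made-up name, and it assumes your user can run docker inspect and sudo.

```shell
# Run the host's tcpdump inside a container's network namespace.
# $1 = container name or ID, remaining args = extra tcpdump arguments/filter
container_tcpdump() {
    name=$1
    shift
    pid=$(docker inspect -f '{{.State.Pid}}' "$name") || return 1
    # A stopped container reports PID 0, which has no namespace to enter.
    if [ "$pid" -le 0 ]; then
        echo "container $name is not running" >&2
        return 1
    fi
    sudo nsenter -t "$pid" -n tcpdump -i any -n "$@"
}

# Usage:
# container_tcpdump myapp 'tcp port 443'
```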