Kubernetes Troubleshooting: The First 10 Minutes of an Outage

When PagerDuty wakes you up about a Kubernetes cluster issue, the first 10 minutes matter. Here is the runbook I work through before anything else.

Get your bearings

First, confirm what’s actually broken from the user side. Check the status page or synthetic monitor. Many “outages” are monitoring issues, not real problems.

Cluster-level check

    kubectl get nodes
    kubectl top nodes

Look for NotReady nodes and resource pressure. If multiple nodes are down, the problem is probably infrastructure: check the cloud provider console. ...
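On a large cluster, the node check above can be narrowed to just the unhealthy nodes. What follows is a minimal sketch, not part of the original runbook. It assumes the default column layout of `kubectl get nodes --no-headers` (name first, status second), and `not_ready_nodes` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints the name of every node whose STATUS does not start with "Ready".
# Cordoned but healthy nodes ("Ready,SchedulingDisabled") are not reported.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}
```

Typical use would be `kubectl get nodes --no-headers | not_ready_nodes`. Empty output means every node reports Ready.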

July 22, 2024 · 2 min · Besterry

Reducing Container Image Size: Multi-Stage Builds and Alpine

Small images boot faster, save bandwidth, and have a smaller attack surface. Here are the techniques that actually work.

Multi-stage builds

The single biggest win. Build in one stage, then copy only the artifacts to a minimal runtime stage. A Go binary of 15 MB ends up in a 17 MB image. Compare that to a naive golang:1.22 image at 900+ MB.

Base image choice

From smallest to largest for Go/Rust static binaries: ...
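To make the multi-stage idea concrete, here is a rough sketch that writes a two-stage Dockerfile for a Go project. Everything beyond the golang:1.22 build image is an assumption for illustration: the helper name `write_multistage_dockerfile`, the paths `/src` and `/out/app`, a main package at the repository root, and `scratch` as the runtime base. Any other minimal base can be substituted in the second stage.

```shell
# Hypothetical helper: write a two-stage Dockerfile into directory $1.
write_multistage_dockerfile() {
  cat > "$1/Dockerfile" <<'EOF'
# Stage 1: compile with the full Go toolchain image.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 produces a statically linked binary that needs no libc at runtime.
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: ship only the binary. "scratch" is an assumed minimal base.
FROM scratch
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
EOF
}
```

After `write_multistage_dockerfile .`, an ordinary `docker build` produces an image containing only the compiled binary. Note that `scratch` has no CA certificates or shell, so a binary that makes TLS calls would need the certificates copied in as well.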

May 20, 2024 · 1 min · Besterry

Docker Network Debugging: nsenter and tcpdump Patterns

When a container cannot reach something, the instinct is often to exec into it and curl. But most slim containers lack curl, dig, tcpdump, or even ping. A better pattern: use nsenter from the host.

Enter the container network namespace

Get the container PID:

    docker inspect -f '{{.State.Pid}}' myapp

Then:

    sudo nsenter -t PID -n bash

You are now in the container network namespace, but with the host binaries. tcpdump, ip, ss, and dig all work. ...
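The two steps above can be combined into one helper. This is a sketch rather than the author's script: `is_live_pid` and `in_container_netns` are hypothetical names. The PID check is there because `docker inspect` reports `.State.Pid` as 0 for a container that is not running, and passing that straight to nsenter would fail with a less obvious error.

```shell
# Succeeds only for a positive integer PID.
is_live_pid() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric
  esac
  [ "$1" -gt 0 ]
}

# Hypothetical helper: run a host command inside a container's network namespace.
# Usage: in_container_netns myapp ss -tn
in_container_netns() {
  name="$1"; shift
  pid="$(docker inspect -f '{{.State.Pid}}' "$name")" || return 1
  if ! is_live_pid "$pid"; then
    echo "container $name is not running (pid=$pid)" >&2
    return 1
  fi
  sudo nsenter -t "$pid" -n "$@"
}
```

For example, `in_container_netns myapp ip addr` shows the container's interfaces using the host's `ip` binary.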

March 20, 2024 · 2 min · Besterry