<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Posts on Besterry — Linux &amp; DevOps Notes</title><link>https://besterry.com/posts/</link><description>Recent content in Posts on Besterry — Linux &amp; DevOps Notes</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 22 Dec 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://besterry.com/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>Incident Response Playbook That Actually Gets Used</title><link>https://besterry.com/posts/incident-response-playbook/</link><pubDate>Sun, 22 Dec 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/incident-response-playbook/</guid><description>&lt;p&gt;Most incident playbooks end up as wiki pages nobody reads during an actual incident. Here&amp;rsquo;s what survives contact with a real 3am pager.&lt;/p&gt;
&lt;h2 id="the-first-five-minutes"&gt;The first five minutes&lt;/h2&gt;
&lt;p&gt;One person is the Incident Commander (IC). If that&amp;rsquo;s not clear, declare yourself IC.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Acknowledge the page&lt;/li&gt;
&lt;li&gt;Post in #incidents: &amp;ldquo;Incident: [brief]. I am IC.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Start a timeline document (even just a text file)&lt;/li&gt;
&lt;li&gt;Check public status page — update if user-visible&lt;/li&gt;
&lt;/ol&gt;
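&lt;p&gt;The timeline needs no tooling. A minimal sketch of a shell helper that appends UTC-timestamped lines to a plain text file; the function name &lt;code&gt;tl&lt;/code&gt; and the file name are arbitrary, hypothetical choices:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Append a UTC-timestamped entry to the incident timeline file
tl() { printf '%s %s\n' &amp;quot;$(date -u +%H:%M:%SZ)&amp;quot; &amp;quot;$*&amp;quot; &amp;gt;&amp;gt; incident-timeline.txt; }

tl Acknowledged page: checkout error rate alert
tl Declared incident in the incidents channel, I am IC
tl Status page updated: checkout degraded
cat incident-timeline.txt
&lt;/code&gt;&lt;/pre&gt;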
&lt;p&gt;Don&amp;rsquo;t dig into the problem yet. Set up the command structure first.&lt;/p&gt;</description></item><item><title>The Observability Pyramid: Logs, Metrics, Traces in 2026</title><link>https://besterry.com/posts/observability-pyramid/</link><pubDate>Tue, 10 Dec 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/observability-pyramid/</guid><description>&lt;p&gt;The three pillars of observability are talked about a lot. Which one to reach for depends on the question you&amp;rsquo;re answering.&lt;/p&gt;
&lt;h2 id="metrics-for-is-it-broken-and-how-much"&gt;Metrics: for &amp;ldquo;is it broken and how much&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;Aggregated numerical data over time. Good for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dashboards and alerts&lt;/li&gt;
&lt;li&gt;Trends (is latency increasing week-over-week?)&lt;/li&gt;
&lt;li&gt;Capacity planning&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Not good for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explaining why a specific request was slow&lt;/li&gt;
&lt;li&gt;Finding causality between events&lt;/li&gt;
&lt;/ul&gt;
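&lt;p&gt;The week-over-week trend question from the first list fits in a single PromQL expression. A sketch that sends it to the Prometheus HTTP API with curl; the server address and the histogram metric &lt;code&gt;http_request_duration_seconds&lt;/code&gt; are assumptions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# p99 latency now divided by p99 latency at the same time last week;
# a result above 1 means latency got worse week-over-week
curl -sG http://localhost:9090/api/v1/query --data-urlencode 'query=
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
/
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m] offset 1w)))'
&lt;/code&gt;&lt;/pre&gt;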
&lt;p&gt;Stack: Prometheus + Grafana remains the default. OpenTelemetry Metrics if you want vendor-neutral instrumentation.&lt;/p&gt;</description></item><item><title>Rust vs Go for CLI Tools: A Practical Comparison</title><link>https://besterry.com/posts/rust-vs-go-for-cli-tools/</link><pubDate>Mon, 25 Nov 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/rust-vs-go-for-cli-tools/</guid><description>&lt;p&gt;After writing CLI tools in both Rust and Go over the last few years, here are the things that actually matter when choosing between them.&lt;/p&gt;
&lt;h2 id="startup-time"&gt;Startup time&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s a tie. A trivial Go program and a trivial Rust program both start in roughly 1-5 ms, which is negligible for CLI tools. (The old startup-time argument was mostly JVM vs Go, not Go vs Rust.)&lt;/p&gt;
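&lt;p&gt;If startup time does matter for your use case (say, a tool invoked thousands of times from a script), measure it instead of trusting rules of thumb. A sketch using hyperfine, assuming two builds of the same hypothetical tool named &lt;code&gt;mytool-go&lt;/code&gt; and &lt;code&gt;mytool-rs&lt;/code&gt; that both accept &lt;code&gt;--version&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# -N runs each command directly instead of through a shell,
# so shell startup does not swamp a 1-5 ms measurement
hyperfine -N --warmup 10 './mytool-go --version' './mytool-rs --version'
&lt;/code&gt;&lt;/pre&gt;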
&lt;h2 id="binary-size"&gt;Binary size&lt;/h2&gt;
&lt;p&gt;Out of the box:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Go: 5-15 MB for a small program&lt;/li&gt;
&lt;li&gt;Rust: 2-8 MB for a small program (with LTO and strip)&lt;/li&gt;
&lt;/ul&gt;
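&lt;p&gt;To check these numbers against your own code, build both versions without extra flags and compare file sizes. A minimal sketch, assuming a Go module and a Cargo project for the same hypothetical tool &lt;code&gt;mytool&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# In the Go module: plain build, no linker flags
go build -o mytool-go .

# In the Cargo project: default release profile
cargo build --release

# Compare (adjust the paths to where each project lives)
ls -lh mytool-go target/release/mytool
&lt;/code&gt;&lt;/pre&gt;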
&lt;p&gt;After aggressive optimization:&lt;/p&gt;</description></item><item><title>tcpdump Filters Cheatsheet for When the Network is On Fire</title><link>https://besterry.com/posts/tcpdump-filters-cheatsheet/</link><pubDate>Fri, 08 Nov 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/tcpdump-filters-cheatsheet/</guid><description>&lt;p&gt;tcpdump has a weird little filter language (BPF syntax) that I never remember under pressure. This page is my cheatsheet.&lt;/p&gt;
&lt;h2 id="basic-syntax"&gt;Basic syntax&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;tcpdump -i &amp;lt;interface&amp;gt; -n &amp;lt;filter&amp;gt;
-n      don't resolve addresses/ports
-i      interface (eth0, any, lo)
-v      verbose (-vv, -vvv for more)
-w      write raw packets to a file (e.g. to open later in Wireshark)
-r      read packets from a file
-c N    stop after N packets
-s 0    capture full packets (modern tcpdump already does this by default)
&lt;/code&gt;&lt;/pre&gt;
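&lt;p&gt;Putting the flags together: a typical capture on a busy box, with numeric output, a bounded packet count, and a file to open in Wireshark afterwards. The file path, port, and address are placeholders:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Capture up to 1000 packets on TCP port 443 from all interfaces into a file
sudo tcpdump -i any -n -c 1000 -w /tmp/https.pcap 'tcp port 443'

# Read it back later; filters work on saved captures too
tcpdump -n -r /tmp/https.pcap host 192.0.2.1
&lt;/code&gt;&lt;/pre&gt;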
&lt;h2 id="host-and-network-filters"&gt;Host and network filters&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;host 192.0.2.1 # to or from
src host 192.0.2.1 # from only
dst host 192.0.2.1 # to only
net 192.0.2.0/24 # subnet
src net 192.0.2.0/24 # subnet as source
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="port-filters"&gt;Port filters&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;port 443 # source or dest port 443
src port 443 # source only
dst port 443 # dest only
portrange 50000-60000 # range
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="protocol-filters"&gt;Protocol filters&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;tcp # TCP only
udp # UDP only
icmp # ICMP only
arp # ARP
tcp port 443 # combine
'tcp[tcpflags] &amp;amp; tcp-syn != 0' # any TCP with SYN set (includes SYN-ACK)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="tcp-flag-combinations"&gt;TCP flag combinations&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;# SYN only (connection attempts)
'tcp[tcpflags] == tcp-syn'
# SYN-ACK
'tcp[tcpflags] == tcp-syn|tcp-ack'
# RST (connection resets)
'tcp[tcpflags] &amp;amp; tcp-rst != 0'
# FIN (connection closes)
'tcp[tcpflags] &amp;amp; tcp-fin != 0'
&lt;/code&gt;&lt;/pre&gt;
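&lt;p&gt;Flag filters combine with host and port filters like any other primitive. For example, to find out who is resetting connections to one backend (the address and port are placeholders), watch RST packets only; the source address on each line is the side that sent the reset:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo tcpdump -i any -n 'host 192.0.2.1 and port 5432 and tcp[tcpflags] &amp;amp; tcp-rst != 0'
&lt;/code&gt;&lt;/pre&gt;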
&lt;h2 id="combining-filters"&gt;Combining filters&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;host 192.0.2.1 and tcp port 443
'host 192.0.2.1 and (port 80 or port 443)'
'not arp and not port 22'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Boolean operators: &lt;code&gt;and&lt;/code&gt;, &lt;code&gt;or&lt;/code&gt;, &lt;code&gt;not&lt;/code&gt; (or &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;, &lt;code&gt;||&lt;/code&gt;, &lt;code&gt;!&lt;/code&gt;).&lt;/p&gt;</description></item><item><title>Self-Host vs SaaS: The Actual Tradeoffs</title><link>https://besterry.com/posts/selfhost-vs-saas-tradeoffs/</link><pubDate>Tue, 22 Oct 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/selfhost-vs-saas-tradeoffs/</guid><description>&lt;p&gt;The &amp;ldquo;self-host everything&amp;rdquo; question has passionate advocates on both sides, and the reality is nuanced. Here&amp;rsquo;s the framework I use when deciding.&lt;/p&gt;
&lt;h2 id="cost-isnt-the-main-factor"&gt;Cost isn&amp;rsquo;t the main factor&lt;/h2&gt;
&lt;p&gt;Many self-host advocates lead with cost savings. That argument is usually misleading:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SaaS at small scale is often free or cheap ($0-50/mo)&lt;/li&gt;
&lt;li&gt;Self-hosting on cheap VPS starts around $5/mo&lt;/li&gt;
&lt;li&gt;But self-hosting eats engineer time — 2-10 hours/month for maintenance&lt;/li&gt;
&lt;li&gt;At $100/hr engineering time, self-hosting often costs MORE than SaaS&lt;/li&gt;
&lt;/ul&gt;
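&lt;p&gt;Putting numbers on that: 2-10 hours a month at $100/hr is $200-1,000 a month of engineering time. Add the $5/mo VPS and even the best case (about $205/mo) is roughly four times the top of the $0-50/mo SaaS range.&lt;/p&gt;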
&lt;p&gt;Cost-wise, self-hosting wins when you&amp;rsquo;re either:&lt;/p&gt;</description></item><item><title>Grafana Dashboards That Don't Suck: Principles and Anti-Patterns</title><link>https://besterry.com/posts/grafana-dashboards-dont-suck/</link><pubDate>Sat, 05 Oct 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/grafana-dashboards-dont-suck/</guid><description>&lt;p&gt;Most Grafana dashboards are bad. Too many panels, unclear queries, inconsistent color schemes, no clear purpose. Here are the principles I apply now.&lt;/p&gt;
&lt;h2 id="rule-1-every-dashboard-has-one-question"&gt;Rule 1: Every dashboard has one question&lt;/h2&gt;
&lt;p&gt;Start by writing down: &amp;ldquo;What question does this dashboard answer?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Good:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Is the order service healthy right now?&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;How is the nightly ETL job progressing?&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;What is the cost trend for our compute in the last 30 days?&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Bad:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Production metrics&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Database overview&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you can&amp;rsquo;t state the question in one sentence, you don&amp;rsquo;t know what the dashboard is for.&lt;/p&gt;</description></item><item><title>Terraform State Locking: Why You Need It and How It Goes Wrong</title><link>https://besterry.com/posts/terraform-state-locking/</link><pubDate>Wed, 18 Sep 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/terraform-state-locking/</guid><description>&lt;p&gt;Terraform state without locking is a bug waiting to happen. Two engineers running apply simultaneously can corrupt state in ways that take hours to untangle. Here&amp;rsquo;s what I learned after one such incident.&lt;/p&gt;
&lt;h2 id="why-state-locking-matters"&gt;Why state locking matters&lt;/h2&gt;
&lt;p&gt;Terraform reads state, computes a plan, and writes new state. Without locking, two concurrent runs can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Both read the same initial state&lt;/li&gt;
&lt;li&gt;Both compute their plans based on it&lt;/li&gt;
&lt;li&gt;Both write conflicting state — last one wins&lt;/li&gt;
&lt;li&gt;Now state doesn&amp;rsquo;t match real infrastructure&lt;/li&gt;
&lt;/ul&gt;
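&lt;p&gt;With a locking backend the race above cannot happen: the second run either fails or waits for the lock. A sketch of the relevant CLI behaviour, assuming a backend that supports locking (for example S3 with a DynamoDB lock table) is already configured; &lt;code&gt;LOCK_ID&lt;/code&gt; is a placeholder:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Run A holds the lock for the whole apply
terraform apply

# Run B, started meanwhile, fails fast with an error acquiring the state lock
# (the default -lock-timeout is 0s)...
terraform apply

# ...or waits up to 10 minutes for run A to finish
terraform apply -lock-timeout=10m

# Only for a stale lock left by a crashed run, when you are sure nothing
# else is running: release it using the lock ID printed in the error
terraform force-unlock LOCK_ID
&lt;/code&gt;&lt;/pre&gt;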
&lt;p&gt;The symptoms are weird: resources exist but Terraform wants to create them again. Or state references resources that were already destroyed.&lt;/p&gt;</description></item><item><title>ZFS on Linux: Six Months of Production Use</title><link>https://besterry.com/posts/zfs-on-linux/</link><pubDate>Mon, 02 Sep 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/zfs-on-linux/</guid><description>&lt;p&gt;Migrated our build server array from ext4+mdadm to ZFS on Linux six months ago. Here&amp;rsquo;s what I learned.&lt;/p&gt;
&lt;h2 id="why-zfs"&gt;Why ZFS&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Checksumming catches silent data corruption (we found 14 affected files on the old array)&lt;/li&gt;
&lt;li&gt;Snapshots are cheap and instant (100ms for a 10TB dataset)&lt;/li&gt;
&lt;li&gt;Compression often makes things faster — less I/O, more CPU&lt;/li&gt;
&lt;li&gt;Send/receive for efficient replication&lt;/li&gt;
&lt;li&gt;No separate mdadm/LVM layer to debug&lt;/li&gt;
&lt;/ul&gt;
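&lt;p&gt;Each of those points maps to a short command. A sketch against a hypothetical pool &lt;code&gt;tank&lt;/code&gt; with a dataset &lt;code&gt;tank/builds&lt;/code&gt; and a second machine &lt;code&gt;backup-host&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Verify every block's checksum; errors show in the CKSUM column,
# and -v lists any files with unrecoverable errors
zpool scrub tank
zpool status -v tank

# Instant snapshot
zfs snapshot tank/builds@before-upgrade

# Enable lz4 compression (applies to newly written data) and check the ratio
zfs set compression=lz4 tank/builds
zfs get compressratio tank/builds

# Replicate the snapshot to another host
zfs send tank/builds@before-upgrade | ssh backup-host zfs receive backup/builds
&lt;/code&gt;&lt;/pre&gt;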
&lt;h2 id="pool-design"&gt;Pool design&lt;/h2&gt;
&lt;p&gt;For the build server, 6 x 4TB NVMe in RAIDZ2:&lt;/p&gt;</description></item><item><title>PostgreSQL Backup Strategies: Not All Backups Are Equal</title><link>https://besterry.com/posts/postgres-backup-strategies/</link><pubDate>Sun, 18 Aug 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/postgres-backup-strategies/</guid><description>&lt;p&gt;A backup you can&amp;rsquo;t restore isn&amp;rsquo;t a backup. After losing data once (fortunately from a test environment), here&amp;rsquo;s the framework I apply now.&lt;/p&gt;
&lt;h2 id="the-three-levels-of-recovery"&gt;The three levels of recovery&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Point-in-time recovery (PITR)&lt;/strong&gt;: Restore to any second in the last N days. Requires WAL archiving + base backups.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Daily snapshots&lt;/strong&gt;: Restore to yesterday&amp;rsquo;s 3am state. Simple, cheap, 24h RPO.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Logical dumps&lt;/strong&gt;: Restore specific tables or data subsets. Useful for selective recovery.&lt;/li&gt;
&lt;/ol&gt;
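&lt;p&gt;Levels 1 and 3 map onto standard PostgreSQL tools. A sketch, assuming a database named &lt;code&gt;appdb&lt;/code&gt;, a table &lt;code&gt;orders&lt;/code&gt;, and a &lt;code&gt;/backups&lt;/code&gt; directory (all hypothetical); the base backup only enables PITR if WAL archiving (&lt;code&gt;archive_mode&lt;/code&gt;, &lt;code&gt;archive_command&lt;/code&gt;) is also configured on the server:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Level 1 (PITR): tar-format, gzip-compressed base backup, with the WAL
# needed to make it consistent streamed alongside
pg_basebackup -D /backups/base -Ft -z -X stream -P

# Level 3 (logical): custom-format dump, restorable table by table
pg_dump -Fc -d appdb -f /backups/appdb.dump

# Selective recovery: restore only one table into a scratch database
createdb appdb_restore
pg_restore -d appdb_restore -t orders /backups/appdb.dump
&lt;/code&gt;&lt;/pre&gt;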
&lt;p&gt;Most production databases should have all three.&lt;/p&gt;</description></item><item><title>Modern TLS Cipher Configuration in 2026</title><link>https://besterry.com/posts/tls-modern-ciphers/</link><pubDate>Mon, 05 Aug 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/tls-modern-ciphers/</guid><description>&lt;p&gt;Configuring TLS ciphers used to involve copying a magic list from Mozilla SSL Configurator and moving on. In 2026 the landscape has shifted enough that revisiting is worth it.&lt;/p&gt;
&lt;h2 id="what-changed"&gt;What changed&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;TLS 1.3 is now supported by 95%+ of clients. Serving TLS 1.0 or 1.1 is an active liability.&lt;/li&gt;
&lt;li&gt;OpenSSL 3.x became the default on most modern distros. Some older ciphers are simply gone.&lt;/li&gt;
&lt;li&gt;Post-quantum hybrid key exchange (X25519-Kyber768) started rolling out in Chrome and Firefox.&lt;/li&gt;
&lt;li&gt;Perfect Forward Secrecy is universally expected. No more RSA key exchange.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="recommended-nginx-config"&gt;Recommended nginx config&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;ssl_protocols TLSv1.2 TLSv1.3;
# TLS 1.2 cipher list. ssl_ciphers does not apply to TLS 1.3: its suites come
# from OpenSSL's defaults (adjust with ssl_conf_command Ciphersuites if needed)
ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
ssl_prefer_server_ciphers off;
ssl_ecdh_curve X25519:secp521r1:secp384r1;
&lt;/code&gt;&lt;/pre&gt;
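&lt;p&gt;To confirm what the server actually negotiates after a reload, run a quick check from any client machine. A sketch; &lt;code&gt;example.com&lt;/code&gt; stands in for your host, and nmap&amp;rsquo;s ssl-enum-ciphers script is just one convenient way to list everything accepted:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Should connect and report TLSv1.3
echo | openssl s_client -connect example.com:443 -tls1_3 -brief

# Should still connect for TLS 1.2 clients
echo | openssl s_client -connect example.com:443 -tls1_2 -brief

# List every protocol version and cipher the server accepts
nmap --script ssl-enum-ciphers -p 443 example.com
&lt;/code&gt;&lt;/pre&gt;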
&lt;p&gt;&lt;code&gt;ssl_prefer_server_ciphers off&lt;/code&gt; is correct for modern deployments — clients know better than servers which ciphers perform well on their hardware.&lt;/p&gt;</description></item><item><title>Kubernetes Troubleshooting: The First 10 Minutes of an Outage</title><link>https://besterry.com/posts/k8s-troubleshooting/</link><pubDate>Mon, 22 Jul 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/k8s-troubleshooting/</guid><description>&lt;p&gt;When PagerDuty wakes you up about a Kubernetes cluster issue, the first 10 minutes matter. Here is the runbook I work through before anything else.&lt;/p&gt;
&lt;h2 id="get-your-bearings"&gt;Get your bearings&lt;/h2&gt;
&lt;p&gt;First, confirm what&amp;rsquo;s actually broken from the user side. Check the status page or synthetic monitor. Many &amp;ldquo;outages&amp;rdquo; are monitoring issues, not real problems.&lt;/p&gt;
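&lt;p&gt;If there&amp;rsquo;s no synthetic monitor to look at, a few requests from your own machine give a user&amp;rsquo;s-eye view. A sketch with curl; the URL is a placeholder for whatever endpoint your users actually hit:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Print HTTP status and total time for five requests to spot intermittent failures
for i in 1 2 3 4 5; do
  curl -sS -o /dev/null -w '%{http_code} %{time_total}s\n' https://app.example.com/
done
&lt;/code&gt;&lt;/pre&gt;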
&lt;h2 id="cluster-level-check"&gt;Cluster-level check&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;kubectl get nodes
kubectl top nodes
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Look for NotReady nodes and resource pressure. If multiple nodes are down, the problem is probably infrastructure — check the cloud provider console.&lt;/p&gt;</description></item><item><title>Alert Fatigue: Prometheus Rules That Actually Help</title><link>https://besterry.com/posts/prometheus-alerts/</link><pubDate>Mon, 10 Jun 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/prometheus-alerts/</guid><description>&lt;p&gt;Most alerts are noise. The hardest part of monitoring is deciding what NOT to alert on. Here is the framework I use.&lt;/p&gt;
&lt;h2 id="rule-1-every-alert-must-be-actionable"&gt;Rule 1: Every alert must be actionable&lt;/h2&gt;
&lt;p&gt;If you get paged and there is nothing to do, the alert should not exist. Either fix the root cause, automate the response, or let it be a metric trend instead of a page.&lt;/p&gt;
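&lt;p&gt;One way to keep an alert actionable is to build the action into the rule itself. A sketch of a Prometheus alerting rule that pages only after 10 minutes of sustained high p99 latency and carries a runbook link; the metric name, thresholds, and URL are assumptions, and &lt;code&gt;runbook_url&lt;/code&gt; is a common annotation convention rather than something Prometheus requires:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat &amp;gt; latency-alerts.yml &amp;lt;&amp;lt;'EOF'
groups:
  - name: latency
    rules:
      - alert: HighRequestLatency
        expr: histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) &amp;gt; 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: p99 latency above 500ms for 10 minutes
          runbook_url: https://wiki.example.com/runbooks/high-latency
EOF

# Validate YAML and PromQL before loading the file into Prometheus
promtool check rules latency-alerts.yml
&lt;/code&gt;&lt;/pre&gt;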
&lt;h2 id="rule-2-alert-on-user-visible-symptoms"&gt;Rule 2: Alert on user-visible symptoms&lt;/h2&gt;
&lt;p&gt;Instead of HighCPUUsage, prefer HighRequestLatency. CPU usage high with good latency means the system is working as designed. Latency high means users are hurting.&lt;/p&gt;</description></item><item><title>Reducing Container Image Size: Multi-Stage Builds and Alpine</title><link>https://besterry.com/posts/container-image-size/</link><pubDate>Mon, 20 May 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/container-image-size/</guid><description>&lt;p&gt;Small images boot faster, save bandwidth, and have smaller attack surface. Here are the techniques that actually work.&lt;/p&gt;
&lt;h2 id="multi-stage-builds"&gt;Multi-stage builds&lt;/h2&gt;
&lt;p&gt;The single biggest win. Build in one stage, copy only the artifacts to a minimal runtime stage. A Go binary of 15 MB ends up in a 17 MB image. Compare to a naive golang:1.22 image at 900+ MB.&lt;/p&gt;
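&lt;p&gt;A minimal multi-stage sketch for a Go service, assuming the main package sits at the repository root with its go.mod; &lt;code&gt;distroless/static&lt;/code&gt; is one possible runtime base, not necessarily the one the numbers above were measured with, and &lt;code&gt;myapp&lt;/code&gt; is a hypothetical image name:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat &amp;gt; Dockerfile &amp;lt;&amp;lt;'EOF'
# Build stage: full Go toolchain, discarded after the build
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# CGO disabled so the binary is static and needs no libc at runtime
RUN CGO_ENABLED=0 go build -o /out/app .

# Runtime stage: only the compiled binary is copied in
FROM gcr.io/distroless/static-debian12
COPY --from=build /out/app /app
ENTRYPOINT [&amp;quot;/app&amp;quot;]
EOF

docker build -t myapp:slim .
docker images myapp:slim
&lt;/code&gt;&lt;/pre&gt;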
&lt;h2 id="base-image-choice"&gt;Base image choice&lt;/h2&gt;
&lt;p&gt;From smallest to largest for Go/Rust static binaries:&lt;/p&gt;</description></item><item><title>Useful bpftrace One-Liners for System Debugging</title><link>https://besterry.com/posts/bpftrace-oneliners/</link><pubDate>Thu, 02 May 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/bpftrace-oneliners/</guid><description>&lt;p&gt;bpftrace makes the kernel event space accessible from a bash one-liner. Here are the scripts I keep reaching for.&lt;/p&gt;
&lt;h2 id="count-syscalls-by-process"&gt;Count syscalls by process&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
&lt;/code&gt;&lt;/pre&gt;
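&lt;p&gt;By default a one-liner like this runs until Ctrl-C. Adding an interval probe that calls &lt;code&gt;exit()&lt;/code&gt; gives a fixed measurement window; bpftrace prints the map on exit either way. A sketch with a 10-second window:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;bpftrace -e '
tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }
interval:s:10 { exit(); }'
&lt;/code&gt;&lt;/pre&gt;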
&lt;h2 id="distribution-of-file-read-sizes"&gt;Distribution of file read sizes&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;bpftrace -e 'tracepoint:syscalls:sys_enter_read { @ = hist(args-&amp;gt;count); }'
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="tcp-retransmissions-by-remote-address"&gt;TCP retransmissions by remote address&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;bpftrace -e '
kprobe:tcp_retransmit_skb {
    $sk = (struct sock *)arg0;
    // IPv4 only: IPv6 peers are stored in skc_v6_daddr, not skc_daddr
    $daddr = $sk-&amp;gt;__sk_common.skc_daddr;
    @[ntop($daddr)] = count();
}'
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="process-creation-stream"&gt;Process creation stream&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;bpftrace -e 'tracepoint:sched:sched_process_exec { printf(&amp;quot;%s\n&amp;quot;, str(args-&amp;gt;filename)); }'
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="when-to-use-bpftrace-vs-perf-vs-strace"&gt;When to use bpftrace vs perf vs strace&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;strace: simple, but adds significant overhead. Fine for debugging a single misbehaving process.&lt;/li&gt;
&lt;li&gt;perf: best for sampling-based profiling (CPU time, cache misses). Low overhead.&lt;/li&gt;
&lt;li&gt;bpftrace: best for event-driven tracing across the whole system. Tiny overhead if used sparingly.&lt;/li&gt;
&lt;/ul&gt;
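&lt;p&gt;Pointed at a single process, the three look like this. A sketch; 1234 is a placeholder PID:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# strace: exact per-syscall counts and times, but stops the process on every syscall
sudo strace -c -f -p 1234

# perf: samples where the process spends CPU time (a different question, answered cheaply)
sudo perf top -p 1234

# bpftrace: counts by syscall number, aggregated in the kernel without ptrace stops
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == $1/ { @[args-&amp;gt;id] = count(); }' 1234
&lt;/code&gt;&lt;/pre&gt;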
&lt;p&gt;All three should be in your toolbox.&lt;/p&gt;</description></item><item><title>WireGuard vs AmneziaWG: When Obfuscation Matters</title><link>https://besterry.com/posts/wireguard-vs-amneziawg/</link><pubDate>Mon, 15 Apr 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/wireguard-vs-amneziawg/</guid><description>&lt;p&gt;Plain WireGuard is simple and fast. AmneziaWG adds obfuscation to the handshake. When do you need which?&lt;/p&gt;
&lt;h2 id="plain-wireguard-is-enough-when"&gt;Plain WireGuard is enough when&lt;/h2&gt;
&lt;p&gt;You control both endpoints, no DPI is filtering your traffic, and the main concern is performance and simplicity. WireGuard shines for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Site-to-site VPN between your own servers&lt;/li&gt;
&lt;li&gt;Remote access to a home lab&lt;/li&gt;
&lt;li&gt;Point-to-point tunnels on a LAN&lt;/li&gt;
&lt;/ul&gt;
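&lt;p&gt;For the site-to-site case the whole setup is a key pair per host and a short config file. A minimal sketch for one side, run as root, assuming wg-quick, an interface named &lt;code&gt;wg0&lt;/code&gt;, and placeholder keys, tunnel addresses, and endpoint:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Generate this host's key pair, keeping the private key unreadable to others
umask 077
wg genkey | tee privatekey | wg pubkey &amp;gt; publickey

# Tunnel config on host A; host B mirrors it with the roles swapped
cat &amp;gt; /etc/wireguard/wg0.conf &amp;lt;&amp;lt;'EOF'
[Interface]
PrivateKey = HOST_A_PRIVATE_KEY
Address = 10.10.0.1/24
ListenPort = 51820

[Peer]
PublicKey = HOST_B_PUBLIC_KEY
AllowedIPs = 10.10.0.2/32
Endpoint = 203.0.113.2:51820
EOF

wg-quick up wg0
# Shows the peer, its endpoint, and the time of the latest handshake
wg show wg0
&lt;/code&gt;&lt;/pre&gt;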
&lt;p&gt;The handshake is small, fast, and provably secure. It uses Noise framework primitives and 1 RTT.&lt;/p&gt;</description></item><item><title>SSH Hardening Checklist for Public VPS</title><link>https://besterry.com/posts/ssh-hardening/</link><pubDate>Mon, 01 Apr 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/ssh-hardening/</guid><description>&lt;p&gt;Every public-facing server gets port-scanned within minutes of going online. Default SSH settings are decent but not great. Here is the checklist I run through on every new VPS.&lt;/p&gt;
&lt;h2 id="disable-password-authentication"&gt;Disable password authentication&lt;/h2&gt;
&lt;p&gt;In /etc/ssh/sshd_config:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PasswordAuthentication no
PubkeyAuthentication yes
ChallengeResponseAuthentication no
KbdInteractiveAuthentication no
&lt;/code&gt;&lt;/pre&gt;
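&lt;p&gt;Before reloading, let sshd check the file and show the values it will actually use: on some distributions a drop-in under &lt;code&gt;/etc/ssh/sshd_config.d/&lt;/code&gt; can override what you set in sshd_config. A sketch; the service name varies by distro and the login used for the final test is a placeholder:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Syntax check; prints nothing on success
sudo sshd -t

# Effective settings (sshd -T prints keywords in lowercase)
sudo sshd -T | grep -E 'passwordauthentication|pubkeyauthentication|kbdinteractiveauthentication'

# Reload: the unit is ssh on Debian/Ubuntu, sshd on RHEL-family systems
sudo systemctl reload ssh

# From a second terminal, keeping the current session open in case of lockout:
# a password-only login attempt should now be refused
ssh -o PubkeyAuthentication=no user@server.example.com
&lt;/code&gt;&lt;/pre&gt;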
&lt;h2 id="restrict-root-login"&gt;Restrict root login&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;PermitRootLogin prohibit-password
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This allows root login with a key but not with a password, which is fine for automation. For stricter setups, set &lt;code&gt;PermitRootLogin no&lt;/code&gt; and use sudo from an unprivileged user.&lt;/p&gt;</description></item><item><title>Docker Network Debugging: nsenter and tcpdump Patterns</title><link>https://besterry.com/posts/docker-networking/</link><pubDate>Wed, 20 Mar 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/docker-networking/</guid><description>&lt;p&gt;When a container cannot reach something, the instinct is often to exec into it and curl. But most slim containers lack curl, dig, tcpdump, or even ping. A better pattern: use nsenter from the host.&lt;/p&gt;
&lt;h2 id="enter-the-container-network-namespace"&gt;Enter the container network namespace&lt;/h2&gt;
&lt;p&gt;Get the container PID:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker inspect -f '{{.State.Pid}}' myapp
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo nsenter -t PID -n bash
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You are now in the container network namespace, but with the host binaries. tcpdump, ip, ss, dig — all work.&lt;/p&gt;</description></item><item><title>nginx Performance Tuning: Practical Notes from Production</title><link>https://besterry.com/posts/nginx-performance-tuning/</link><pubDate>Tue, 05 Mar 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/nginx-performance-tuning/</guid><description>&lt;p&gt;After running nginx on everything from 512 MB VPS instances to multi-socket bare metal, here are the settings I&amp;rsquo;ve found actually matter.&lt;/p&gt;
&lt;h2 id="worker_processes-and-worker_connections"&gt;worker_processes and worker_connections&lt;/h2&gt;
&lt;p&gt;Start with &lt;code&gt;worker_processes auto;&lt;/code&gt;, which runs one worker per CPU core.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;worker_processes auto;
worker_rlimit_nofile 65535;
events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}
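# Rough connection ceiling: worker_processes x worker_connections,
# e.g. 4 workers x 4096 = 16384 connections (4 cores is only an example).
# A reverse proxy holds two connections per client (client + upstream),
# so plan for about half that many concurrent clients.
# Each connection is a file descriptor, which is why
# worker_rlimit_nofile is set well above worker_connections.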
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="keepalive-tuning"&gt;Keepalive tuning&lt;/h2&gt;
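&lt;pre tabindex="0"&gt;&lt;code&gt;http {
    keepalive_timeout 30s;        # idle timeout for client connections
    keepalive_requests 1000;
    upstream backend {
        server 10.0.0.1:8080;
        keepalive 32;             # idle upstream connections cached per worker
    }
    server {
        location / {
            proxy_pass http://backend;
            # upstream keepalive only works over HTTP/1.1 with the Connection header cleared
            proxy_http_version 1.1;
            proxy_set_header Connection &amp;quot;&amp;quot;;
        }
    }
}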
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="buffer-sizes"&gt;Buffer sizes&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;client_body_buffer_size 128k;
client_max_body_size 50m;
proxy_buffer_size 8k;
proxy_buffers 8 8k;
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="gzip-and-brotli"&gt;gzip and brotli&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;gzip on;
gzip_comp_level 5;
gzip_types text/plain text/css application/json;
# brotli directives need the third-party ngx_brotli module
brotli on;
brotli_comp_level 4;
brotli_types text/plain text/css application/json;
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="measurement"&gt;Measurement&lt;/h2&gt;
&lt;p&gt;None of this matters if you don&amp;rsquo;t measure. Install &lt;code&gt;nginx-module-vts&lt;/code&gt; or expose &lt;code&gt;stub_status&lt;/code&gt;, feed metrics to Prometheus, and compare before/after for any changes.&lt;/p&gt;</description></item><item><title>systemd Timers vs Cron: When to Use Which</title><link>https://besterry.com/posts/systemd-timer-vs-cron/</link><pubDate>Sat, 17 Feb 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/systemd-timer-vs-cron/</guid><description>&lt;p&gt;Cron has been the standard scheduler on Unix for decades. systemd timers are newer, more powerful, but also more verbose.&lt;/p&gt;
&lt;h2 id="cron-wins-when"&gt;Cron wins when&lt;/h2&gt;
&lt;p&gt;Cron is perfect for one-line scripts that need to run on a simple schedule. Writing:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;0 3 * * * /usr/local/bin/backup.sh
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;is fast, requires no other files, and works on every Unix-like system since the 1970s.&lt;/p&gt;
&lt;h2 id="systemd-timers-win-when"&gt;systemd timers win when&lt;/h2&gt;
&lt;p&gt;You want any of these:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Logging integrated with journalctl&lt;/li&gt;
&lt;li&gt;Dependencies on other units (&lt;code&gt;After=network-online.target&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Resource limits (&lt;code&gt;MemoryMax=&lt;/code&gt;, &lt;code&gt;CPUQuota=&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Randomized delays to avoid thundering herd (&lt;code&gt;RandomizedDelaySec=&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;The ability to manually trigger with &lt;code&gt;systemctl start&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Catch-up behavior after system was off (&lt;code&gt;Persistent=true&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
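&lt;p&gt;Several of these are easy to see from the command line. A sketch, assuming units named &lt;code&gt;backup.service&lt;/code&gt; and &lt;code&gt;backup.timer&lt;/code&gt; as in the example below:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# Check what an OnCalendar= expression means before using it
# (this one matches the cron line 0 3 * * * above)
systemd-analyze calendar '*-*-* 03:00:00'

# Run the job right now, outside its schedule
sudo systemctl start backup.service

# Read its output and exit status in the journal
journalctl -u backup.service --since today

# See when every timer last fired and when it fires next
systemctl list-timers
&lt;/code&gt;&lt;/pre&gt;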
&lt;h2 id="minimal-systemd-timer-example"&gt;Minimal systemd timer example&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;/etc/systemd/system/backup.service&lt;/code&gt;:&lt;/p&gt;</description></item><item><title>Linux Networking Deep Dive: From Socket to Wire</title><link>https://besterry.com/posts/linux-networking-deep-dive/</link><pubDate>Sat, 10 Feb 2024 00:00:00 +0000</pubDate><guid>https://besterry.com/posts/linux-networking-deep-dive/</guid><description>&lt;p&gt;Every time a packet leaves your Linux machine, it travels through a surprisingly long sequence of stages. Understanding this path helps enormously when debugging network issues.&lt;/p&gt;
&lt;h2 id="the-socket-layer"&gt;The socket layer&lt;/h2&gt;
&lt;p&gt;When your application calls &lt;code&gt;send()&lt;/code&gt; or &lt;code&gt;write()&lt;/code&gt; on a socket, the kernel&amp;rsquo;s socket layer takes over. For a TCP socket this means handing the data to &lt;code&gt;tcp_sendmsg()&lt;/code&gt;, which in turn enqueues it into the socket&amp;rsquo;s send buffer.&lt;/p&gt;
&lt;p&gt;You can observe the send queue depth with &lt;code&gt;ss -tipm&lt;/code&gt;:&lt;/p&gt;</description></item></channel></rss>