ZFS on Linux: Six Months of Production Use

I migrated our build server array from ext4+mdadm to ZFS on Linux six months ago. Here’s what I learned.

Why ZFS

- Checksumming catches silent data corruption (we found 14 affected files on the old array)
- Snapshots are cheap and instant (100 ms for a 10 TB dataset)
- Compression often makes things faster: less I/O in exchange for more CPU
- Send/receive for efficient replication
- No separate mdadm/LVM layer to debug

Pool design

For the build server, 6 x 4 TB NVMe in RAIDZ2: ...

September 2, 2024 · 2 min · Besterry

Useful bpftrace One-Liners for System Debugging

bpftrace makes the kernel event space accessible from a bash one-liner. Here are the scripts I keep reaching for.

Count syscalls by process

    bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

Distribution of file read sizes

    bpftrace -e 'tracepoint:syscalls:sys_enter_read { @ = hist(args->count); }'

TCP retransmissions by remote address

    bpftrace -e '
    kprobe:tcp_retransmit_skb {
      $sk = (struct sock *)arg0;
      $daddr = $sk->__sk_common.skc_daddr;
      @[ntop($daddr)] = count();
    }'

Process creation stream

    bpftrace -e 'tracepoint:sched:sched_process_exec { printf("%s\n", str(args->filename)); }'

When to use bpftrace vs perf vs strace

- strace: simple, but adds significant overhead. Fine for debugging a single misbehaving process.
- perf: best for sampling-based profiling (CPU time, cache misses). Low overhead.
- bpftrace: best for event-driven tracing across the whole system. Tiny overhead if used sparingly.

All three should be in your toolbox.
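To make the read-size one-liner's output easier to interpret: hist() groups values into power-of-two buckets, so a 4096-byte read and an 8191-byte read land in the same row. The sketch below mimics that bucketing in Python. The function name hist_bucket and the sample sizes are made up for illustration, and the labels are simplified (bpftrace itself abbreviates large bounds, for example with a K suffix).

```python
from collections import Counter


def hist_bucket(value: int) -> str:
    """Return a simplified label for the power-of-two bucket holding value."""
    if value < 0:
        return "(..., 0)"
    if value == 0:
        return "[0]"
    if value == 1:
        return "[1]"
    # Lower bound is the largest power of two that is <= value.
    low = 1 << (value.bit_length() - 1)
    return f"[{low}, {low * 2})"


# Hypothetical read sizes, standing in for args->count samples.
sample_sizes = [0, 1, 512, 4096, 4097, 8191, 65536]
counts = Counter(hist_bucket(n) for n in sample_sizes)

for label, count in counts.items():
    print(f"{label:<20} {count}")
```

Reading a real histogram works the same way: a tall row tells you the order of magnitude of the read size, not the exact value.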

May 2, 2024 · 1 min · Besterry

systemd Timers vs Cron: When to Use Which

Cron has been the standard scheduler on Unix for decades. systemd timers are newer and more powerful, but also more verbose.

Cron wins when

Cron is perfect for one-line scripts that need to run on a simple schedule. Writing:

    0 3 * * * /usr/local/bin/backup.sh

is fast, requires no other files, and works on every Unix-like system since the 1970s.

systemd timers win when

You want any of these:

- Logging integrated with journalctl
- Dependencies on other units (After=network-online.target)
- Resource limits (MemoryMax=, CPUQuota=)
- Randomized delays to avoid thundering herd (RandomizedDelaySec=)
- The ability to manually trigger with systemctl start
- Catch-up behavior after system was off (Persistent=true)

Minimal systemd timer example

/etc/systemd/system/backup.service: ...
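When moving a job from cron to a timer, the schedule has to be rewritten as an OnCalendar= expression. As a rough sketch, the helper below translates the simplest cron schedules (plain numbers or * in each of the five fields) into that syntax. The function name cron_to_oncalendar is hypothetical, and it deliberately rejects steps, ranges, and lists rather than guessing at them.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def _field(value: str, width: int) -> str:
    """Pass '*' through; zero-pad plain numbers; reject anything fancier."""
    if value == "*":
        return "*"
    if not value.isdigit():
        raise ValueError(f"unsupported cron field: {value!r}")
    return value.zfill(width)


def cron_to_oncalendar(schedule: str) -> str:
    """Convert 'min hour dom month dow' into a systemd OnCalendar= value."""
    minute, hour, dom, month, dow = schedule.split()
    if dom != "*" and dow != "*":
        # cron ORs these two fields, OnCalendar= ANDs them, so no direct mapping.
        raise ValueError("day-of-month and day-of-week cannot both be set")

    date = f"*-{_field(month, 2)}-{_field(dom, 2)}"
    time = f"{_field(hour, 2)}:{_field(minute, 2)}:00"
    expr = f"{date} {time}"

    if dow != "*":
        if not dow.isdigit() or int(dow) > 7:
            raise ValueError(f"unsupported day-of-week: {dow!r}")
        # cron accepts both 0 and 7 for Sunday.
        expr = f"{DAYS[int(dow) % 7]} {expr}"
    return expr


# The schedule part of the crontab line quoted above.
print(cron_to_oncalendar("0 3 * * *"))  # *-*-* 03:00:00
```

It is worth checking any result with systemd-analyze calendar before putting it in a unit file, since that prints the next time the expression would fire.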

February 17, 2024 · 1 min · Besterry

Linux Networking Deep Dive: From Socket to Wire

Every time a packet leaves your Linux machine, it travels through a surprisingly long sequence of stages. Understanding this path helps enormously when debugging network issues.

The socket layer

When your application calls send() or write() on a socket, the kernel’s socket layer takes over. For a TCP socket this means handing the data to tcp_sendmsg(), which in turn enqueues it into the socket’s send buffer. You can observe the send queue depth with ss -tipm: ...
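The send queue can also be inspected from inside the process that owns the socket. The sketch below is a Linux-only illustration, not part of the original walkthrough: it assumes the TIOCOUTQ ioctl (which shares its value with SIOCOUTQ on Linux) reports the bytes still sitting in a TCP socket's send queue. The helper names send_queue_bytes and demo are hypothetical.

```python
import fcntl
import socket
import struct
import termios


def send_queue_bytes(sock: socket.socket) -> int:
    """Ask the kernel how many bytes remain in this socket's send queue."""
    raw = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]


def demo() -> int:
    """Send some data over loopback and sample the sender's queue depth."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)

    client = socket.create_connection(server.getsockname())
    peer, _ = server.accept()
    try:
        client.sendall(b"x" * 4096)
        return send_queue_bytes(client)
    finally:
        peer.close()
        client.close()
        server.close()


if __name__ == "__main__":
    print("bytes in send queue:", demo())
```

On loopback the value tends to be zero or small because acknowledgements come back almost immediately. A queue that stays deep on a real connection usually points at a slow receiver or a congested path.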

February 10, 2024 · 2 min · Besterry