Cluster Outage from k3d Node Restart

I ran docker restart k3d-homelab-server-0 and my SSH session froze. Then it disconnected. Then I realized the SSH tunnel runs inside the cluster I just restarted. That was the beginning of a 75-minute full outage that taught me more about my own infrastructure than the previous six months of it working fine.

Date: 2026-03-08
Duration: ~75 minutes
Severity: Full outage (all services down, no remote access)
Trigger: docker restart k3d-homelab-server-0 to pick up containerd registry config

Each part covers a different failure mode and the debugging methodology behind it. The specific technologies will change; the patterns won’t. ...

March 8, 2026 · 26 min · Amaury Yacksmith

Transmission Down 11+ Hours — PodSecurity Time Bomb

Eleven hours after the cluster rebuild, I tried to use Transmission and it wasn’t there. Not erroring. Not crash-looping. Just… not there. The pod had never been created, and I hadn’t noticed because nothing screamed at me about it. Two chained time bombs, both invisible until the cluster rebuild forced pod recreation. The first one was a PodSecurity policy that silently blocked Gluetun’s NET_ADMIN capability. The second — stale IPv6 IP rules from the previous pod’s unclean shutdown — was hiding behind it. ...

March 8, 2026 · 14 min · Amaury Yacksmith