Walking Through an Unplanned Failover: SQL Server Availability Groups on Kubernetes
In my planned failover walkthrough, I showed what happens when you deliberately move the primary role to another replica. That’s the easy case. Now I want to show what happens when the primary pod just disappears unexpectedly, like during a node failure or a container crash. No graceful shutdown, no demotion, just gone.
I ran two test scenarios, each cycling the primary role across all three pods by force-deleting the current primary three times in a row. First with an idle 5GB TPC-C database, then with that same 5GB database under sustained HammerDB TPC-C load. Six force-deletes total, six successful automatic failovers. I’ll walk through the error log from the promoted replica, the operator’s detection and recovery behavior, and the full timing data.
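For reference, the force-delete that simulates the failure is a single kubectl command. Here is a minimal sketch; the pod name `mssql-0` is a placeholder, since your StatefulSet naming and which pod currently holds the primary role will differ. The sketch only prints the command rather than executing it:

```shell
# Placeholder: substitute the pod that currently holds the primary role.
PRIMARY_POD="mssql-0"

# --grace-period=0 --force tells Kubernetes to delete the pod record
# immediately instead of waiting out the graceful termination window,
# so SQL Server gets no chance to demote or shut down cleanly,
# approximating a sudden node or container failure.
CMD="kubectl delete pod ${PRIMARY_POD} --grace-period=0 --force"

# Print rather than execute, since this is a sketch against a live cluster.
echo "${CMD}"
```

Repeating this against whichever pod is primary, and waiting for the operator to promote a replica in between, cycles the primary role across the replicas as described above.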