Walking Through a Planned Failover: SQL Server Availability Groups on Kubernetes
When building the sql-on-k8s-operator, I wanted to make sure it could handle both planned and unplanned failovers. The easy case is a planned failover, where you deliberately move the primary role to another replica. The harder case is an unplanned failover, where the primary pod just disappears. The operator needs to handle both.
I recently ran a full planned failover rotation on a three-replica SQL Server Availability Group managed by sql-on-k8s-operator, and I want to show you exactly what happens inside SQL Server and the operator during each hop. If you’ve been following my Introducing the SQL Server on Kubernetes Operator post, this is the logical next step: what does the error log actually look like during a planned failover, what does the operator do in response, and how long does the whole thing take?