SQL Server

I’m honored to announce that I’ve been renewed as a Microsoft MVP for the tenth consecutive year, recognized in the Azure SQL and SQL Server technical areas under Data Platform. Ten years. I honestly didn’t see that coming when I set a five-year goal back in 2016.

Ten Years

I want to stop and acknowledge that milestone for a moment. When I first earned this award in 2017, I was deep in Availability Groups and SQL Server internals. Since then, the journey has taken me through SQL Server on Linux, containers, Kubernetes, storage integrations, and now AI-integrated SQL Server 2025. The platform has transformed dramatically, and I’ve had a front-row seat for all of it.

I’ve been thinking a lot lately about what it actually takes to make an AI agent genuinely useful for database work, both for administration and for application access to the data tier. Writing the T-SQL code is the easy part. A coding assistant can do that out of the box. The hard part is giving it visibility into a running SQL Server: which sessions are blocked right now, where the wait stats are pointing, which indexes the optimizer is begging for. Without that, the agent is just guessing. For application access, an agent can propose how things should work, but what if we had tools that gave the agent context about the database’s schema and what its entities actually mean to the application?

If you’ve been following my T-SQL Snapshot Backup series, most of what I’ve covered requires SQL Server to participate in the snapshot: the write IO freeze, the metadata backup, the coordinated workflow. This post covers the other side of that coin: crash-consistent cloning. No write freeze. No backup. No point-in-time recovery. Just a raw volume clone that SQL Server recovers from automatically when you attach it.

What Is a Crash-Consistent Snapshot?

When people talk about snapshot backups for SQL Server, they often jump straight to application-consistent snapshots, the kind where SQL Server is asked to freeze write IO before the snapshot. That freeze guarantees every page on disk reflects a logically consistent database state.

If you’ve been following my T-SQL Snapshot Backup series, you’ve seen this technique work on bare-metal and standard VM deployments where database files live on volumes directly presented to the SQL Server OS. In this post, I’m bringing T-SQL Snapshot Backup into a Hyper-V cluster environment, with database files on VHDXs backed by a Pure Storage FlashArray Cluster Shared Volume (CSV). Hyper-V adds a few extra layers to manage at the hypervisor level, but the SQL Server side of the story is identical. Let’s walk through it.

I’ve been doing storage load tests for SQL Server for a long time, both as a consultant and now in my work at Everpure, and I see the same patterns over and over. Someone spins up a VM with two vCPUs, points it at a storage subsystem (cloud or on-prem), runs a thousand threads at it, and then concludes that the storage stinks. Or the opposite, where they buy a 64 gigabit HBA, plug it into the wrong PCIe slot, and wonder why they’re leaving half of the capacity on the table.

In my planned failover walkthrough, I showed what happens when you deliberately move the primary role to another replica. That’s the easy case. Now I want to show what happens when the primary pod just disappears unexpectedly, like during a node failure or a container crash. No graceful shutdown, no demotion, just gone.

I ran two test scenarios, each cycling the primary role across all three pods by force-deleting the current primary three times in a row. First, an idle 5GB TPC-C database. Then, that same 5GB database under sustained HammerDB TPC-C load. Six force-deletes total, six successful automatic failovers. I’ll walk through the error log from the promoted replica, the operator’s detection and recovery behavior, and the full timing data.

When building the sql-on-k8s-operator, I wanted to make sure it could handle both planned and unplanned failovers. The easy case is a planned failover, where you deliberately move the primary role to another replica. The harder case is an unplanned failover, where the primary pod just disappears. The operator needs to handle both.

I recently ran a full planned failover rotation on a three-replica SQL Server Availability Group managed by sql-on-k8s-operator, and I want to show you exactly what happens inside SQL Server and the operator during each hop. If you’ve been following my Introducing the SQL Server on Kubernetes Operator post, this is the logical next step: what does the error log actually look like during a planned failover, what does the operator do in response, and how long does the whole thing take?

I’ve been doing a deep dive into SQL Server on-disk structures lately, and one of my favorite rabbit holes is revisiting Paul Randal’s series on file header pages. If you haven’t read it, go do that now. It covers what file header pages are, what they contain, and what happens when they corrupt. This post takes that concept and runs with it. I’ll use DBCC FILEHEADER to read the file header of every user database file on a server and answer a question that comes up more than you’d think: can you determine which files belong together as a database purely from the file header, without querying sys.databases?

Are you considering replatforming your SQL Server workload due to recent vendor changes, but still need high availability and disaster recovery? You’re not alone. One of the challenges with running SQL Server on Kubernetes is that there’s no Kubernetes operator available. That means no automated lifecycle management, no automatic failover, and no standard way to bootstrap an Always On Availability Group on Kubernetes.

I’m excited to share it today as an open-source project: sql-on-k8s-operator. Let’s go.

If you’re like me, you’ve probably been following Microsoft’s announcement about native NVMe support in Windows Server 2025 with great interest. While it’s limited to local drives, how about we break that rule and leverage our virtualization layer to extend NVMe benefits throughout the entire storage stack, even to remote storage like a FlashArray? I decided to test that scenario, and the results are awesome. In this post, you will learn how to make your SQL Server workload about 25% faster without changing any code in your application. Let’s go.

Microsoft MVP 2026: Ten Years on the Data Platform

Giving AI Agents Visibility Into SQL Server with MCP

Crash-Consistent Snapshot Cloning - Hyper-V Edition

Using T-SQL Snapshot Backup - Hyper-V Edition

Designing a Storage Load Test for SQL Server

Walking Through an Unplanned Failover: SQL Server Availability Groups on Kubernetes

Walking Through a Planned Failover: SQL Server Always On Availability Groups on Kubernetes

Reading SQL Server File Headers with DBCC FILEHEADER

Introducing the SQL Server on Kubernetes Operator

NVMe vs PVSCSI: Real-World Performance Testing for SQL Server Workloads on VMware