NVMe vs PVSCSI: Real-World Performance Testing for SQL Server Workloads on VMware

If you’re like me, you’ve probably been following Microsoft’s announcement about native NVMe support in Windows Server 2025 with great interest. While it’s limited to local drives, how about we break that rule and leverage our virtualization layer to extend NVMe benefits throughout the entire storage stack, even to remote storage like a FlashArray? I decided to test that scenario, and the results are awesome. In this post, you will learn how to make your SQL Server workload about 25% faster without changing any code in your application. Let’s go.

The Experiment: Can We Get End-to-End NVMe Benefits?

Here’s what I wanted to prove: could we leverage NVMe from the guest OS through VMware to a Pure Storage FlashArray using an NVMe/TCP attached datastore? To isolate the impact of what Windows Server 2025 is doing with NVMe versus SCSI, I set up this test:

  1. NVMe Path: Provisioned a virtual disk from an NVMe/TCP attached datastore and attached it to the VM’s NVMe controller.
  2. SCSI Path: Provisioned a virtual disk from the same NVMe datastore but attached it to the VM’s PVSCSI controller.

Same underlying storage plumbing, same FlashArray, same NVMe/TCP transport; the only difference is the virtual disk controller presented to Windows. This post shows the impact of the Windows NVMe driver versus the PVSCSI adapter.
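
If you want to confirm from inside the guest which controller each volume actually sits behind, a quick PowerShell check like the one below works (run it in the Windows Server 2025 VM; the drive letters are whatever your own layout uses).

```powershell
# List each drive letter with the bus type of its underlying disk.
# Disks behind the VM's NVMe controller report BusType 'NVMe';
# PVSCSI-attached disks report a SCSI/SAS bus type.
Get-Partition |
    Where-Object DriveLetter |
    ForEach-Object {
        $disk = Get-Disk -Number $_.DiskNumber
        [pscustomobject]@{
            Drive   = $_.DriveLetter
            Disk    = $disk.Number
            BusType = $disk.BusType
            Model   = $disk.FriendlyName
        }
    } | Format-Table -AutoSize
```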

TPC-C Results: The Database Workload Test

So, since DATABASES!!! will benefit significantly from the type of improvements NVMe brings to the party, I headed into the lab and used my HammerDB test harness to drive some workloads. I created two TPC-C databases: TPCC500NVME and TPCC500SCSI. The first database has four database files on the D:\ drive and its transaction log on the L:\ drive; each of these is a virtual disk attached to the VM’s NVMe controller. The second database, TPCC500SCSI, has four database files on the P:\ drive and its transaction log on the Q:\ drive; both of these are virtual disks attached to a single PVSCSI controller.
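
If you want to double-check that file layout, here’s a minimal sketch that queries sys.master_files for both databases. It assumes the SqlServer PowerShell module and a default local instance; adjust -ServerInstance for your environment.

```powershell
# Show where each data and log file for the two TPC-C databases lives.
$query = @"
SELECT DB_NAME(database_id) AS database_name,
       type_desc,
       physical_name
FROM   sys.master_files
WHERE  DB_NAME(database_id) IN (N'TPCC500NVME', N'TPCC500SCSI')
ORDER  BY database_name, type_desc, physical_name;
"@
Invoke-Sqlcmd -ServerInstance 'localhost' -Query $query | Format-Table -AutoSize
```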

In the results below, you can see that the database attached to the NVMe controller is, on average, 25% faster than the database attached to the PVSCSI controller. That’s a significant performance increase. Imagine going back to your business and saying that your applications can run 25% faster by upgrading your operating system and plumbing in an NVMe connection between your VMware hosts and your FlashArray. You’d likely get support for that project. Here are some detailed metrics to review. These are the averages of three runs.

Overall TPC-C Performance Comparison

| Metric | NVMe Controller | PVSCSI Controller | Improvement |
| --- | --- | --- | --- |
| Average NOPM | 98,230 | 78,270 | +25.5% |
| Average TPM | 228,348 | 182,191 | +25.3% |
| Peak NOPM | 100,890 | 79,828 | +26.4% |

Transaction Response Times

| Transaction | NVMe Avg (ms) | NVMe p99 (ms) | PVSCSI Avg (ms) | PVSCSI p99 (ms) | Improvement |
| --- | --- | --- | --- | --- | --- |
| NEWORD | 12.1 | 32.7 | 15.1 | 41.4 | 20% faster |
| PAYMENT | 2.8 | 8.7 | 3.3 | 11.5 | 15% faster |
| DELIVERY | 17.0 | 50.0 | 21.8 | 61.3 | 22% faster |
| SLEV | 11.1 | 54.6 | 19.2 | 70.2 | 42% faster |
| OSTAT | 10.5 | 49.7 | 10.9 | 58.8 | 4% faster |

DiskSpd: Synthetic I/O Testing

Next, let’s drive some concurrency into this environment. Handling concurrent I/O threads is where NVMe truly excels. That’s particularly beneficial for high-performance transactional workloads involving both reads, which we highlight below, and database writes, including transaction log writes, an advantage further amplified by SQL Server’s parallel log writer.

When I ran DiskSpd with 8KB random reads (matching SQL Server’s data page size), the results show interesting behavior. Single-threaded NVMe and SCSI performance on the D: and P: drives is close, because a single thread doesn’t leverage NVMe’s parallelism. NVMe delivers lower latency than SCSI, but in this test the SCSI disk’s single-threaded throughput is actually higher. I run in a shared lab environment, and I suspect a noisy-neighbor issue here.

At eight threads, NVMe starts to shine: NVMe D: reaches 227,383 IOPS while PVSCSI P: reaches 164,372 IOPS (~38% higher for NVMe). Now, since we’re experienced load testers, we don’t want to stop there. We want to keep piling on I/O until IOPS hit diminishing returns and throughput and latency become unacceptable.
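
Here’s a rough sketch of that sweep. The thread counts match the runs below; the duration, queue depth, file size, and paths are placeholders, not the exact values used in my lab.

```powershell
# 8KB random-read sweep with DiskSpd against a test file on each controller path.
$diskspd = 'C:\Tools\diskspd.exe'                 # assumed install location
$targets = @{ NVMe = 'D:\diskspd\test.dat'; PVSCSI = 'P:\diskspd\test.dat' }

foreach ($threads in 1, 8, 16) {
    foreach ($name in $targets.Keys) {
        $file = $targets[$name]
        # -b8K : 8KB blocks   -r : random I/O     -w0 : 100% reads
        # -Sh  : no caching   -L : latency stats  -o8 : 8 outstanding I/Os per thread
        & $diskspd -b8K -r -w0 "-t$threads" -o8 -d60 -Sh -L -c50G $file |
            Out-File ".\diskspd_${name}_t${threads}.txt"
    }
}
```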

Driving further to 16 threads yields diminishing returns. Throughput drops for both controllers, and average latency increases to 2.420 ms for the NVMe-attached disk and 3.271 ms for the SCSI-attached disk. We’ve reached a bottleneck somewhere in the stack; in my lab it’s on the network side.
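
One way to sanity-check that theory is to watch the host’s NIC utilization while a run is in flight. A PowerCLI sketch along these lines works (the vCenter and host names are hypothetical); net.usage.average is reported in KBps across the host’s uplinks.

```powershell
# Sample realtime network throughput for the ESXi host during a test run.
Connect-VIServer -Server 'vcenter.lab.local'
Get-Stat -Entity (Get-VMHost -Name 'esxi01.lab.local') `
         -Stat 'net.usage.average' -Realtime -MaxSamples 30 |
    Select-Object Timestamp, MetricId, Value, Unit |
    Format-Table -AutoSize
```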

Let’s talk about guest CPU utilization during these tests. NVMe uses CPU more efficiently at scale, delivering more IOPS per CPU percentage point: at eight threads that works out to roughly 10,900 IOPS per point of CPU for NVMe (227,383 / 20.91) versus about 9,100 for PVSCSI (164,372 / 18.08). Yes, NVMe consumes more CPU overall, but you’re doing more work in parallel and in less wall-clock time.

DiskSpd Performance & Scaling (8KB Random Reads)

| Controller | Threads | IOPS | Throughput (MiB/s) | Avg Latency (ms) | CPU Usage (%) |
| --- | --- | --- | --- | --- | --- |
| NVMe D: | 1 | 126,007 | 984.43 | 0.102 | 8.96 |
| NVMe D: | 8 | 227,383 | 1,776.43 | 1.119 | 20.91 |
| NVMe D: | 16 | 210,537 | 1,644.82 | 2.420 | 24.04 |
| PVSCSI P: | 1 | 148,158 | 1,157.48 | 0.180 | 11.15 |
| PVSCSI P: | 8 | 164,372 | 1,284.16 | 1.555 | 18.08 |
| PVSCSI P: | 16 | 149,343 | 1,166.74 | 3.271 | 20.31 |

To summarize this:

  • SCSI shows 18% higher IOPS than NVMe for single-threaded operations (148K vs 126K), but NVMe maintains better latency (0.102 ms vs 0.180 ms).
  • Once we move to 8 threads, NVMe pulls ahead with 38% better performance and continues to scale while SCSI plateaus.
  • Both controllers show degradation at 16 threads; in my lab this is due to network limitations. We can certainly make this go faster (and I plan on doing just that).

The Performance Impact Summary

| Metric | NVMe Advantage | Impact for SQL Server |
| --- | --- | --- |
| TPC-C Throughput | +25% | More transactions/second |
| Random Read IOPS (8 threads) | +38% | Faster table scans, index seeks |
| Peak Throughput | +38% (1,776 MiB/s vs 1,284 MiB/s) | Faster table scans, faster backup/restore operations |
| Multi-threaded Latency | -28% (1.119 ms vs 1.555 ms) | Improved user response times and log flushes |

These results are from my test lab. Your mileage will undoubtedly vary. The configuration here isn’t intended to highlight a best practice or hero numbers, but to measure the impact of this change. NVMe brings parallelism all the way from the operating system through to your FlashArray, and you should see a performance increase in your workload. Now get into your lab and start testing.

Implementation Requirements

Here’s a listing of the tech needed to reproduce this environment.

| Component | Requirement | Notes |
| --- | --- | --- |
| Guest OS | Windows Server 2025 | For native NVMe driver support |
| Hypervisor | VMware vSphere 8.0 Update 1+ | NVMe controller support |
| Datastore | NVMe/TCP attached | Used to provision the virtual disks |
| Storage Array | NVMe/TCP capable | e.g., FlashArray |
| Network | 25GbE or higher | For NVMe/TCP traffic |
| VM Configuration | NVMe controller | Instead of PVSCSI/LSI |
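
To confirm a VM is actually configured the way the last row describes, you can list its virtual controllers from PowerCLI. This is a sketch with hypothetical vCenter and VM names; NVMe controllers show up as VirtualNVMEController in the vSphere API device list, and PVSCSI as ParaVirtualSCSIController.

```powershell
# List every virtual controller attached to the VM, with its type and bus number.
Connect-VIServer -Server 'vcenter.lab.local'
(Get-VM -Name 'sql2025-01').ExtensionData.Config.Hardware.Device |
    Where-Object { $_ -is [VMware.Vim.VirtualController] } |
    Select-Object @{ n = 'Type';  e = { $_.GetType().Name } },
                  @{ n = 'Label'; e = { $_.DeviceInfo.Label } },
                  BusNumber |
    Format-Table -AutoSize
```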

The Bottom Line

The key insight is that NVMe’s real value isn’t just raw speed; it’s the ability to scale with modern multi-threaded applications. Most modern SQL Server workloads are inherently parallel, making NVMe the better choice for most deployments.

Thanks to Matt Web, Alex Carver, Joe Houghes, Kenyon Hensler and David Stamen for helping me with the infrastructure side of this. Cheers fellas!