Performance

I’ve been doing storage load tests for SQL Server for a long time, both as a consultant and now in my work at Everpure, and I see the same patterns over and over. Someone spins up a VM with two vCPUs, points it at a storage subsystem (cloud or on-prem), runs a thousand threads at it, and then concludes that the storage stinks. Or the opposite, where they buy a 64 gigabit HBA, plug it into the wrong PCIe slot, and wonder why they’re leaving half of the capacity on the table.

SQL Server 2025 introduces native support for vector data types and external AI models. This opens up new scenarios for semantic search and AI-driven experiences directly in the database. But as with any external service integration, performance and scalability are immediate concerns, especially when generating embeddings at scale.

https://github.com/nocentino/ollama-lb-sql

Problem: Bottlenecks in Embedding Generation

When you call out to an external embedding service from T-SQL via REST over HTTPS, you’re limited by the throughput of that backend. If you’re running a single Ollama instance, you’ll quickly hit a ceiling on how fast you can generate embeddings, especially for large datasets. I recently attended an event and discussed this topic. My first attempt at generating embeddings was for a three-million-row table. I had access to some world-class hardware to generate the embeddings. When I arrived at the lab and initiated the embedding generation process for this dataset, I quickly realized it would take approximately 9 days to complete. Upon closer examination, I found that I was not utilizing the GPUs to their full potential; in fact, I was only using about 15% of one GPU’s capacity. So I started to cook up this concept in my head, and here we are, load balancing embedding generation across multiple instances of ollama to more fully utilize the resources.

I’m excited to announce the release of a new open-source project that fully automates HammerDB benchmarking for SQL Server using Docker. If you’ve ever needed to run TPC-C or TPC-H benchmarks multiple times, you know how time-consuming the manual setup can be. This project removes the hassle and gets you up and running a single command: ./loadtest.sh.

Why I Built This

In my work, I frequently benchmark SQL Server configurations, whether I’m comparing versions, testing new hardware, or validating performance tuning changes. Setting up HammerDB manually each time became a significant time bottleneck (see what I did there! ;). I needed an automated solution that would work consistently across different environments and reduce the time required to get test results.

I’m excited to share a new open-source project I’ve been working on that combines two of my favorite areas: SQL Server and Pure Storage FlashArray performance monitoring. If you’ve been following my blog, you know I am passionate about creating tools that bridge the gap between database platforms and storage infrastructure.

SQL Server performance troubleshooting has always been a unique challenge, especially when it comes to understanding the complete I/O path from the database to storage. Traditionally, database administrators (DBAs) and storage administrators have used separate monitoring tools, which makes it difficult to correlate performance issues across the entire stack. Literally two different perspectives of their performance worlds.

This blog post shows you how NTFS stores data, what the NTFS Allocation Unit means, and how SQL Server performs IOs of variable size.

How NTFS Stores Data on Disk

A Master File Table (MFT) is the data structure that describes files and directories on NTFS. In Figure 1, you can see an MTF record has several sections describing the metadata about the file and pointers to blocks that make up the file. A block, also referred to as a cluster in Windows, is an abstraction over one or more physical structures (sectors or pages depending on the media) presented by the underlying disk. A block/cluster is also the atomic allocation unit from a file system and has a configurable size. On NTFS, this is referred to as the NTFS Allocation Unit Size and is a configurable attribute of the file system. By default, it is 4KB and can be as large as 2MB. Since a block is a unit of allocation, if a file is between 1 byte and the file system’s allocation unit size, it will take up exactly one block/cluster on the file system. As the file grows, more blocks/clusters are allocated to represent the file. The MFT data structure tracks which blocks make up a file. The block allocator of the file system will try to ensure blocks are physically adjacent on the disk and groups them together in runs.

This post is a reference post for retrieving IO statistics for data and log files in SQL Server. We’ll look at where we can find IO statistics in SQL Server, query it to produce meaningful metrics, and discuss some key points when interpreting this data.

The Source DMF

The primary source for file latency data is the dynamic management function sys.dm_io_virtual_file_stats. The data in this DMF is per file. The query below joins with sys.master_files.

Limits in Containers

Docker gives you the ability to control a container’s access to CPU, Memory, and network and disk IO using resource constraints, sometimes called Limits. You define limits as parameters when creating containers. In its default configuration, a container will have no resource constraints for accessing resources of the host operating system. This post will look at how to configure resource constraints in Docker and look at how SQL Server sees the resources when CPU and Memory resource constraints are in place.

People often ask me what’s the number one thing to look out for when running SQL Server on Kubernetes…the answer is memory settings. In this post, we’re going to dig into why you need to configure resource limits in your SQL Server’s Pod Spec when running SQL Server workloads in Kubernetes. I’m running these demos in Azure Kubernetes Service (AKS), but these concepts apply to any SQL Server environment running in Kubernetes.

When working with SQL Server in containers and Kubernetes storage is a key concept. In this post, we’re going to walk through how to deploy SQL Server in Kubernetes with Persistent Volumes for the system and user databases.

One of the key principals of Kubernetes is the ephemerality of Pods. No Pod is every redeployed, a completely new Pod is created. If a Pod dies, for whatever reason, a new Pod is created in its place there is no continuity in the state of that Pod. The newly created Pod will go back to the initial state of the container image defined in the Pod’s spec. This is very valuable for stateless workloads, not so much for stateful workloads like SQL Server.

In this blog post we’re going to revisit how SQL Server on Linux responds to external memory pressure. This is a very long post, and it ends with me not knowing exactly what’s going on…but the journey is pretty fun…let’s go!

On Windows-based SQL Server systems we’ve become accustomed to the OS signaling to SQL Server that there’s a memory shortage. When signaled, SQL Server will kindly start shrinking it’s memory caches, including the buffer pool, to maintain overall system stability and usability. Well that story is a little different in SQL Server on Linux…last year I wrote a similar post for SQL Server 2017 RTM and how it reacted to external memory pressure, check that out here! That was quite a dramatic story, you can literally cause SQL Serve to swap nearly its entire process address space out to disk! Now, let’s look and see how SQL Server on Linux responds to external memory pressure in SQL Server 2019 CTP2.

Performance

Designing a Storage Load Test for SQL Server

Scaling SQL Server 2025 Vector Search with Load-Balanced Ollama Embeddings

Problem: Bottlenecks in Embedding Generation

Automated SQL Server Benchmarking with HammerDB and Docker: A Complete Testing Framework

Why I Built This

Monitoring SQL Server Performance with the Pure Storage FlashArray OpenMetrics Exporter

Understanding SQL Server IO Size

How NTFS Stores Data on Disk

Measuring SQL Server File Latency

The Source DMF

Container Limits and SQL Server

Limits in Containers

Memory Settings for Running SQL Server in Kubernetes

Data Persistency and Advanced SQL Server Disk Topologies in Kubernetes

SQL Server on Linux – External Memory Pressure with 2019 CTP2