Performance

Load balancing Ollama instances with Nginx to scale embedding generation for SQL Server 2025 vector search, achieving 6x performance improvement.

I’m excited to announce the release of a new open-source project that fully automates HammerDB benchmarking for SQL Server using Docker. If you’ve ever needed to run TPC-C or TPC-H benchmarks multiple times, you know how time-consuming the manual setup can be. This project removes the hassle and gets you up and running a single command: ./loadtest.sh.

Why I Built This

In my work, I frequently benchmark SQL Server configurations, whether I’m comparing versions, testing new hardware, or validating performance tuning changes. Setting up HammerDB manually each time became a significant time bottleneck (see what I did there! ;). I needed an automated solution that would work consistently across different environments and reduce the time required to get test results.

I’m excited to share a new open-source project I’ve been working on that combines two of my favorite areas: SQL Server and Pure Storage FlashArray performance monitoring. If you’ve been following my blog, you know I am passionate about creating tools that bridge the gap between database platforms and storage infrastructure.

SQL Server performance troubleshooting has always been a unique challenge, especially when it comes to understanding the complete I/O path from the database to storage. Traditionally, database administrators (DBAs) and storage administrators have used separate monitoring tools, which makes it difficult to correlate performance issues across the entire stack. Literally two different perspectives of their performance worlds.

In this blog post, I will show you how to build a hello world container-based web application in the go programming language. The reason I want to do this is because I need a very small container image to do some testing in Kubernetes. I’ll also highlight some of the pitfalls I ran into to hopefully have you some time in your learnings.

Let’s build and test it locally first

Before you build a container-based application, you need an application. So let’s go ahead and build a simple hello world app in go, but running on our local system as a traditionally compiled program. I want to make sure my application works before I move onto the container build process. You’ll need to install the go programming language development tools. On a Mac, you can do that with brew install go. For other operating systems, check out the install page.

This blog post shows you how NTFS stores data, what the NTFS Allocation Unit means, and how SQL Server performs IOs of variable size.

How NTFS Stores Data on Disk

A Master File Table (MFT) is the data structure that describes files and directories on NTFS. In Figure 1, you can see an MTF record has several sections describing the metadata about the file and pointers to blocks that make up the file. A block, also referred to as a cluster in Windows, is an abstraction over one or more physical structures (sectors or pages depending on the media) presented by the underlying disk. A block/cluster is also the atomic allocation unit from a file system and has a configurable size. On NTFS, this is referred to as the NTFS Allocation Unit Size and is a configurable attribute of the file system. By default, it is 4KB and can be as large as 2MB. Since a block is a unit of allocation, if a file is between 1 byte and the file system’s allocation unit size, it will take up exactly one block/cluster on the file system. As the file grows, more blocks/clusters are allocated to represent the file. The MFT data structure tracks which blocks make up a file. The block allocator of the file system will try to ensure blocks are physically adjacent on the disk and groups them together in runs.

This post is a reference post for retrieving IO statistics for data and log files in SQL Server. We’ll look at where we can find IO statistics in SQL Server, query it to produce meaningful metrics, and discuss some key points when interpreting this data.

The Source DMF

The primary source for file latency data is the dynamic management function sys.dm_io_virtual_file_stats. The data in this DMF is per file. The query below joins with sys.master_files.

Limits in Containers

Docker gives you the ability to control a container’s access to CPU, Memory, and network and disk IO using resource constraints, sometimes called Limits. You define limits as parameters when creating containers. In its default configuration, a container will have no resource constraints for accessing resources of the host operating system. This post will look at how to configure resource constraints in Docker and look at how SQL Server sees the resources when CPU and Memory resource constraints are in place.

People often ask me what’s the number one thing to look out for when running SQL Server on Kubernetes…the answer is memory settings. In this post, we’re going to dig into why you need to configure resource limits in your SQL Server’s Pod Spec when running SQL Server workloads in Kubernetes. I’m running these demos in Azure Kubernetes Service (AKS), but these concepts apply to any SQL Server environment running in Kubernetes.

When working with SQL Server in containers and Kubernetes storage is a key concept. In this post, we’re going to walk through how to deploy SQL Server in Kubernetes with Persistent Volumes for the system and user databases.

One of the key principals of Kubernetes is the ephemerality of Pods. No Pod is every redeployed, a completely new Pod is created. If a Pod dies, for whatever reason, a new Pod is created in its place there is no continuity in the state of that Pod. The newly created Pod will go back to the initial state of the container image defined in the Pod’s spec. This is very valuable for stateless workloads, not so much for stateful workloads like SQL Server.

In this blog post we’re going to revisit how SQL Server on Linux responds to external memory pressure. This is a very long post, and it ends with me not knowing exactly what’s going on…but the journey is pretty fun…let’s go!

On Windows-based SQL Server systems we’ve become accustomed to the OS signaling to SQL Server that there’s a memory shortage. When signaled, SQL Server will kindly start shrinking it’s memory caches, including the buffer pool, to maintain overall system stability and usability. Well that story is a little different in SQL Server on Linux…last year I wrote a similar post for SQL Server 2017 RTM and how it reacted to external memory pressure, check that out here! That was quite a dramatic story, you can literally cause SQL Serve to swap nearly its entire process address space out to disk! Now, let’s look and see how SQL Server on Linux responds to external memory pressure in SQL Server 2019 CTP2.

Scaling SQL Server 2025 Vector Search with Load-Balanced Ollama Embeddings

Automated SQL Server Benchmarking with HammerDB and Docker: A Complete Testing Framework

Why I Built This

Monitoring SQL Server Performance with the Pure Storage FlashArray OpenMetrics Exporter

Writing a Hello World Go Container Web Application

Let’s build and test it locally first

Understanding SQL Server IO Size

How NTFS Stores Data on Disk

Measuring SQL Server File Latency

The Source DMF

Container Limits and SQL Server

Limits in Containers

Memory Settings for Running SQL Server in Kubernetes

Data Persistency and Advanced SQL Server Disk Topologies in Kubernetes

SQL Server on Linux – External Memory Pressure with 2019 CTP2