Performance

Designing a Storage Load Test for SQL Server

I’ve been doing storage load tests for SQL Server for a long time, both as a consultant and now in my work at Everpure, and I see the same patterns over and over. Someone spins up a VM with two vCPUs, points it at a storage subsystem (cloud or on-prem), runs a thousand threads at it, and then concludes that the storage stinks. Or the opposite, where they buy a 64 gigabit HBA, plug it into the wrong PCIe slot, and wonder why they’re leaving half of the capacity on the table.

Scaling SQL Server 2025 Vector Search with Load-Balanced Ollama Embeddings

SQL Server 2025 introduces native support for vector data types and external AI models. This opens up new scenarios for semantic search and AI-driven experiences directly in the database. But as with any external service integration, performance and scalability are immediate concerns, especially when generating embeddings at scale.

https://github.com/nocentino/ollama-lb-sql

Problem: Bottlenecks in Embedding Generation

When you call out to an external embedding service from T-SQL via REST over HTTPS, you’re limited by the throughput of that backend. If you’re running a single Ollama instance, you’ll quickly hit a ceiling on how fast you can generate embeddings, especially for large datasets. I recently attended an event and discussed this topic. My first attempt at generating embeddings was for a three-million-row table. I had access to some world-class hardware to generate the embeddings. When I arrived at the lab and initiated the embedding generation process for this dataset, I quickly realized it would take approximately 9 days to complete. Upon closer examination, I found that I was not utilizing the GPUs to their full potential; in fact, I was only using about 15% of one GPU’s capacity. So I started to cook up this concept in my head, and here we are, load balancing embedding generation across multiple instances of ollama to more fully utilize the resources.

Automated SQL Server Benchmarking with HammerDB and Docker: A Complete Testing Framework

I’m excited to announce the release of a new open-source project that fully automates HammerDB benchmarking for SQL Server using Docker. If you’ve ever needed to run TPC-C or TPC-H benchmarks multiple times, you know how time-consuming the manual setup can be. This project removes the hassle and gets you up and running a single command: ./loadtest.sh.

Why I Built This

In my work, I frequently benchmark SQL Server configurations, whether I’m comparing versions, testing new hardware, or validating performance tuning changes. Setting up HammerDB manually each time became a significant time bottleneck (see what I did there! ;). I needed an automated solution that would work consistently across different environments and reduce the time required to get test results.

Monitoring SQL Server Performance with the Pure Storage FlashArray OpenMetrics Exporter

I’m excited to share a new open-source project I’ve been working on that combines two of my favorite areas: SQL Server and Pure Storage FlashArray performance monitoring. If you’ve been following my blog, you know I am passionate about creating tools that bridge the gap between database platforms and storage infrastructure.

SQL Server performance troubleshooting has always been a unique challenge, especially when it comes to understanding the complete I/O path from the database to storage. Traditionally, database administrators (DBAs) and storage administrators have used separate monitoring tools, which makes it difficult to correlate performance issues across the entire stack. Literally two different perspectives of their performance worlds.

Writing a Hello World Go Container Web Application

In this blog post, I will show you how to build a hello world container-based web application in the go programming language. The reason I want to do this is because I need a very small container image to do some testing in Kubernetes. I’ll also highlight some of the pitfalls I ran into to hopefully have you some time in your learnings.

Let’s build and test it locally first

Before you build a container-based application, you need an application. So let’s go ahead and build a simple hello world app in go, but running on our local system as a traditionally compiled program. I want to make sure my application works before I move onto the container build process. You’ll need to install the go programming language development tools. On a Mac, you can do that with brew install go. For other operating systems, check out the install page.

Understanding SQL Server IO Size

This blog post shows you how NTFS stores data, what the NTFS Allocation Unit means, and how SQL Server performs IOs of variable size.

How NTFS Stores Data on Disk

A Master File Table (MFT) is the data structure that describes files and directories on NTFS. In Figure 1, you can see an MTF record has several sections describing the metadata about the file and pointers to blocks that make up the file. A block, also referred to as a cluster in Windows, is an abstraction over one or more physical structures (sectors or pages depending on the media) presented by the underlying disk. A block/cluster is also the atomic allocation unit from a file system and has a configurable size. On NTFS, this is referred to as the NTFS Allocation Unit Size and is a configurable attribute of the file system. By default, it is 4KB and can be as large as 2MB. Since a block is a unit of allocation, if a file is between 1 byte and the file system’s allocation unit size, it will take up exactly one block/cluster on the file system. As the file grows, more blocks/clusters are allocated to represent the file. The MFT data structure tracks which blocks make up a file. The block allocator of the file system will try to ensure blocks are physically adjacent on the disk and groups them together in runs.

Measuring SQL Server File Latency

This post is a reference post for retrieving IO statistics for data and log files in SQL Server. We’ll look at where we can find IO statistics in SQL Server, query it to produce meaningful metrics, and discuss some key points when interpreting this data.

The Source DMF

The primary source for file latency data is the dynamic management function sys.dm_io_virtual_file_stats. The data in this DMF is per file. The query below joins with sys.master_files.

Container Limits and SQL Server

Limits in Containers

Docker gives you the ability to control a container’s access to CPU, Memory, and network and disk IO using resource constraints, sometimes called Limits. You define limits as parameters when creating containers. In its default configuration, a container will have no resource constraints for accessing resources of the host operating system. This post will look at how to configure resource constraints in Docker and look at how SQL Server sees the resources when CPU and Memory resource constraints are in place.

Memory Settings for Running SQL Server in Kubernetes

People often ask me what’s the number one thing to look out for when running SQL Server on Kubernetes…the answer is memory settings. In this post, we’re going to dig into why you need to configure resource limits in your SQL Server’s Pod Spec when running SQL Server workloads in Kubernetes. I’m running these demos in Azure Kubernetes Service (AKS), but these concepts apply to any SQL Server environment running in Kubernetes.

Data Persistency and Advanced SQL Server Disk Topologies in Kubernetes

When working with SQL Server in containers and Kubernetes storage is a key concept. In this post, we’re going to walk through how to deploy SQL Server in Kubernetes with Persistent Volumes for the system and user databases.

One of the key principals of Kubernetes is the ephemerality of Pods. No Pod is every redeployed, a completely new Pod is created. If a Pod dies, for whatever reason, a new Pod is created in its place there is no continuity in the state of that Pod. The newly created Pod will go back to the initial state of the container image defined in the Pod’s spec. This is very valuable for stateless workloads, not so much for stateful workloads like SQL Server.