Setting up SQL Server S3 Object Storage Integration using MinIO with Docker Compose (Updated for SQL Server 2025)

Update for SQL Server 2025:
This post and the GitHub repo have been updated for SQL Server 2025 RC1 and Ubuntu 24.04.
New in SQL Server 2025: You no longer need to install the PolyBase service to interact with Parquet files in S3. Previously, with SQL Server 2022, you had to build a custom container or manually install PolyBase. Now, S3 object integration and Parquet support work out-of-the-box!


In this blog post, I’ve implemented two example environments for using SQL Server’s S3 object integration: one for backup and restore to S3-compatible object storage, and the other for data virtualization using PolyBase connectivity to S3-compatible object storage. The goal is to get you up and running with these new features as quickly as possible. I built everything in Docker Compose, which handles all of the setup and configuration steps for you. The complete code is available on my GitHub repo, and I’ll walk you through the implementation in this post.
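To give a sense of what the environment enables, backing up to S3-compatible storage looks roughly like this in T-SQL; the endpoint, bucket name, and keys below are placeholders for whatever your MinIO deployment uses:

```sql
-- Create a credential for the S3-compatible endpoint (MinIO here).
-- The IDENTITY must be the literal string 'S3 Access Key'; the SECRET
-- is the access key and secret key separated by a colon.
CREATE CREDENTIAL [s3://s3.example.com:9000/sqlbackups]
WITH IDENTITY = 'S3 Access Key',
     SECRET   = 'minioadmin:minioadmin';

-- Back up directly to the bucket over the S3 REST API.
BACKUP DATABASE TestDB1
TO URL = 's3://s3.example.com:9000/sqlbackups/TestDB1.bak'
WITH COMPRESSION, STATS = 10;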

Scaling SQL Server 2025 Vector Search with Load-Balanced Ollama Embeddings

SQL Server 2025 introduces native support for vector data types and external AI models. This opens up new scenarios for semantic search and AI-driven experiences directly in the database. But as with any external service integration, performance and scalability are immediate concerns, especially when generating embeddings at scale.

https://github.com/nocentino/ollama-lb-sql

Problem: Bottlenecks in Embedding Generation

When you call out to an external embedding service from T-SQL via REST over HTTPS, you’re limited by the throughput of that backend. If you’re running a single Ollama instance, you’ll quickly hit a ceiling on how fast you can generate embeddings, especially for large datasets. I ran into this firsthand at a recent event. My first attempt at generating embeddings was for a three-million-row table, and I had access to some world-class hardware for the job. When I arrived at the lab and kicked off the embedding generation for that dataset, I quickly realized it would take approximately nine days to complete. On closer examination, I found I wasn’t utilizing the GPUs to their full potential; in fact, I was using only about 15% of one GPU’s capacity. So I started cooking up this concept, and here we are: load balancing embedding generation across multiple Ollama instances to more fully utilize the available resources.
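To sketch the idea, here is roughly what the T-SQL side looks like when the external model points at a load balancer instead of a single Ollama instance. The URL and model name are placeholders, and the syntax follows the SQL Server 2025 previews, so treat this as a sketch rather than final syntax:

```sql
-- Point the external model at the load balancer fronting the Ollama
-- instances, not at any single backend (placeholder URL).
CREATE EXTERNAL MODEL OllamaEmbeddings
WITH (
    LOCATION   = 'https://ollama-lb.example.com/api/embed',
    API_FORMAT = 'Ollama',
    MODEL_TYPE = EMBEDDINGS,
    MODEL      = 'nomic-embed-text'
);

-- Each call routes through the load balancer, so concurrent sessions
-- spread embedding generation across all of the backends.
SELECT AI_GENERATE_EMBEDDINGS(N'SQL Server 2025 vector search' USE MODEL OllamaEmbeddings);
```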

Automated SQL Server Benchmarking with HammerDB and Docker: A Complete Testing Framework

I’m excited to announce the release of a new open-source project that fully automates HammerDB benchmarking for SQL Server using Docker. If you’ve ever needed to run TPC-C or TPC-H benchmarks multiple times, you know how time-consuming the manual setup can be. This project removes the hassle and gets you up and running with a single command: ./loadtest.sh.

Why I Built This

In my work, I frequently benchmark SQL Server configurations, whether I’m comparing versions, testing new hardware, or validating performance tuning changes. Setting up HammerDB manually each time became a significant time bottleneck (see what I did there! ;). I needed an automated solution that would work consistently across different environments and reduce the time required to get test results.

Managing Enterprise Storage with Pure Storage Fusion in PowerShell - Building Storage Tiers

In modern IT environments, not all workloads require the same level of storage performance, protection, or cost. Some applications need high performance with aggressive data protection, while others are perfectly fine with lower performance in exchange for cost savings. This tiered approach to storage service delivery is fundamental to efficient infrastructure management.

In my previous post on Fusion, I took an application-centric approach, showing how to deploy SQL Servers using Fusion. Let’s switch gears now and learn how to define a storage service catalog. In this post, I’ll demonstrate how to build a complete storage service catalog using Pure Storage Fusion Presets, offering Bronze, Silver, and Gold tiers with optional replication. We’ll see how to leverage different array types (FlashArray //X and FlashArray //C) to optimize both performance and cost across your fleet.

Getting Started with Vector Search in SQL Server 2025 Using Ollama

Ollama SQL FastStart streamlines the deployment of SQL Server 2025 with integrated AI capabilities through a comprehensive Docker-based solution. This project delivers a production-ready environment combining SQL Server 2025, Ollama’s large language model services, and NGINX with full SSL support—all preconfigured to work together seamlessly.

I built this project to eliminate the complex configuration hurdles that typically slow down AI integration projects. Whether you’re a database professional wanting to explore SQL Server 2025’s new vector search capabilities or a developer looking to build AI-powered applications on familiar infrastructure, this solution provides everything you need in a single docker-compose file. The entire stack—including the complex certificate trust chain between SQL Server and the Ollama API—is automatically configured, allowing you to focus on building data-driven AI applications rather than infrastructure setup.
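Once the stack is up, a basic vector search is plain T-SQL. The table, the embedding dimension, and the distance metric below are illustrative, assuming the SQL Server 2025 vector type and functions:

```sql
-- Store embeddings alongside the data; the dimension must match
-- the embedding model's output (768 here, illustrative).
CREATE TABLE dbo.Documents
(
    Id        INT IDENTITY PRIMARY KEY,
    Content   NVARCHAR(MAX),
    Embedding VECTOR(768)
);

-- Given a query embedding, return the five nearest documents
-- by cosine distance.
DECLARE @q VECTOR(768);  -- populate with the embedding for the search text
SELECT TOP (5)
       Id, Content,
       VECTOR_DISTANCE('cosine', Embedding, @q) AS Distance
FROM dbo.Documents
ORDER BY Distance;
```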

Managing Enterprise Storage with Pure Storage Fusion in PowerShell

When managing storage infrastructure at scale, one of the most powerful approaches is treating related storage resources as cohesive Workloads rather than individual components. This becomes especially important when dealing with applications like SQL Server that have specific storage patterns and requirements and are often deployed at scale in a datacenter or cloud.

In this post, I’ll walk through a complete workflow for creating and managing application-specific storage Workloads using Pure Storage’s Fusion Fleet capability with PowerShell. We’ll see how we can define storage templates, called Presets, once and deploy them consistently across our entire Fleet of storage arrays.

Microsoft MVP 2025: Continuing the Data Platform Journey

I am honored to announce that I have been renewed as a Microsoft MVP for the ninth consecutive year, recognized in the Azure SQL and SQL Server technical areas under Data Platform. Thank you all for this incredible journey, which began in 2017.

Thank You

I want to thank Microsoft for this continued recognition. The MVP program has provided me with numerous opportunities to connect with brilliant minds worldwide, gain early access to cutting-edge technologies, and collaborate with the product and engineering teams that contribute to the evolution of the data platforms we all rely on to maintain our customers’ most critical asset: data.

Build a Snapshot Backup Catalog in Pure Storage with SQL Server 2025’s Native REST API

I’m really excited to share some new functionality in SQL Server 2025 combined with some innovations in FlashArray’s REST API.

In this post, I’m going to show you how to build a snapshot backup catalog using FlashArray Protection Group Tags and orchestrating the work using SQL Server 2025’s new native REST integration.

With this solution, you will be able to query your snapshots by database name, creation time, instance name, or really any other interesting metadata you want to add, bridging the gap between volume snapshots and the databases they contain. And the best part? You can do this all without external tools or databases - just SQL Server and FlashArray.
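A minimal sketch of the orchestration piece, assuming a hypothetical FlashArray REST endpoint, API version, and token header (the real post drives this from Protection Group snapshot tags, and the JSON path into the response is illustrative):

```sql
DECLARE @response NVARCHAR(MAX);

-- Call the FlashArray REST API directly from T-SQL using SQL Server 2025's
-- native REST integration (URL and token here are placeholders).
EXEC sp_invoke_external_rest_endpoint
    @url      = N'https://flasharray.example.com/api/2.x/protection-group-snapshots/tags',
    @method   = N'GET',
    @headers  = N'{"x-auth-token":"<api-token>"}',
    @response = @response OUTPUT;

-- Shred the JSON payload into rows for the snapshot catalog.
SELECT * FROM OPENJSON(@response, '$.result.items');
```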

Monitoring SQL Server Performance with the Pure Storage FlashArray OpenMetrics Exporter

I’m excited to share a new open-source project I’ve been working on that combines two of my favorite areas: SQL Server and Pure Storage FlashArray performance monitoring. If you’ve been following my blog, you know I am passionate about creating tools that bridge the gap between database platforms and storage infrastructure.

SQL Server performance troubleshooting has always been a unique challenge, especially when it comes to understanding the complete I/O path from the database to storage. Traditionally, database administrators (DBAs) and storage administrators have used separate monitoring tools, which makes it difficult to correlate performance issues across the entire stack; they are literally working from two different views of their performance worlds.

SQL Server 2025: Using ZSTD Compression for SQL Server Backups

SQL Server 2025 introduces a new compression algorithm called ZSTD (Zstandard), which can enhance database backup performance. Using ZSTD allows for greater control over backup performance, particularly concerning CPU usage and backup duration. I recently conducted some preliminary benchmarks comparing ZSTD at its three compression levels with the existing MS_XPRESS algorithm. The results are compelling and provide additional options for managing database backup performance effectively.
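For reference, choosing the algorithm and level is a small change to the backup statement. This sketch follows the SQL Server 2025 preview syntax, with a placeholder database and path:

```sql
-- ZSTD with an explicit compression level (LOW, MEDIUM, or HIGH);
-- omitting LEVEL uses the default.
BACKUP DATABASE TestDB1
TO DISK = '/var/opt/mssql/backups/TestDB1.bak'
WITH COMPRESSION (ALGORITHM = ZSTD, LEVEL = HIGH),
     STATS = 10;
```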

The Test Setup

I set up two types of tests on a 389.37 GB database to highlight the performance characteristics of the backup compression algorithms: