Setting up SQL Server S3 Object Storage Integration using MinIO with Docker Compose (Updated for SQL Server 2025)

Update for SQL Server 2025:
This post and the GitHub repo have been updated for SQL Server 2025 RC1 and Ubuntu 24.04.
New in SQL Server 2025: You no longer need to install the PolyBase service to interact with Parquet files in S3. Previously, with SQL Server 2022, you had to build a custom container or manually install PolyBase. Now, S3 object integration and Parquet support work out-of-the-box!


In this blog post, I’ve implemented two example environments for using SQL Server’s S3 object integration. One for backup and restore to S3-compatible object storage and the other for data virtualization using PolyBase connectivity to S3-compatible object storage. This work aims to get you up and running as quickly as possible to work with these new features. I implemented this in Docker Compose since that handles all the implementation and configuration steps for you. The complete code for this is available on my GitHub repo. I’m walking you through the implementation here in this post.


What’s New

  • SQL Server 2025 RC1 and Ubuntu 24.04: The environments now use the latest SQL Server container image.
  • No PolyBase Installation Required for Parquet/S3: PolyBase is no longer required to be installed for Parquet/S3 integration in SQL Server 2025.
  • Improved Certificate Handling: The config service generates SSL certificates before other services start, ensuring proper sequencing and avoiding mount errors.
  • Backup Demo Script Improvements: The backup/demo.sh script now automates the full backup workflow, including environment startup, database creation, S3 credential creation, backup execution, and cleanup of generated certificates.
  • PolyBase Configuration: The demo SQL script now enables PolyBase and advanced options at the start, following best practices.
  • Consistent Naming: Service and file names have been updated for clarity and consistency.

Backup and Restore Test Environment

In the backup directory, you’ll find a script demo.sh to start the environment and perform a basic connectivity test using SQL Server backup. To start everything up, change into the backup directory and run:

How it Works

  1. Certificate Generation:
    The config service builds a container that generates the required SSL certificate and stores it in ./certs on the host.

  2. MinIO Service:
    The minio1 service starts a MinIO container, exposing ports 9000 (data) and 9001 (admin). It mounts the generated certificates for TLS.

  3. Bucket Creation:
    The createbucket service uses the MinIO client (mc) to create a user and a bucket (sqlbackups) for backup/restore testing.

  4. SQL Server Service:
    The sql1 service runs SQL Server, mounting the MinIO public certificate for trusted TLS connections.

  5. Demo Script:
    The demo.sh script:

    • Starts the Docker Compose environment
    • Waits for SQL Server to be ready and verifies connectivity
    • Creates a test database
    • Creates the S3 credential in SQL Server
    • Executes a backup to the S3 bucket
    • (Optionally) Cleans up generated certificates with rm -rf ./certs

Example Usage

In demo.sh run each of these line by line to walk through the demo:

Create a database:

CREATE DATABASE TESTDB1;

Create the S3 credential:

CREATE CREDENTIAL [s3://s3.example.com:9000/sqlbackups] WITH IDENTITY = 'S3 Access Key', SECRET = 'anthony:nocentino';

Backup to S3:

BACKUP DATABASE TestDB1 TO URL = 's3://s3.example.com:9000/sqlbackups/TestDB1.bak' WITH COMPRESSION, STATS = 10, FORMAT, INIT;

Cleanup:

docker compose down --rmi local --volumes
rm -rf ./certs

PolyBase and S3 Data Virtualization Environment

Up next is Data Virtualization. In this repo’s datavirtualization directory, there’s a script demo.sh and a demo SQL script demo.sql. This script has the commands you’ll need to start up the environment and do a basic connectivity test using PolyBase-based access to S3-compatible object storage.

What’s New for SQL Server 2025

  • No PolyBase Installation Required for Parquet/S3:
    You no longer need to install the PolyBase service to interact with Parquet files in S3. This is a major improvement over SQL Server 2022, where you had to build a custom container or manually install PolyBase to enable this functionality.
  • Simplified Configuration:
    The demo SQL script now enables PolyBase and advanced options at the start, following best practices.

How it Works

  1. Certificate Generation:
    The config service generates SSL certificates and places them in ./certs.

  2. MinIO Service:
    The s3.example.com service starts MinIO with the generated certificates.

  3. Bucket and Data Setup:
    The createbucket service creates a user, a bucket (sqldatavirt), and uploads a sample CSV file.

  4. SQL Server Service:
    The sql1 service runs SQL Server 2025 RC1, mounting the MinIO public certificate for TLS.

PolyBase Configuration

Before creating external data sources, PolyBase must be enabled. The demo.sql script now includes:

USE master;
GO
sp_configure 'show advanced options', 1;
RECONFIGURE;
sp_configure 'polybase enabled', 1;
RECONFIGURE;
GO

Example Usage

  • Create a database and master key
  • Create a database scoped credential for S3 access
  • Create an external data source referencing your S3 bucket
  • Use OPENROWSET or create external tables to query data in S3
  • Use CREATE EXTERNAL TABLE AS SELECT to create and read a parquet file in S3

See demo.sql for the full workflow.

Memory Requirements

If you encounter memory errors with PolyBase, increase Docker’s memory allocation to at least 4GB.


Wrap things up

When you’re done, you can use docker compose down --rmi local --volumes to clean up all the resources, images, network, and the volumes holding the database in the databases and MinIO.


References