Backing up to S3-Compatible Object Storage with SQL Server


Introducing S3 in SQL Server 2022

S3-compatible object storage integration is a new feature introduced in SQL Server 2022. There are two significant areas where SQL Server leverages it: backup and restore, and data virtualization. This article focuses on getting started with S3-compatible object storage for backups. Now let’s unpack the phrase ‘S3-compatible object storage’ a bit. Amazon Simple Storage Service (S3) is a storage service AWS provides in its cloud. That platform’s REST API is available for others, including my company, Pure Storage, to build their own S3-compatible object storage platforms. At Pure Storage, we offer S3 on our scale-out file and object platform, FlashBlade. This means you can take advantage of S3 object storage anywhere you like, outside AWS’s cloud infrastructure.

So you might be thinking, why? Why use S3? S3 is a very popular object storage platform both inside and outside of AWS. Developers use it to store data, and DBAs, including many in my customer base back when I was a consultant, use S3-compatible object storage to store SQL Server backups. But before this new integration in SQL Server 2022, DBAs would have to back up to a share or local volume and then use tools like the AWS CLI or rsync to copy the data into their S3 buckets.

If you’re new to S3, a bucket is kind of like a folder on a file system. It’s a place you can put files.

Now, let’s talk about why S3. First up, performance. If you’re backing up to a single file server, the primary way to get more performance out of that server is to scale the system up by adding more CPU and faster disks, which often has upper limits, or possibly by adding additional backup servers. That means you then need to track which instances back up to which servers, and that can get messy fast. Most implementations of the S3 API provide scale-out performance: you can add performance to your storage platform by adding compute nodes, each contributing CPU, disk, and networking. As you scale the system out, you can run backups into the storage platform faster using the usual DBA techniques of increasing the number of backup files and data files. On S3, this is all done under a single namespace, so you’ll have one place to look for your backups. Further, S3-compatible object storage platforms provide native replication, and core to a good recovery plan is getting your backups out of the primary data center as quickly as possible. With S3-compatible storage, you can replicate to another S3-compatible object store or to AWS S3.

Let’s now move on to see this in action.

Getting Started

You’ll need a few things to get going with S3-compatible storage.

  • A bucket created on your S3-compatible object storage platform or in AWS S3. Refer to your platform’s documentation to get this going.
  • A username, sometimes called an Access Key ID, and its password, sometimes called a Secret Key ID.
  • To perform both backups and restores, this user needs read and write access to the bucket.
  • And finally, your S3-compatible object storage needs a valid TLS certificate.

Creating a Credential

Once you have your bucket and authentication configured, the next thing to do is create a CREDENTIAL. The code below creates a credential in SQL Server that contains the information needed to authenticate to our S3-compatible object storage. We’ll unpack it line by line.

First, CREATE CREDENTIAL [s3://] creates a credential whose name includes the URL to the bucket. Later, when you execute a backup statement, the URL you write the backup file(s) to is used to locate the correct credential, by matching the URL in the backup command against the one in the credential’s name. The database engine uses the most specific match when looking for a credential, so you can create more or less granular credentials as needed. Perhaps you want one credential for the entire S3-compatible object store, a credential for each bucket, or even credentials for nested paths within a bucket.
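As a sketch of that most-specific-match behavior, you could define both an endpoint-wide credential and a bucket-specific one. The endpoint, bucket, and key values below are hypothetical placeholders for your environment:

```sql
-- Hypothetical endpoint, bucket, and keys; substitute your own values.
-- A broad credential covering the whole S3-compatible endpoint:
CREATE CREDENTIAL [s3://s3.example.com:9000]
WITH IDENTITY = 'S3 Access Key',
     SECRET   = 'adminkey:adminsecret';

-- A more specific credential for a single bucket; the database engine
-- prefers this one for any backup URL under this bucket:
CREATE CREDENTIAL [s3://s3.example.com:9000/sqlbackups]
WITH IDENTITY = 'S3 Access Key',
     SECRET   = 'backupuser:backupsecret';
```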

Next, WITH IDENTITY = 'S3 Access Key': this string must be set to exactly this value when using S3.

And last, SECRET = 'anthony:nocentino'; this holds the username (Access Key ID), here anthony, and the password (Secret Key ID), here nocentino. Notice that a colon is the delimiter, which means neither the username nor the password can contain a colon in its value. So watch out for that.

CREATE CREDENTIAL [s3://]  -- name the credential with the full s3:// URL to your bucket
WITH IDENTITY = 'S3 Access Key',
     SECRET = 'anthony:nocentino';
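Once created, you can confirm the credential is in place with the standard sys.credentials catalog view; S3 credentials show up with 'S3 Access Key' as the identity:

```sql
-- List credentials and their identities; the name column holds the
-- s3:// URL you used when creating each credential.
SELECT name, credential_identity, create_date
FROM sys.credentials;
```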

Running a Backup

With everything ready to go, a bucket created, permissions set, and a credential defined, let’s now go ahead and run a backup to our S3-compatible object storage. Let’s walk through the code.

First, we define the database we want to back up with BACKUP DATABASE TestDB1.

Next, we tell the backup command where to put the backup file with TO URL = 's3://'. If more than one credential is defined, the database engine finds the correct one by matching URLs, using the most specific match.

And to round things off, I’m adding WITH COMPRESSION to compress the data written into the backup file(s).

BACKUP DATABASE TestDB1
    TO URL = 's3://'  -- use the full s3:// URL to your bucket and backup file
WITH COMPRESSION;

Processed 2 pages for database 'TestDB1', file 'TestDB1_log' on file 1.
BACKUP DATABASE successfully processed 15418346 pages in 85.542 seconds (1408.148 MB/sec).
Total execution time: 00:01:25.633

And there we have it: our first backup to S3-compatible object storage.
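As a quick sanity check, you can read the backup back from the bucket without restoring it. This is a sketch assuming a hypothetical endpoint, bucket, and file name; substitute your own URL:

```sql
-- Verify the backup is readable and complete (hypothetical URL).
RESTORE VERIFYONLY
FROM URL = 's3://s3.example.com:9000/sqlbackups/TestDB1.bak';

-- Inspect the backup header (database name, LSNs, compression flag).
RESTORE HEADERONLY
FROM URL = 's3://s3.example.com:9000/sqlbackups/TestDB1.bak';
```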

Working with “larger” Backup Files

In S3 object storage, a file is broken up into as many as 10,000 parts. In SQL Server, each part’s size is based on the MAXTRANSFERSIZE parameter, since that is the size of the write operation performed against the backup file. The default for backups to S3-compatible storage is 10MB, so 10,000 parts * 10MB means the largest size for a single backup file is about 100GB. And for many databases, that’s just not big enough. So what can you do? First, you can use compression, which will get more of your data into a single file.

If you exceed the maximum file size, here’s the error that you’ll get:

Msg 3202, Level 16, State 1, Line 78
Write on 's3://' failed: 1117(The request could not be performed because of an I/O device error.)
Msg 3013, Level 16, State 1, Line 78
BACKUP DATABASE is terminating abnormally.

Second, you can increase MAXTRANSFERSIZE; the default is 10MB, and valid values range from 5MB to 20MB. If you max out MAXTRANSFERSIZE at 20MB, your single-file maximum is just under 200GB.
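Here’s what that looks like in a backup command; 20971520 bytes is 20MB, and COMPRESSION must be specified whenever you change MAXTRANSFERSIZE for an S3 backup. The URL is a hypothetical placeholder:

```sql
-- Max out the part size to roughly double the single-file ceiling.
BACKUP DATABASE TestDB1
TO URL = 's3://s3.example.com:9000/sqlbackups/TestDB1.bak'
WITH MAXTRANSFERSIZE = 20971520,  -- 20MB parts, up from the 10MB default
     COMPRESSION;
```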

The third knob you can turn for larger backup sets is increasing the number of backup files by adding more URLs to the backup command. The capacity of a backup set is 10,000 parts * MAXTRANSFERSIZE * the number of URLs. So in the example below, each file can be up to 200GB, and there are two files, giving us about 400GB of backup capacity. The maximum number of files is 64, so the largest single backup set you can have is just over 12TB. But remember, you can also use compression, so the database in a backup set can be larger than 12TB. One note to add here: using MAXTRANSFERSIZE requires that COMPRESSION be enabled for the backup.

BACKUP DATABASE TestDB1
    TO URL = 's3://' ,  -- first backup file; use the full s3:// URL
       URL = 's3://'    -- second backup file
WITH MAXTRANSFERSIZE = 20971520, COMPRESSION;

I’m pretty excited about this feature. S3 object storage integration gives the DBA another storage option for writing backup data, and S3-compatible object storage provides interesting capabilities around performance and replication. In an upcoming post, we’ll look at performance tuning options and learn how to drive more concurrency out of our S3-compatible object storage platforms.

Do you want to learn how to set up S3-compatible object storage integration using MinIO and containers? Check out this post here where I walk you through how to get off the ground fast using MinIO for object storage and SQL Server 2022!