Internals

Reading SQL Server File Headers with DBCC FILEHEADER

I’ve been doing a deep dive into SQL Server on-disk structures lately, and one of my favorite rabbit holes is revisiting Paul Randal’s series on file header pages. If you haven’t read it, go do that now. It covers what file header pages are, what they contain, and what happens when they corrupt. This post takes that concept and runs with it. I’ll use DBCC FILEHEADER to read the file header of every user database file on a server and answer a question that comes up more than you’d think: can you determine which files belong together as a database purely from the file header, without querying sys.databases?

Understanding SQL Server IO Size

This blog post shows you how NTFS stores data, what the NTFS Allocation Unit means, and how SQL Server performs IOs of variable size.

How NTFS Stores Data on Disk

A Master File Table (MFT) is the data structure that describes files and directories on NTFS. In Figure 1, you can see an MTF record has several sections describing the metadata about the file and pointers to blocks that make up the file. A block, also referred to as a cluster in Windows, is an abstraction over one or more physical structures (sectors or pages depending on the media) presented by the underlying disk. A block/cluster is also the atomic allocation unit from a file system and has a configurable size. On NTFS, this is referred to as the NTFS Allocation Unit Size and is a configurable attribute of the file system. By default, it is 4KB and can be as large as 2MB. Since a block is a unit of allocation, if a file is between 1 byte and the file system’s allocation unit size, it will take up exactly one block/cluster on the file system. As the file grows, more blocks/clusters are allocated to represent the file. The MFT data structure tracks which blocks make up a file. The block allocator of the file system will try to ensure blocks are physically adjacent on the disk and groups them together in runs.