Load Testing Your Storage Subsystem with Diskspd – Part III

In the final post of our “Load Testing Your Storage Subsystem with Diskspd” series, we’re going to run the tests we designed and interpret the results. In the first post we showed how performance can vary based on access pattern and I/O size, and in the second post we showed how to design a test to highlight those performance characteristics. In this post we’ll execute those tests and review the results. First, let’s walk through the output from Diskspd; for now, don’t focus on the actual numbers.

There are four major sections:

  • Test Parameters – these are the test’s parameters, including the exact command line executed. This is great for reproducing tests.
Command Line: diskspd.exe -d15 -o1 -F1 -b60K -h -s -L -w100 C:\TEST\iotest.dat

Input parameters:

        timespan:   1
        -------------
        duration: 15s
        warm up time: 5s
        cool down time: 0s
        measuring latency
        random seed: 0
        path: 'C:\TEST\iotest.dat'
                think time: 0ms
                burst size: 0
                software and hardware write cache disabled
                performing write test
                block size: 61440
                number of outstanding I/O operations: 1
                thread stride size: 0
                IO priority: normal
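One thing worth noticing in the echoed parameters: the `-b60K` we passed on the command line shows up as `block size: 61440`, because diskspd’s `K` suffix means KiB (1024 bytes). A trivial check:

```python
# diskspd's K suffix is KiB: -b60K is echoed back as "block size: 61440"
block_size = 60 * 1024
print(block_size)  # 61440
```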
  • CPU Usage – CPU usage for the test. Recall that if you are not using all of your bandwidth, you may want to add threads; if your CPU usage is high, you may want to back off on the number of threads.
Results for timespan 1:
*******************************************************************************

actual test time:       15.00s
thread count:           1
proc count:             2

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|  30.10%|   1.04%|   29.06%|  69.89%
   1|   0.10%|   0.10%|    0.00%|  99.78%
-------------------------------------------
avg.|  15.10%|   0.57%|   14.53%|  84.84%
  • Performance – this is the meat of the test. Here we see bandwidth measured in MB/sec and latency measured with microsecond precision. With SSDs and today’s super-fast storage I/O subsystems, you’ll likely need this level of accuracy; this alone beats SQLIO in my opinion. I’m not much of a fan of IOPS, since those numbers require that you know the size of the I/O to have any meaning. Check out Jeremiah Peschka’s article on this here. Remember, focus on minimizing latency and maximizing throughput; refer back to the Part I and Part II posts in this series for details.
Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816 | C:\TEST\iotest.dat (20GB)
-----------------------------------------------------------------------------------------------------
total:        3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |               0 |            0 |       0.00 |       0.00 |    0.000 |       N/A | C:\TEST\iotest.dat (20GB)
-----------------------------------------------------------------------------------------------------
total:                 0 |            0 |       0.00 |       0.00 |    0.000 |       N/A

Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816 | C:\TEST\iotest.dat (20GB)
-----------------------------------------------------------------------------------------------------
total:        3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816
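The point about IOPS needing an I/O size to be meaningful is easy to verify from the table above: IOPS multiplied by the I/O size gives the bandwidth. A quick sanity check using the figures diskspd reported for this write test:

```python
# Sanity check: IOPS x I/O size = bandwidth (figures from the write test above)
iops = 3431.10           # "I/O per s" column
io_size = 61440          # -b60K block size in bytes

mb_per_sec = iops * io_size / (1024 * 1024)  # bytes/s -> MB/s (MiB)
print(round(mb_per_sec, 2))  # 201.04, matching the MB/s column
```

The same 3,431 IOPS at an 8K block size would only be about 27 MB/sec, which is why an IOPS number on its own tells you very little.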
  • Histogram – this gives a great representation of how your test did over the whole run. In this example, 99% of the time our latency was less than 0.654ms…that’s pretty super.
%-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |        N/A |      0.059 |      0.059
   25th |        N/A |      0.163 |      0.163
   50th |        N/A |      0.193 |      0.193
   75th |        N/A |      0.218 |      0.218
   90th |        N/A |      0.258 |      0.258
   95th |        N/A |      0.312 |      0.312
   99th |        N/A |      0.654 |      0.654
3-nines |        N/A |     17.926 |     17.926
4-nines |        N/A |     18.906 |     18.906
5-nines |        N/A |    583.568 |    583.568
6-nines |        N/A |    583.568 |    583.568
7-nines |        N/A |    583.568 |    583.568
8-nines |        N/A |    583.568 |    583.568 
    max |        N/A |    583.568 |    583.568
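If you ever want to compute percentiles like these yourself (say, from per-I/O latencies you’ve captured some other way), the math is just a lookup into a sorted list. A minimal sketch using the nearest-rank method; the sample latencies here are made up for illustration:

```python
# Nearest-rank percentile over a list of latency samples (ms).
# The sample values below are made up for illustration.
def percentile(samples, p):
    """Return the p-th percentile (0 < p <= 100), nearest-rank method."""
    s = sorted(samples)
    rank = max(1, -(-len(s) * p // 100))  # ceil(p/100 * N), 1-indexed
    return s[int(rank) - 1]

latencies = [0.15, 0.18, 0.19, 0.21, 0.22, 0.26, 0.31, 0.65, 17.9, 583.6]
print(percentile(latencies, 50))   # 50th percentile
print(percentile(latencies, 99))   # 99th percentile -- the tail
```

Note how a single slow outlier dominates the high "nines" while barely moving the median, which is exactly the pattern in the table above.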
Impact of I/O Access Patterns
  • Random - diskspd.exe -d15 -o32 -t2 -b64K -h -r -L -w0 C:\TEST\iotest.dat
Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |     16066543616 |       245156 |    1021.49 |   16343.84 |    1.896 |     0.286 | C:\TEST\iotest.dat (20GB)
     1 |     16231759872 |       247677 |    1031.99 |   16511.91 |    1.877 |     0.207 | C:\TEST\iotest.dat (20GB)
-----------------------------------------------------------------------------------------------------
total:       32298303488 |       492833 |    2053.48 |   32855.75 |    1.886 |     0.250
In this test you can see that there is high throughput and very low latency. This disk is a PCIe-attached SSD, so it performs well with a random I/O access pattern.
  • Sequential - diskspd.exe -d15 -o32 -t2 -b64K -h -s -L -w0 C:\TEST\iotest.dat
Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |     16094724096 |       245586 |    1022.21 |   16355.35 |    1.895 |     0.260 | C:\TEST\iotest.dat (20GB)
     1 |     16263544832 |       248162 |    1032.93 |   16526.91 |    1.875 |     0.185 | C:\TEST\iotest.dat (20GB)
-----------------------------------------------------------------------------------------------------
total:       32358268928 |       493748 |    2055.14 |   32882.26 |    1.885 |     0.225
In this test you can see that the sequential I/O pattern yields a performance profile similar to the random I/O test on the SSD. Recall that an SSD does not have to move a disk head or rotate a platter; access to any location on the drive carries the same latency cost.
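You can put a number on “similar performance profile” by comparing the two totals directly (figures copied from the two tests above):

```python
# Compare the total MB/s from the random and sequential read tests above.
random_mbps = 2053.48
sequential_mbps = 2055.14

difference = abs(sequential_mbps - random_mbps) / random_mbps
print(f"{difference:.2%}")  # well under 1% -- access pattern barely matters on this SSD
```

On a spinning disk you would expect the sequential number to be several times the random one; here the gap is noise.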

Impact of I/O sizes

  • Transaction log simulation - diskspd.exe -d15 -o1 -t1 -b60K -h -s -L -w100 C:\TEST\iotest.dat
Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816 | C:\TEST\iotest.dat (20GB)
-----------------------------------------------------------------------------------------------------
total:        3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816
This test measures the access latency of a single thread with a very small data transfer; as you can see, latency is very low at 0.289ms. This is expected on a low-latency device such as a locally attached SSD.
  • Backup operation simulation - diskspd.exe -d15 -o32 -t4 -b512K -h -s -L -w0 C:\TEST\iotest.dat
Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      8552185856 |        16312 |     543.17 |    1086.33 |   29.434 |    26.063 | C:\TEST\iotest.dat (20GB)
     1 |      8846311424 |        16873 |     561.85 |    1123.69 |   28.501 |    25.373 | C:\TEST\iotest.dat (20GB)
     2 |      8771338240 |        16730 |     557.09 |    1114.17 |   28.777 |    25.582 | C:\TEST\iotest.dat (20GB)
     3 |      8876720128 |        16931 |     563.78 |    1127.56 |   28.440 |    25.353 | C:\TEST\iotest.dat (20GB)
-----------------------------------------------------------------------------------------------------
total:       35046555648 |        66846 |    2225.88 |    4451.76 |   28.783 |    25.593
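A result like this can be sanity-checked with Little’s law: with 4 threads each keeping 32 I/Os in flight, the sustained IOPS should be roughly the number of outstanding I/Os divided by the average latency. A quick check using the figures from the table above:

```python
# Little's law check on the backup simulation: IOPS ~ outstanding / avg latency
outstanding = 32 * 4            # -o32 outstanding I/Os across -t4 threads
avg_latency_s = 28.783 / 1000   # average latency, ms -> seconds

predicted_iops = outstanding / avg_latency_s
print(round(predicted_iops))  # close to the 4451.76 "I/O per s" diskspd reported
```

When the predicted and measured numbers diverge badly, it usually means the queue wasn’t actually kept full for the whole run.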

And finally, our test simulating reading data for a backup. The larger I/Os have a higher latency but also yield a higher transfer rate at 2,225MB/sec.

In this series of posts we introduced some theory on how drives access data, presented tests to explore the performance profile of your disk subsystem, and reviewed Diskspd output for those tests. This should give you the tools and ideas you need to load test your disk subsystem and ensure your SQL Servers will perform well when you put them into production!