Microsoft diskspd. Part 1 Preparing to test.

How to install and set up diskspd before starting your first performance tests, and how to avoid misleading results caused by NULL byte files.

Installing Disk-Speed (diskspd).

Get the diskspd binary from Microsoft: http://aka.ms/diskspd
Manual is here: https://github.com/Microsoft/diskspd/wiki
Extract the Zip file and copy the amd64 binary to C:\windows\system32 so that it is on the Windows PATH
Open Terminal/Command Prompt

STOP! Before you run any IO performance tests with `diskspd`

PROBLEM: NULL bytes

By default, when diskspd creates a file it is a file full of NULL bytes. Many storage systems (at least NetApp and Nutanix that I know of) will optimize the layout of NULL byte files. This means that test results from NULL byte files will not reflect the performance of real applications that read or write actual data.

FIX: Overwrite the disk or file immediately after creation in diskspd

To avoid overly optimistic results caused by reading/writing NULL bytes, first create the file, then write a randomized data pattern to the file before doing any testing.

Example

1. Create a file called `testfile1.dat` using diskspd -c, e.g. for a 32G file on drive D:

diskspd.exe -c32G D:\testfile1.dat

This will create a 32G file full of NULL bytes

Figure: the file as created by diskspd -c contains only NULL bytes.

2. Then overwrite with a random pattern

diskspd.exe -w100 -Zr D:\testfile1.dat

Note: by default diskspd runs for only 5s of warmup and 10s of runtime, so if you have a large file you will need to run the diskspd write workload for longer than 15 seconds (use -d to set the duration in seconds). This is very different from fio, which will write the entire file unless told to run for a specified time (using the time_based and runtime parameters).
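
As a rough illustration of the arithmetic behind that note, here is a minimal Python sketch that estimates how long a write pass must run to cover the whole file once. The function name `seconds_to_overwrite` and the 500 MiB/s throughput figure are assumptions made up for the example; substitute the write throughput your own storage actually sustains.

```python
import math

GiB = 1024 ** 3
MiB = 1024 ** 2


def seconds_to_overwrite(file_bytes: int, write_bytes_per_sec: float) -> int:
    """Return the whole number of seconds needed to write file_bytes once."""
    if write_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return math.ceil(file_bytes / write_bytes_per_sec)


# Hypothetical example: a 32 GiB file on storage sustaining 500 MiB/s of writes.
duration = seconds_to_overwrite(32 * GiB, 500 * MiB)
print(f"run the write pass for at least {duration} seconds")
```

The result is only a lower bound for the value passed to -d. It is worth adding some margin and then confirming the total bytes written in the diskspd output.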

diskspd will tell you the number of bytes written during the test, so you can get a good idea of whether the whole disk/file was written to. Another good test is to see how compressible the datafile is. A datafile written entirely with random data will show very little compression, whereas sections of NULL bytes will compress a lot. For instance, if your 32G datafile compresses to 16G, probably half the file is random data and half is NULL bytes.
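
A more direct check, sketched here in Python under some assumptions, is to scan the datafile and count how many chunks are still entirely NULL. The helper names `zero_chunk_fraction` and `demo_half_written` and the 1 MiB chunk size are hypothetical choices for this example, not part of diskspd.

```python
import os
import tempfile


def zero_chunk_fraction(path: str, chunk_size: int = 1024 * 1024) -> float:
    """Return the fraction of chunks in the file that contain only zero bytes."""
    total = 0
    zero = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += 1
            # A chunk with no non-zero byte was probably never overwritten.
            if chunk.count(0) == len(chunk):
                zero += 1
    return zero / total if total else 0.0


def demo_half_written() -> float:
    """Build a small stand-in file (half random, half NULL) and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.dat")
        with open(path, "wb") as f:
            f.write(os.urandom(2 * 4096))  # overwritten region
            f.write(bytes(2 * 4096))       # region still full of NULL bytes
        return zero_chunk_fraction(path, chunk_size=4096)


print(f"fraction of all-zero chunks: {demo_half_written():.2f}")
```

Against a real test file you would call `zero_chunk_fraction` with the file's path. A result close to 0 suggests the write pass covered the whole file.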

Figure: the same file after being overwritten with diskspd -w100 -Zr.

Available write patterns.

diskspd provides a variety of options when writing out datafiles. I strongly recommend using random patterns for testing underlying storage, unless you are specifically trying to understand how the storage handles particular patterns.

For purposes of demonstration, create 3 files of 2GB in size.

diskspd.exe -c2G F:\testfile2.dat
diskspd.exe -c2G F:\testfile3.dat
diskspd.exe -c2G F:\testfile4.dat

Option 1 – Repeating pattern

Use -w100 (write 100%) with no additional flags to generate a repeating pattern.

diskspd.exe -w100 F:\testfile2.dat

Option 2 – Write NULL (Zero byte) pattern

Use -w100 with -Z to generate NULL bytes

diskspd.exe -w100 -Z F:\testfile3.dat

Option 3 – Write Random pattern (Recommended)

Use -w100 with -Zr to generate a random pattern

diskspd.exe -w100 -Zr F:\testfile4.dat

Here are the resulting patterns

testfile2.dat (-w100) == Repeating pattern. Writes the values 0x00, 0x01 ... 0xFE, 0xFF, then repeats.
testfile3.dat (-w100 -Z) == NULL bytes
testfile4.dat (-w100 -Zr) == Random pattern
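
To spot-check which of these patterns a block of a datafile contains, a minimal Python sketch along the following lines may help. The function name `classify_block` and its labels are hypothetical, and the "repeating" test assumes the incrementing 0x00 to 0xFF sequence described above.

```python
import os


def classify_block(block: bytes) -> str:
    """Label a block as 'null', 'repeating' (0x00..0xFF cycle) or 'other'."""
    if not block:
        return "empty"
    if block.count(0) == len(block):
        return "null"
    # In the repeating pattern every byte is the previous byte plus one, modulo 256.
    if len(block) > 1 and all(
        (block[i] - block[i - 1]) % 256 == 1 for i in range(1, len(block))
    ):
        return "repeating"
    return "other"


print(classify_block(bytes(512)))             # block of NULL bytes
print(classify_block(bytes(range(256)) * 2))  # 0x00..0xFF written twice
print(classify_block(os.urandom(512)))        # random data
```

Reading the first few kilobytes of each test file in binary mode and passing them to `classify_block` gives a quick sanity check before running a longer test.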

A simple test of randomness in data

A really effective test of data “randomness” is to see how much a file can be compressed. The built-in compression tool in Windows File Explorer is good enough. As the results below show, the repeating pattern compresses almost as well as the NULL byte pattern. So on an intelligent storage platform, although the special-case optimization for NULL bytes will be defeated, the storage engine will probably still compress the file internally. This is good in most use cases, but not if you are trying to test the underlying storage with a particular file size.

testfile2.dat (repeating pattern) 2GB -> 97KB
testfile3.dat (NULL bytes) 2GB -> 74KB
testfile4.dat (random) 2GB -> 2GB (Note the “compressed” file is actually a bit larger!)
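
The same effect can be reproduced in memory without creating any files. The Python sketch below compresses 1 MiB of each pattern with zlib and prints the ratio of compressed to original size. zlib stands in for the Windows compression tool here, so the exact numbers will differ from those above; the name `compression_ratio` and the 1 MiB sample size are assumptions for the example.

```python
import os
import zlib

SIZE = 1024 * 1024  # 1 MiB sample of each pattern


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower = more compressible)."""
    return len(zlib.compress(data)) / len(data)


samples = {
    "repeating": bytes(range(256)) * (SIZE // 256),  # 0x00..0xFF, repeated
    "null": bytes(SIZE),                             # all zero bytes
    "random": os.urandom(SIZE),                      # random data
}

for name, data in samples.items():
    print(f"{name:>9}: {compression_ratio(data):.4f}")
```

A ratio near zero means the data is highly compressible, and a ratio near one means compression gains nothing, which is the behaviour you want from a test file.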
