How to install and set up diskspd before running your first performance tests, and how to avoid misleading results caused by NULL byte files.
Installing Disk-Speed (diskspd).
- Get the diskspd binary from Microsoft: http://aka.ms/diskspd
- Manual is here: https://github.com/Microsoft/diskspd/wiki
- Extract the Zip file and copy the amd64 binary to C:\windows\system32 so that it is on the Windows PATH
- Open Terminal/Command Prompt
STOP! Before you run any IO performance tests with diskspd
PROBLEM: NULL bytes
By default, when diskspd creates a file it is a file full of NULL bytes. Many storage systems (at least NetApp and Nutanix that I know of) will optimize the layout of NULL byte files. This means that test results from NULL byte files will not reflect the performance of real applications that read or write actual data.
FIX: Overwrite the disk or file immediately after creation in diskspd
To avoid overly optimistic results caused by reading/writing NULL bytes, first create the file, then write a randomized data pattern to the file before doing any testing.
Example
1. Create a file called testfile1.dat using diskspd -c, e.g. for a 32G file on drive D:
diskspd.exe -c32G D:\testfile1.dat
This will create a 32G file full of NULL bytes
2. Then overwrite with a random pattern
diskspd.exe -w100 -Zr D:\testfile1.dat
Note: by default diskspd runs for only 5s of warmup and 10s of runtime, so if you have a large file you will need to run the diskspd write workload for more than 15 seconds. This is very different from fio, which will write the entire file unless told to run for a specified time (using the time_based and runtime parameters).
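As a rough guide to how long the write pass needs to run, here is a minimal Python sketch. It assumes the writes are sequential and that throughput stays roughly constant. The function names, the safety factor and the 500 MiB/s figure are illustrative assumptions, not measurements. The only extra diskspd option used is -d (duration in seconds).

```python
import math


def overwrite_duration_seconds(file_size_bytes, write_bytes_per_sec, safety_factor=1.5):
    """Estimate how many seconds of writing are needed to cover the whole file.

    Assumes sequential writes at a roughly constant rate; safety_factor is a
    hypothetical margin to allow for throughput dips.
    """
    if write_bytes_per_sec <= 0:
        raise ValueError("write_bytes_per_sec must be positive")
    return math.ceil(file_size_bytes / write_bytes_per_sec * safety_factor)


def overwrite_command(path, duration_s):
    """Build a diskspd command line that writes random data for duration_s seconds."""
    return f"diskspd.exe -w100 -Zr -d{duration_s} {path}"


if __name__ == "__main__":
    size = 32 * 1024**3   # the 32G file from the example
    rate = 500 * 1024**2  # assumed write rate of 500 MiB/s (illustrative only)
    seconds = overwrite_duration_seconds(size, rate)
    print(overwrite_command(r"D:\testfile1.dat", seconds))
```

Run a short write test first to find a realistic write rate for your storage, then plug that number in.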
diskspd will tell you the number of bytes written during the test, so you can get a good idea of whether the whole disk/file was written to. Another good test is to see how compressible the datafile is. A datafile written entirely with random data will show very little compression, whereas sections of NULL bytes will compress a lot. For instance, if your 32G datafile compresses to 16G, probably half the file is random data and half is NULL bytes.
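If you would rather check the file directly than compress it, one option is to scan it and count how much is still NULL. The following is a small Python sketch under that assumption. It is not part of diskspd, and the function name and chunk size are my own choices. It reports the fraction of chunks that contain nothing but NULL bytes.

```python
import os
import tempfile


def null_fraction(path, chunk_size=1024 * 1024):
    """Return the fraction of chunks in the file that contain only NULL bytes."""
    zero_chunks = 0
    total_chunks = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_chunks += 1
            # A chunk is "all NULL" when every byte in it is zero.
            if chunk.count(0) == len(chunk):
                zero_chunks += 1
    return zero_chunks / total_chunks if total_chunks else 0.0


if __name__ == "__main__":
    # Demo on a small temporary file: half NULL bytes, half random data.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 4096 + os.urandom(4096))
        demo_path = tmp.name
    print(f"NULL fraction: {null_fraction(demo_path, chunk_size=1024):.2f}")
    os.remove(demo_path)
```

A result close to 0 suggests the overwrite covered the whole file. A result near 0.5 would match the half-written 32G example above. Scanning a large file takes time and generates read IO of its own, so do this outside of any measured test run.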
Available write patterns.
diskspd provides a variety of options when writing out datafiles. I strongly recommend using random patterns for testing underlying storage, unless you are specifically trying to understand how the storage handles particular patterns.
For purposes of demonstration, create 3 files of 2GB in size.
diskspd.exe -c2G F:\testfile2.dat
diskspd.exe -c2G F:\testfile3.dat
diskspd.exe -c2G F:\testfile4.dat
Option 1 – Repeating pattern
Use -w100 (write 100%) with no additional flags to generate a repeating pattern.
diskspd.exe -w100 F:\testfile2.dat
Option 2 – Write NULL (Zero byte) pattern
Use -w100 with -Z to generate NULL bytes
diskspd.exe -w100 -Z F:\testfile3.dat
Option 3 – Write Random pattern (Recommended)
Use -w100 with -Zr to generate a random pattern
diskspd.exe -w100 -Zr F:\testfile4.dat
Here are the resulting patterns
testfile2.dat (-w100) == repeating pattern. Writes the values 0x00, 0x01 .. 0xFE, 0xFF, then repeats.
testfile3.dat (-w100 -Z) == NULL bytes
testfile4.dat (-w100 -Zr) == Random pattern
A Simple test of random-ness in data
A really effective test of data “random-ness” is to see how much a file can be compressed. The built-in compression tool in Windows File Manager is good enough. We see that the repeating pattern compresses almost as well as the NULL byte pattern. So on an intelligent storage platform, although the special-case optimization for NULL bytes will be defeated, the storage engine will probably still compress the file internally. This is good in most use-cases, but not if you’re trying to test the underlying storage with a particular file size.
testfile2.dat (repeating patterns) 2GB -> 97KB
testfile3.dat (NULL bytes) 2GB -> 74KB
testfile4.dat (random) 2GB -> 2GB. (Note the “compressed” file is actually a bit larger!)
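The same effect can be reproduced in a few lines of Python without touching a disk. This sketch builds the three patterns in memory (1 MiB each rather than 2GB) and compresses them with zlib. zlib is a stand-in for the Windows compression tool, so the exact sizes will differ from the numbers above, but the ordering should be the same. The function names are my own.

```python
import os
import zlib

SIZE = 1024 * 1024  # 1 MiB per pattern, much smaller than the 2GB test files


def make_patterns(size=SIZE):
    """Build in-memory stand-ins for the three diskspd write patterns."""
    return {
        "repeating": bytes(range(256)) * (size // 256),  # 0x00..0xFF, repeated
        "null": bytes(size),                             # all NULL bytes
        "random": os.urandom(size),                      # random data
    }


def compression_ratio(data):
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    for name, data in make_patterns().items():
        print(f"{name:10s} ratio = {compression_ratio(data):.4f}")
```

The repeating and NULL patterns should shrink to a tiny fraction of their size, while the random buffer should stay at about its original size or slightly above.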