
Storage Performance and benchmarking
Previously we looked at how the POSIX_FADV_DONTNEED hint influences the Linux page cache when doing IO via a filesystem. Here we take a look at two more filesystem hints: POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL.

Sometimes we want fio to read from the Linux page cache rather than from the underlying device. There are a couple of gotchas that might trip you up; thankfully fio provides the required workarounds.
To get this to work as expected (reads are serviced from the buffer cache), the best approach is to use the option invalidate=0 in the fio job file.
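A minimal sketch of such a job file, assuming the test file has already been written and is resident in the page cache (the filename, size and runtime here are assumptions):
[global]
filename=/tmp/fiofile
size=1G
bs=4k
# buffered IO so reads can be serviced from the page cache
direct=0
# do not invalidate cached pages for the file before the run starts
invalidate=0
[cached-randread]
rw=randread
time_based
runtime=30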
If your storage system implements inline compression, small-IO-size random-write results obtained with time_based and runtime may be inflated on fio versions older than 3.3, because fio's default data pattern generates unexpectedly compressible data. Although it sounds unintuitive, performance can often be increased by enabling compression, especially if the bottleneck is the storage media, replication, or a combination of both.
Therefore, if you are comparing results generated with fio < 3.3 against fio >= 3.3, random-write performance on the same storage platform may appear reduced with the more recent fio versions.
fio-3.3 was released in December 2017, but older fio versions are still in use, particularly on distributions with long-term support (LTS). For instance, Ubuntu 16.04 LTS, which is supported until 2026, ships with fio-2.2.10.
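If you need numbers that are comparable across fio versions, one option is to take the default data pattern out of the equation and force fully random, incompressible write buffers. A hedged sketch (the target device, block size and runtime are assumptions):
[global]
# refill the IO buffer with fresh random data on every submission,
# so the written data is effectively incompressible on any fio version
refill_buffers
[randwrite-incompressible]
filename=/dev/sdb
rw=randwrite
bs=8k
iodepth=32
direct=1
time_based
runtime=300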
Download pre-compiled fio binary for Windows
This will create a 1GB file called fiofile on the F:\ drive in Windows, then read the file back. Notice that the filename is specified as “drive letter” “backslash” “colon” “filename”.
In fio terms we are “escaping” the : which fio traditionally uses as a file separator.
[global]
bs=1024k
size=1G
time_based
runtime=30
rw=read
direct=1
iodepth=8
[job1]
filename=F\:fiofile
The Samsung SSD 970 EVO 500GB claims a sequential read bandwidth of 3,400 MB/s; this is the story of trying to achieve that number.
I was recently asked to investigate why Nutanix storage was not as fast as a competing solution in a PoC environment. When I looked at the output from diskspd, the data didn’t quite make sense.
Creating a mixed read/write workload with fio can be a bit confusing. Assume we want to create a fixed rate workload of 100 IOPS split 70:30 between reads and writes.

Specify the rate directly with rate_iops=<read-rate>,<write-rate>; do not try to combine rwmixread with rate_iops. For the example above, use:
rate_iops=70,30
Additionally, older versions of fio exhibit problems when using rate_process=poisson with rate_iops. fio version 3.7, which I was using, did not exhibit the problem.
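A sketch of a complete job file for the 100 IOPS, 70:30 read/write example above (the device, block size and runtime are assumptions):
[global]
filename=/dev/sdb
direct=1
bs=8k
time_based
runtime=300
[mixed-100iops]
rw=randrw
# 70 read IOPS and 30 write IOPS; note there is no rwmixread here
rate_iops=70,30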
The parameters norandommap and randrepeat significantly change the way that repeated random IO workloads are executed, and can also meaningfully change the results of an experiment because of the way caching works on most storage systems.
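As a hedged sketch, a random-read job that avoids the repeatable, cache-friendly pattern would set both options (the device, block size and runtime are assumptions):
[randread-norepeat]
filename=/dev/sdb
rw=randread
bs=8k
direct=1
time_based
runtime=60
# use a different random sequence on every run rather than a repeatable seed
randrepeat=0
# do not track block coverage, so offsets may repeat within a run
norandommap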
How to identify Optane drives in Linux using lspci.
Tips and tricks for using diskspd, especially useful for those familiar with tools like fio.
How to ensure performance testing with diskspd is stressing the underlying storage devices, not the OS filesystem.
How to install and set up diskspd before starting your first performance tests, and how to avoid wrong results due to null-byte issues.

To achieve the maximum throughput from a storage device, we usually use a large IO size to maximize the amount of data transferred per IO request. The idea is to make the ratio of data transferred to IO requests as large as possible, reducing the per-request CPU overhead so we can get as close to the device bandwidth as possible. To take advantage of prefetching, and to reduce the need for head movement on rotational devices, a sequential access pattern is used.
For historical reasons, many storage testers use a 1MB IO size for sequential testing. A typical fio command line might look something like this:
fio --name=read --bs=1m --direct=1 --filename=/dev/sda
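To keep the device busy it usually also helps to raise the queue depth and use an asynchronous IO engine; a hedged variant of the command above (libaio and the queue depth of 8 are assumptions, not part of the original example):
# 1MB sequential reads, asynchronous IO, 8 outstanding requests
fio --name=seqread --rw=read --bs=1m --direct=1 --ioengine=libaio --iodepth=8 --filename=/dev/sda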

The real-world achievable SSD performance will vary depending on factors like IO size, queue depth and even CPU clock speed. It’s useful to know what the SSD is capable of delivering in the actual environment in which it’s used. I always start by looking at the performance claimed by the manufacturer. I use these figures to bound what is achievable. In other words, treat the manufacturer specs as “this device will go no faster than…”.
Start by identifying the exact SSD type using lsscsi. Note that the disks we are going to test are connected via the ATA transport, so the maximum queue depth that each device will support is 32.
# lsscsi
[1:0:0:0] cd/dvd QEMU QEMU DVD-ROM 2.5+ /dev/sr0
[2:0:0:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/sda
[2:0:1:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/sdb
[2:0:2:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/sdc
[2:0:3:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/
The marketing name for these Samsung SSDs is “SSD 850 EVO 2.5″ SATA III 1TB”.
The spec sheet for this SSD claims the performance characteristics below; the kind of fio invocations used to measure them is sketched after the table.
| Workload (Max) | Spec | Measured |
|---|---|---|
| Sequential Read (QD=8) | 540 MB/s | 534 MB/s |
| Sequential Write (QD=8) | 520 MB/s | 515 MB/s |
| Read IOPS 4KB (QD=32) | 98,000 | 80,000 |
| Write IOPS 4KB (QD=32) | 90,000 | 67,000 |
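The fio invocations used to produce the measured numbers above might look like the following sketch, using the queue depths from the spec sheet (the device path, IO engine and runtimes are assumptions):
# Sequential read bandwidth at QD=8
fio --name=seqread --filename=/dev/sdb --rw=read --bs=1m --ioengine=libaio --iodepth=8 --direct=1 --time_based --runtime=60

# Random 4KB read IOPS at QD=32 (the ATA/SATA limit noted above)
fio --name=randread --filename=/dev/sdb --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --direct=1 --time_based --runtime=60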
A 2007 paper that still has a lot to say on the subject of benchmarking storage and filesystems. It is primarily aimed at researchers and developers, but is relevant to anyone about to embark on a benchmarking effort.
The authors are clear on why benchmarks remain important:
“Ideally, users could test performance in their own settings using real workloads. This transfers the responsibility of benchmarking from author to user. However, this is usually impractical because testing multiple systems is time consuming, especially in that exposing the system to real workloads implies learning how to configure the system properly, possibly migrating data and other settings to the new systems, as well as dealing with their respective bugs.”
We cannot expect end-users to be experts in benchmarking. It is our duty as experts to provide the tools (benchmarks) that enable users to make purchasing decisions without requiring years of benchmarking expertise.
Storage bus speeds with example storage endpoints.
| Bus | Lanes | End-Point | Theoretical Bandwidth (MB/s) | Note |
|---|---|---|---|---|
| SAS-3 | 1 | HBA <-> Single SATA Drive | 600 | SAS3 <-> SATA 6 Gbit |
| SAS-3 | 1 | HBA <-> Single SAS Drive | 1200 | SAS3 <-> SAS3 12 Gbit |
| SAS-3 | 4 | HBA <-> SAS/SATA Fanout | 4800 | 4-lane HBA to breakout (6 SSD) [2] |
| SAS-3 | 8 | HBA <-> SAS/SATA Fanout | 8400 | 8-lane HBA to breakout (12 SSD) [1] |
| PCIe-3 | 1 | N/A | 1000 | Single lane PCIe3 |
| PCIe-3 | 4 | PCIe <-> SAS HBA or NVMe | 4000 | Enough for a single NVMe |
| PCIe-3 | 8 | PCIe <-> SAS HBA or NVMe | 8000 | Enough for SAS-3 4 lanes |
| PCIe-3 | 40 | PCIe Bus <-> Processor Socket | 40000 | Xeon direct connect to PCIe bus |
Notes
All figures here are the theoretical maximums for the buses, using rough calculations to convert bits/s to bytes/s; for example, a single 6 Gbit/s SATA link yields roughly 600 MB/s (6,000 Mbit/s ÷ 10 bits per byte, which allows for encoding overhead). This is enough to figure out where the throughput bottlenecks are likely to be in a storage system.
LSI SAS PCI Bottlenecks (PDF): https://www.n0derunner.com/wp-content/uploads/2018/04/LSI-SAS-PCI-Bottlenecks.pdf
Intel Processor & PCI connections
We have started seeing misaligned partitions on Linux guests running certain HDFS distributions. How these partitions became misaligned is a bit of a mystery, because the only way I know how to do this on Linux is to create a partition using the old DOS format, like this (using -c=dos and -u=cylinders).
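For reference, a hedged example of the fdisk invocation described above, plus a quick alignment check; /dev/sdb here is an assumed example device:
# start fdisk in DOS-compatibility mode with cylinders as the unit,
# which is how cylinder-aligned (and potentially misaligned) partitions get created
fdisk -c=dos -u=cylinders /dev/sdb

# afterwards, parted can report whether partition 1 is optimally aligned
parted /dev/sdb align-check optimal 1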
Often we are presented with a vCenter screenshot, and an observation that there are “high latency spikes”. In the example, the response time is indeed quite high – around 80ms.
Today I used fio to create some compressible data to test on my Nutanix nodes. I ended up using the following fio params to get what I wanted.
buffer_compress_percentage=50 refill_buffers buffer_pattern=0xdeadbeef
Much of this is well explained in the README for the latest version of fio.
Also note that older versions of fio do not support many of the fancy data-creation flags, but will not alert you to the fact that they are being ignored. I spent quite a bit of time wondering why my data was not compressed until I downloaded and compiled the latest fio.
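Putting the parameters above into a complete job file might look like the following sketch (the target device, block size, queue depth and runtime are assumptions):
[global]
filename=/dev/sdb
rw=randwrite
bs=8k
iodepth=32
direct=1
time_based
runtime=300
[compressible-randwrite]
# aim for data that is roughly 50% compressible
buffer_compress_percentage=50
# refill the IO buffer on every submission rather than reusing its contents
refill_buffers
# fill pattern used for the buffer contents instead of zeros
buffer_pattern=0xdeadbeef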