Using fio to read from Linux buffer-cache

Sometimes we want fio to read from the Linux buffer cache rather than from the underlying device. There are a couple of gotchas that might trip you up, but thankfully fio provides the required workarounds.

TL;DR

To get this to work as expected (reads serviced from the buffer cache), the best way is to use the option invalidate=0 in the fio job file.
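As a minimal sketch, a job file that keeps reads in the page cache might look like the following (the filename, size and runtime are assumptions):

[global]
bs=1m
size=1G
rw=read
time_based
runtime=30
; buffered IO, so reads can be satisfied from the cache
direct=0
; do not invalidate cached pages for the file before the run starts
invalidate=0

[cached-read]
filename=/tmp/fiofile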


fio versions < 3.3 may show inflated random write performance

TL;DR

If your storage system implements inline compression, performance results for small-IO-size random writes using time_based and runtime may be inflated with fio versions < 3.3, because fio’s default data pattern generates unexpectedly compressible data. Although unintuitive, performance can often be increased by enabling compression, especially if the bottleneck is the storage media, replication, or a combination of both.

fio 2.8.1 vs fio 3.33 data patterns

Therefore, if you are comparing performance results generated using fio versions < 3.3 against fio >= 3.3, the random write performance on the same storage platform may appear reduced with the more recent fio versions.

fio-3.3 was released in December 2017, but older fio versions are still in use, particularly on distributions with long term support (LTS). For instance, Ubuntu 16, which is supported until 2026, ships with fio-2.2.10.
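One quick way to sanity-check the data pattern your fio version produces is to write a small file with the default settings and see how well it compresses; a rough sketch (the file path and size are assumptions):

fio --name=pattern-check --filename=/tmp/fio-pattern.dat --rw=write --bs=1m --size=256m --end_fsync=1
gzip -c /tmp/fio-pattern.dat | wc -c

If the compressed size is much smaller than 256MB, the data being written is compressible and a storage system with inline compression will flatter the results.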


Specifying Drive letters with fio for Windows.

fio on Windows

Download pre-compiled fio binary for Windows

Example fio Windows job file, single drive

This will create a 1GB file called fiofile on the F:\ drive in Windows and then read the file. Notice that the filename specification is “Drive letter” “Backslash” “Colon” “Filename”.

In fio terms we are “escaping” the “:” character, which fio traditionally uses as a filename separator.

[global]
bs=1024k
size=1G
time_based
runtime=30
rw=read
direct=1
iodepth=8

[job1]
filename=F\:fiofile 
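To run against more than one drive, the same escaping applies per job; a sketch assuming a second G: drive exists, added as another job stanza that inherits the [global] settings above:

[job2]
filename=G\:fiofile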

Using rwmixread and rate_iops in fio

Creating a mixed read/write workload with fio can be a bit confusing. Assume we want to create a fixed rate workload of 100 IOPS split 70:30 between reads and writes.

Don’t mix rwmixread and rate_iops
TL;DR

Specify the rate directly with rate_iops=<read-rate>,<write-rate>; do not try to combine rwmixread with rate_iops. For the example above, use:

rate_iops=70,30 
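Put together as a complete job file, a minimal sketch might look like this (the device path, block size and runtime are assumptions):

[global]
ioengine=libaio
direct=1
bs=8k
iodepth=8
time_based
runtime=60

[fixed-rate-70-30]
rw=randrw
rate_iops=70,30
filename=/dev/sdb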

Additionally, older versions of fio exhibit problems when using rate_process=poisson together with rate_iops. The version I was using, fio 3.7, did not exhibit the problem.


Why does my SSD not issue 1MB IO’s?

First things First

https://commons.wikimedia.org/wiki/File:CDC9762-smd-drive.jpg
CDC 9762 SMD disk drive from 1974

Why do we tend to use 1MB IO sizes for throughput benchmarking?

To achieve the maximum throughput on a storage device, we usually use a large IO size to maximize the amount of data transferred per IO request. The idea is to make the ratio of data transferred to IO requests as large as possible, reducing the CPU overhead of each IO request so we can get as close to the device bandwidth as possible. To take advantage of prefetching, and to reduce the need for head movement on rotational devices, a sequential pattern is used.

For historical reasons, many storage testers use a 1MB IO size for sequential testing. A typical fio command line might look something like this:

fio --name=read --bs=1m --direct=1 --filename=/dev/sda
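While a job like this is running, the IO sizes actually reaching the device can be observed with iostat; a sketch assuming the device is sda (the average request size column is avgrq-sz, in sectors, on older sysstat versions, or areq-sz, in KB, on newer ones):

iostat -x sda 1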

How to identify SSD types and measure performance.

Thomas Springer / CC0
Generic SSD Internal Layout

The real-world achievable SSD performance will vary depending on factors like IO size, queue depth and even CPU clock speed. It’s useful to know what the SSD is capable of delivering in the actual environment in which it’s used. I always start by looking at the performance claimed by the manufacturer. I use these figures to bound what is achievable. In other words, treat the manufacturer specs as “this device will go no faster than…”.


Identify SSD

Start by identifying the exact SSD type using lsscsi. Note that the disks we are going to test are connected via the ATA transport, so the maximum queue depth that each device will support is 32.

# lsscsi 
[1:0:0:0] cd/dvd QEMU QEMU DVD-ROM 2.5+ /dev/sr0
[2:0:0:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/sda
[2:0:1:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/sdb
[2:0:2:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/sdc
[2:0:3:0] disk ATA SAMSUNG MZ7LM1T9 404Q /dev/

The marketing name for these Samsung SSDs is “SSD 850 EVO 2.5″ SATA III 1TB”.

Identify device specs

The spec sheet for this SSD claims the following performance characteristics.

Workload (Max)            Spec      Measured
Sequential Read (QD=8)    540 MB/s  534 MB/s
Sequential Write (QD=8)   520 MB/s  515 MB/s
Read IOPS 4KB (QD=32)     98,000    80,000
Write IOPS 4KB (QD=32)    90,000    67,000
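As a sketch, the 4KB random read figure at QD=32 can be reproduced with a command along the following lines (the device path and runtime are assumptions; note that a write test against a raw device would destroy data, so only reads are shown here):

fio --name=randread-qd32 --filename=/dev/sda --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --time_based --runtime=60 --group_reporting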

Paper: A Nine year study of filesystem and storage benchmarking

A 2007 paper that still has lots to say on the subject of benchmarking storage and filesystems. It is primarily aimed at researchers and developers, but is relevant to anyone about to embark on a benchmarking effort.

  • Use a mix of macro and micro benchmarks
  • Understand what you are testing, cached results are fine – as long as that is what you had intended.

The authors are clear on why benchmarks remain important:

“Ideally, users could test performance in their own settings using real workloads. This transfers the responsibility of benchmarking from author to user. However, this is usually impractical because testing multiple systems is time consuming, especially in that exposing the system to real workloads implies learning how to configure the system properly, possibly migrating data and other settings to the new systems, as well as dealing with their respective bugs.”

We cannot expect end-users to be experts in benchmarking. It is our duty as experts to provide the tools (benchmarks) that enable users to make purchasing decisions without requiring years of benchmarking expertise.

Storage Bus Speeds 2018

Storage bus speeds with example storage endpoints.

Bus     Lanes  End-Point                      Theoretical Bandwidth (MB/s)  Note
SAS-3   1      HBA <-> Single SATA Drive      600                           SAS3 <-> SATA 6Gbit
SAS-3   1      HBA <-> Single SAS Drive       1200                          SAS3 <-> SAS3 12Gbit
SAS-3   4      HBA <-> SAS/SATA Fanout        4800                          4 Lane HBA to Breakout (6 SSD) [2]
SAS-3   8      HBA <-> SAS/SATA Fanout        8400                          8 Lane HBA to Breakout (12 SSD) [1]
PCIe-3  1      N/A                            1000                          Single Lane PCIe3
PCIe-3  4      PCIe <-> SAS HBA or NVMe       4000                          Enough for Single NVMe
PCIe-3  8      PCIe <-> SAS HBA or NVMe       8000                          Enough for SAS-3 4 Lanes
PCIe-3  40     PCIe Bus <-> Processor Socket  40000                         Xeon direct connect to PCIe Bus

 

Notes


All figures here are the theoretical maximums for the busses, using rough/easy calculations to convert between bits/s and bytes/s. They are enough to figure out where the throughput bottlenecks are likely to be in a storage system.
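As a worked example of that rough conversion: a 6Gbit SATA link is roughly 6,000 Mbit/s ÷ 10 ≈ 600 MB/s, and a 12Gbit SAS-3 lane is roughly 12,000 Mbit/s ÷ 10 ≈ 1200 MB/s. Dividing by 10 rather than 8 conveniently absorbs the link encoding overhead.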

  1. SATA devices contain a single SAS/SATA port (connection), and even when they are connected to a SAS3 HBA, the SATA protocol limits each SSD device to ~600MB/s (single port, 6Gbit)
  2. SAS devices may be dual ported (two connections to the device from the HBA(s)) – each with a 12Gbit connection giving a potential bandwidth of 2x12Gbit == 2.4Gbyte/s (roughly) per SSD device.
  3. An NVMe device directly attached to the PCIe bus has access to a bandwidth of 4GB/s by using 4 PCIe lanes – or 8GB/s using 8 PCIe lanes.  On current Xeon processors, a single socket attaches to 40 PCIe lanes directly (see diagram below) for a total bandwidth of 40GB/s per socket.

  • I first started down the road of finally coming to grips with all the different busses and lane types after reading this excellent LSI paper.  I omitted the SAS-2 figures from this article since modern systems use SAS-3 exclusively.

LSI SAS PCI Bottlenecks (PDF): https://www.n0derunner.com/wp-content/uploads/2018/04/LSI-SAS-PCI-Bottlenecks.pdf

 

Intel Processor & PCI connections



Creating compressible data with fio.


Today I used fio to create some compressible data to test on my Nutanix nodes.  I ended up using the following fio params to get what I wanted.

 

buffer_compress_percentage=50
refill_buffers
buffer_pattern=0xdeadbeef
  • buffer_compress_percentage does what you’d expect and specifies how compressible the data is.
  • refill_buffers is required to make the compress percentage above apply across the whole file. In other words, I want the entire file to be compressible by the buffer_compress_percentage amount.
  • buffer_pattern is the big one. Without setting this pattern, fio will use null bytes to achieve compressibility, and Nutanix, like many other storage vendors, will suppress runs of zeros, so the data reduction will mostly come from zero suppression rather than from compression.
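Putting these options into a complete job file, a minimal sketch might look like the following (the filename, size and block size are assumptions):

[global]
bs=1m
size=2G
rw=write
direct=1
ioengine=libaio
iodepth=8
buffer_compress_percentage=50
refill_buffers
buffer_pattern=0xdeadbeef

[compressible-write]
filename=/mnt/test/fiofile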

Much of this is well explained in the README for the latest version of fio.

Also note: older versions of fio do not support many of these fancy data-creation flags, but they will not alert you to the fact that they are being ignored. I spent quite a bit of time wondering why my data was not compressed, until I downloaded and compiled the latest fio.