Using fio to read from the Linux buffer cache

Sometimes we want to use fio to read from the Linux cache rather than from the underlying device. There are a couple of gotchas that might trip you up. Thankfully, fio provides the required workarounds.

TL;DR

To get this to work as expected (reads are serviced from the buffer cache), the best way is to set the option invalidate=0 in the fio job file.
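
As a minimal sketch, the snippet below renders a job file for a buffered sequential read with invalidate=0. The job name, file path, size, block size, and runtime are placeholder assumptions chosen for illustration; only invalidate=0 comes from the advice above.

```python
def render_fio_job(filename: str, size: str = "100m", runtime_s: int = 30) -> str:
    """Return the text of a fio job file that leaves cached pages in place."""
    lines = [
        "[cached-read]",          # hypothetical job name
        f"filename={filename}",   # a file on a filesystem, not a raw device
        f"size={size}",
        "rw=read",                # sequential read
        "bs=4k",
        "direct=0",               # buffered I/O, so reads can be served from cache
        "invalidate=0",           # do not ask the kernel to drop cached pages
        "time_based",
        f"runtime={runtime_s}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The path is an assumption; point it at whatever file you are testing.
    print(render_fio_job("/tmp/fio-cache-test.dat"), end="")
```

Save the printed text to a file and pass that file to fio. The first run still has to pull the data in from the device; later runs against the same file should be served from cache as long as the pages have not been evicted.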

Longer Version

By default, when using a filesystem/file rather than a raw device, fio issues an fadvise call with POSIX_FADV_DONTNEED when it opens the file. The actual implementation of the hint is detailed here, which states that cached pages will be freed for the given file descriptor and byte range. It is not clear to me whether issuing the hint at open() time affects Linux’s algorithm for keeping pages as they come into the cache.

Anyhow, fio uses POSIX_FADV_DONTNEED both at the time the file is initially opened AND whenever the file has been fully read. So with a time_based workload, or when running a non-time_based workload on the same file multiple times, you will never see cached access/cached performance by default.
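
To make the hint concrete, here is a small sketch that issues the same kind of call from Python using os.posix_fadvise. It is not fio's code; the helper name and the temporary file are assumptions for illustration. Dirty pages are not freed by this hint, so the example writes the data back with fsync first.

```python
import os
import tempfile


def drop_cached_pages(path: str) -> int:
    """Ask the kernel to free cached pages for the whole file; return its size."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # Offset 0 and length = file size cover the entire file.
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_DONTNEED)
        return size
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Create a 1 MiB scratch file (hypothetical test data).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (1 << 20))
        tmp.flush()
        os.fsync(tmp.fileno())  # write dirty pages back so they can be dropped
    print(drop_cached_pages(tmp.name))
    os.unlink(tmp.name)
```

The call is only a hint: the kernel is asked to free the pages for that range, and the next read of the file has to go back to the device for anything that was dropped.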

Trace

In an strace, expect to see something like this, where 6 is the file descriptor and 104857600 is the total size of the file. In other words, invalidate any cached pages associated with descriptor 6 (the file being used by the fio workload).

fadvise64(6, 0, 104857600, POSIX_FADV_DONTNEED)
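
If you capture the strace output to a file, a short script can count how often the cache is invalidated during a run. This is a sketch that assumes the calls appear in the format shown above; how you collect the trace (for example with strace -f -e trace=fadvise64) is up to you, and the function name is made up for illustration.

```python
import re

# Matches lines such as: fadvise64(6, 0, 104857600, POSIX_FADV_DONTNEED)
DONTNEED_RE = re.compile(
    r"fadvise64\((\d+),\s*(\d+),\s*(\d+),\s*POSIX_FADV_DONTNEED\)"
)


def count_dontneed_calls(strace_text: str) -> dict:
    """Return {file descriptor: number of POSIX_FADV_DONTNEED calls seen}."""
    counts = {}
    for match in DONTNEED_RE.finditer(strace_text):
        fd = int(match.group(1))
        counts[fd] = counts.get(fd, 0) + 1
    return counts


if __name__ == "__main__":
    # Sample text built from the trace line shown above, repeated twice.
    sample = (
        "fadvise64(6, 0, 104857600, POSIX_FADV_DONTNEED)\n"
        "fadvise64(6, 0, 104857600, POSIX_FADV_DONTNEED)\n"
    )
    print(count_dontneed_calls(sample))
```

A count that keeps growing during a time_based run suggests the cache is being dropped on every pass through the file.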

What to do if you WANT cached access/performance

fadvise_hint may not do what you expect

Skimming the fio manpage will lead you to a helpful-sounding option, fadvise_hint, which is a boolean, and you might expect that setting it to 0 will suppress all calls to fadvise. Unfortunately, that’s not what happens. The specific use of the hint is to toggle RANDOM or SEQUENTIAL hints to the filesystem. This is outlined in the manpage but is easy to miss when skimming and trying to debug weird behavior.

invalidate=0 is your friend

The unlikely-named “invalidate” flag, when set to 0, will suppress the POSIX_FADV_DONTNEED call against the file descriptor both at “open” time and after the entire file has been read. In other words, fio won’t issue POSIX_FADV_DONTNEED at all. The name “invalidate” reflects that fio invalidates the cached pages by way of fadvise(). Using this method, the pages of the file will stay in the buffer cache even across multiple invocations of fio, until the pages are purged by some sort of cache pressure.

norandommap also sort of works

If you use norandommap, fio will issue POSIX_FADV_DONTNEED when the file is opened (once) but will not issue further fadvise calls during the run. When using a random map (which is the default), fio will issue a POSIX_FADV_DONTNEED each time it cycles through the random map. Using this method you can measure how caching affects the workload over time (using a long-running fio workload), but fio will still invalidate the cache each time it is invoked, so you will always start with a cold cache.

Summary

  • By default fio will issue POSIX_FADV_DONTNEED when the file is opened AND when the entire file has been read (either by doing a sequential read OR a random read with a randommap).
  • If you use invalidate=0 then fio will never issue POSIX_FADV_DONTNEED.
  • If you use a random read with a randommap then fio will issue POSIX_FADV_DONTNEED once when the file is opened AND each time the randommap is exhausted (the entire file has been read). This means that with a time_based workload fio may issue many fadvise calls. You can alter this by using norandommap.
  • If you use a sequential read with time_based, fio will issue POSIX_FADV_DONTNEED each time the end of the file is reached. Using norandommap doesn’t make sense here, so use invalidate=0 instead.
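
The rules in this summary can be written down as a toy model. This is only a restatement of the behavior described in this post, not something derived from fio's source code; the function and parameter names are invented for illustration, and full_passes means the number of times the whole file has been read during the run.

```python
def expected_dontneed_calls(
    random_io: bool,
    invalidate: bool = True,
    norandommap: bool = False,
    full_passes: int = 0,
) -> int:
    """Estimate how many POSIX_FADV_DONTNEED calls a single fio run issues."""
    if not invalidate:
        # invalidate=0: no call at open time and none during the run.
        return 0
    if random_io and norandommap:
        # Random I/O without a random map: only the call at open time.
        return 1
    # Default: one call at open time plus one per complete pass over the file.
    # norandommap is ignored for sequential I/O in this model.
    return 1 + full_passes


if __name__ == "__main__":
    print(expected_dontneed_calls(random_io=False, full_passes=3))
    print(expected_dontneed_calls(random_io=True, norandommap=True, full_passes=3))
    print(expected_dontneed_calls(random_io=False, invalidate=False, full_passes=3))
```

Only the invalidate=0 case gives a warm cache at the start of the next run, which is why it is the recommended option when the goal is cached performance.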
