Sometimes we want to read from the Linux cache rather than the underlying device using fio. There are a couple of gotchas that might trip you up. Thankfully fio provides the required work-arounds.
TL;DR
To get this to work as expected (reads are serviced from the buffer cache), the best way is to use the option invalidate=0 in the fio file.
Longer Version
By default, when using a filesystem/file rather than a raw device, fio issues POSIX_FADV_DONTNEED against the file when it opens it. The actual implementation of the hint is detailed here, which states that cached pages will be freed for the given file descriptor and byte range. It is not clear to me whether issuing the hint at open() time affects Linux’s algorithm for keeping pages as they come into the cache.
Anyhow, fio uses POSIX_FADV_DONTNEED both at the time the file is initially opened AND whenever the file has been fully read. So if using a time_based workload, or in fact running a non time_based workload on the same file multiple times, you will never see cached access/cached performance by default.
Trace
In strace output, expect to see something like this, where 6 is the file descriptor and 104857600 is the total size of the file. In other words: invalidate any cached pages associated with descriptor 6 (the file being used by the fio workload).
fadvise64(6, 0, 104857600, POSIX_FADV_DONTNEED)
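A minimal job file that should produce a trace like the one above is sketched here. The job name, file path and runtime are placeholders of mine, not values from the original test; size=100m is chosen only because it matches the 104857600 bytes in the trace. Running fio on this job under strace (with child processes followed) should show the fadvise64 call each time the end of the file is reached.

```ini
; Hypothetical job: name, path and runtime are placeholders.
; No invalidate= setting, so fio's default cache invalidation applies.
[default-cache-behaviour]
filename=/tmp/fio-testfile
size=100m
rw=read
bs=4k
time_based
runtime=30
```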
What to do if you WANT cached access/performance
fadvise_hint may not do what you expect
Skimming the fio manpage will lead you to a helpful-sounding option, fadvise_hint, which is a boolean, and you might expect that setting it to 0 will suppress all calls to fadvise. Unfortunately that’s not what happens. The specific use of the hint is to toggle RANDOM or SEQUENTIAL hints to the filesystem. This is outlined in the manpage but is easy to miss when skimming and trying to debug weird behavior.
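To make the pitfall concrete, here is a sketch of a job that sets fadvise_hint=0 and nothing else. Going by the behaviour described above, this only turns off the RANDOM/SEQUENTIAL hint; the DONTNEED invalidation should still occur, so reads are not expected to come from cache. The job name, path and sizes are placeholders.

```ini
; Hypothetical job: name, path and sizes are placeholders.
; fadvise_hint=0 disables the access-pattern hint only.
; It is NOT expected to stop the cache invalidation.
[fadvise-hint-off]
filename=/tmp/fio-testfile
size=100m
rw=randread
bs=4k
fadvise_hint=0
```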
invalidate=0 is your friend
The unintuitively named “invalidate” flag will suppress the POSIX_FADV_DONTNEED call against the file descriptor both at “open” time and after the entire file has been read. In other words, fio won’t issue POSIX_FADV_DONTNEED at all. The name reflects that fio invalidates the cached pages by way of fadvise(). Using this method, the pages of the file will stay in the buffer cache even across multiple invocations of fio, until the pages are purged by some sort of cache pressure.
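A minimal job file using this option might look like the following. Only invalidate=0 comes from the discussion above; the job name, path, block size and runtime are illustrative placeholders. The first run warms the cache, and later runs against the same file should then be serviced from it, provided the file fits in memory and is not evicted in between.

```ini
; Hypothetical job: name, path, block size and runtime are placeholders.
; invalidate=0 asks fio not to invalidate cached pages for the file.
[cached-randread]
filename=/tmp/fio-testfile
size=100m
rw=randread
bs=4k
invalidate=0
time_based
runtime=30
```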
norandommap also sort of works
If you use norandommap then fio will issue POSIX_FADV_DONTNEED when the file is opened (once) but will not issue further fadvise calls during the run. When using a random map (which is the default) fio will issue a POSIX_FADV_DONTNEED each time it cycles through the random map. Using norandommap you can measure how caching affects the workload over time (using a long-running fio workload), but fio will still invalidate the cache each time it is invoked, so you will always start with a cold cache.
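A sketch of the norandommap variant is shown here, for watching the cache warm up over a single long run. As before, the job name, path, block size and runtime are placeholders; a longer runtime simply gives the cache more time to fill.

```ini
; Hypothetical job: name, path, block size and runtime are placeholders.
; norandommap: fio does not track which blocks it has already read,
; so there is no random map to exhaust during the run.
[warming-randread]
filename=/tmp/fio-testfile
size=100m
rw=randread
bs=4k
norandommap
time_based
runtime=300
```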
Summary
- By default fio will issue POSIX_FADV_DONTNEED when the file is opened AND when the entire file has been read (either by doing a sequential read, OR a random read with a randommap).
- If you use invalidate=0 then fio will never issue POSIX_FADV_DONTNEED.
- If you use a random read with a randommap then fio will issue POSIX_FADV_DONTNEED once when the file is opened AND each time the randommap is exhausted (the entire file has been read). This means that with a time_based workload fio may issue many fadvise calls. You can alter this by using norandommap.
- If you use a sequential read with time_based, fio will issue POSIX_FADV_DONTNEED each time the end of the file is reached (using norandommap doesn’t make sense here, so use invalidate=0 instead).