Why Is dd with the 'direct' (O_DIRECT) Flag So Dramatically Faster?

Why do writes with O_DIRECT and O_SYNC still cause I/O merging?

On Linux, doing direct I/O doesn't mean "do this exact I/O" - it is a hint to bypass Linux's page cache. At the time of writing the open man page says this about O_DIRECT:

Try to minimize cache effects of the I/O to and from this file.

This means components like the Linux I/O scheduler are still free to merge and reorder O_DIRECT I/O as they see fit (your use of fio's sync=1 is what stops the reordering).

Additionally, if you are doing I/O to a file in a filesystem, it is legitimate for that filesystem to ignore the O_DIRECT hint and fall back to buffered I/O.
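
For reference, here is a minimal sketch (my own illustration, not from the original answer) of what a correct O_DIRECT + O_SYNC write looks like in C. The file name and the 4096-byte alignment are assumptions; the real alignment requirement depends on the filesystem and device.

    #define _GNU_SOURCE           /* O_DIRECT is a GNU extension in glibc */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* O_DIRECT asks the kernel to bypass the page cache;
           O_SYNC makes each write wait for the device to acknowledge it. */
        int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);
        if (fd < 0)
            return 1;

        /* O_DIRECT requires the buffer, offset and length to be aligned
           (typically to the logical block size; 4096 is assumed here). */
        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0)
            return 1;
        memset(buf, 'A', 4096);

        ssize_t n = pwrite(fd, buf, 4096, 0);   /* one direct, synchronous write */

        free(buf);
        close(fd);
        return n == 4096 ? 0 : 1;
    }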

See the nomerges parameter in https://www.kernel.org/doc/Documentation/block/queue-sysfs.txt for how to tell the scheduler to avoid merging/reordering, but note that you cannot prevent a request that is too large from being split.
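
As a rough illustration of that knob (the device name sda is a placeholder; adjust it for your disk), the sketch below writes the value 2, which the kernel documentation describes as disabling all merge attempts:

    #include <stdio.h>

    int main(void)
    {
        /* "sda" is a placeholder device. Per queue-sysfs.txt:
           0 = all merges enabled, 1 = only simple one-shot merges,
           2 = all merge attempts disabled. */
        FILE *f = fopen("/sys/block/sda/queue/nomerges", "w");
        if (!f) {
            perror("nomerges");
            return 1;
        }
        fputs("2\n", f);
        fclose(f);
        return 0;
    }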

Having said all the above, it doesn't look like much I/O merging (as shown by wrqm/s) is happening in your scenario, but there's still something a bit strange. The avgrq-sz is 9.36 and, since that value is in 512-byte sectors, we get 4792.32 bytes as the average request size submitted down to the disk. This is fairly close to the 4096-byte block size fio is using. Since you can't do non-sector-sized I/O to a disk, and assuming the disk's block size is 512 bytes, this suggests a merge of 4 KiB + 512 bytes (I assume the rest is noise). However, since it's an average, something could be doing larger I/O at the same time fio is doing small I/O, with the average just coming out somewhere in between. Because the I/O is going to a file in a filesystem, this might be explained by filesystem metadata being updated...

Can mmap and O_DIRECT be used together?

I don't think it would make much sense.

O_DIRECT means that all I/O should be reflected in storage as soon as possible, without going through the cache.

The mapped pages are a copy of the storage (the file) in memory. Reflecting every read from and write to that memory back to storage would require I/O on each access, and that would be a huge performance hit.
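
To make the mismatch concrete, here is a small sketch of the two access models (the file name and sizes are assumptions): mmap hands you page-cache-backed memory that the kernel writes back when it chooses, while O_DIRECT read()/write() move data between your own buffer and the device, skipping those pages entirely.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* mmap path: the mapping is backed by page-cache pages;
           a store just dirties memory and is written back later. */
        int bfd = open("data.bin", O_RDWR);
        char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, bfd, 0);
        if (map != MAP_FAILED) {
            map[0] = 'x';                 /* no immediate device I/O */
            msync(map, 4096, MS_SYNC);    /* only now is the page pushed to storage */
            munmap(map, 4096);
        }
        close(bfd);

        /* O_DIRECT path: read()/write() transfer straight between this
           aligned user buffer and the device, bypassing the page cache. */
        int dfd = open("data.bin", O_RDWR | O_DIRECT);
        void *buf;
        posix_memalign(&buf, 4096, 4096);
        pread(dfd, buf, 4096, 0);
        free(buf);
        close(dfd);
        return 0;
    }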

O_DIRECT flag not working

The O_DIRECT flag relies on DMA internally, and in my kernel DMA was not enabled. That is the basic reason it worked on my desktop PC but not on the board: the two ran different kernels, one with DMA enabled and one without.

Linux async (io_submit) write vs. normal (buffered) write

Copying a buffer into the kernel is not necessarily instantaneous.

First, the kernel needs to find a free page. If there is none (which is fairly likely under heavy disk-write pressure), it has to decide which page to evict. If it decides to evict a dirty page (instead of, for instance, evicting your process), it will have to actually write that page out before it can reuse it.

There's a related issue in Linux: when writes saturate a slow drive, the page cache fills up with dirty pages backed by that slow drive. Whenever the kernel needs a page, for any reason, it takes a long time to acquire one, and the whole system stalls as a result.

The size of each individual write is less relevant than the write pressure of the system. If you have a million small writes already queued up, this may be the one that has to block.

Whether the allocation lives on the stack or the heap is also less relevant. If you want efficient allocation of the blocks you write, you can use a dedicated pool allocator (backed by the heap) instead of paying for the general-purpose heap allocator.
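
A minimal sketch of such a pool (my own illustration, not code from the answer): it carves one aligned region into fixed-size blocks up front, so grabbing a block for a write is just popping a free stack. The block size and pool depth are arbitrary assumptions.

    #include <stdlib.h>

    #define BLOCK_SIZE  4096   /* assumed write size, aligned for O_DIRECT use */
    #define BLOCK_COUNT 1024   /* arbitrary pool depth */

    struct pool {
        void *region;                  /* one big aligned allocation */
        void *free_list[BLOCK_COUNT];  /* stack of free blocks */
        int   free_top;                /* number of free blocks available */
    };

    static int pool_init(struct pool *p)
    {
        if (posix_memalign(&p->region, BLOCK_SIZE,
                           (size_t)BLOCK_SIZE * BLOCK_COUNT) != 0)
            return -1;
        for (int i = 0; i < BLOCK_COUNT; i++)
            p->free_list[i] = (char *)p->region + (size_t)i * BLOCK_SIZE;
        p->free_top = BLOCK_COUNT;
        return 0;
    }

    /* O(1) allocation: pop a block off the free stack, or NULL if exhausted. */
    static void *pool_get(struct pool *p)
    {
        return p->free_top > 0 ? p->free_list[--p->free_top] : NULL;
    }

    /* O(1) release: push the block back; caller must pass a block from this pool. */
    static void pool_put(struct pool *p, void *block)
    {
        p->free_list[p->free_top++] = block;
    }

Used that way, each queued write borrows a block with pool_get() and returns it with pool_put() once the write has completed, so the hot path never touches malloc().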

aio_write() gets around this by not copying the buffer into the kernel at all; the data may even be DMA'd straight out of your buffer (given the alignment requirements), which means you're likely to save a copy as well.
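
Since the question is framed around Linux native AIO (io_submit) rather than POSIX aio_write(), here is a sketch using libaio (link with -laio); the file name, offset and 4096-byte size/alignment are assumptions.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* O_DIRECT so the submitted buffer can go to the device without
           a copy into the page cache (alignment rules apply as usual). */
        int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0)
            return 1;

        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0)
            return 1;
        memset(buf, 'B', 4096);

        io_context_t ctx = 0;
        if (io_setup(8, &ctx) < 0)          /* room for up to 8 in-flight I/Os */
            return 1;

        struct iocb cb;
        struct iocb *cbs[1] = { &cb };
        io_prep_pwrite(&cb, fd, buf, 4096, 0);

        if (io_submit(ctx, 1, cbs) != 1)    /* returns as soon as the I/O is queued */
            return 1;

        /* ... other work can happen here while the write proceeds ... */

        struct io_event ev;
        io_getevents(ctx, 1, 1, &ev, NULL); /* reap the completion when it is ready */

        io_destroy(ctx);
        free(buf);
        close(fd);
        return 0;
    }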


