Non-Blocking File Copy in C#

Why is a .NET async/await file copy so much more CPU-intensive than a synchronous File.Copy() call?

File.OpenRead(sourceFileName) is equivalent to new FileStream(sourceFileName, FileMode.Open, FileAccess.Read, FileShare.Read), which in turn is equivalent to new FileStream(sourceFileName, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, false), i.e. with false for the useAsync parameter. The same is true of File.OpenWrite.

As such, any XxxAsync calls on these streams (ReadAsync, WriteAsync, CopyToAsync) won't use true async I/O but will fake it using thread-pool threads.

So it gets none of the benefit of async I/O and wastes at least one thread: you've got an extra thread blocking on I/O, which is exactly what you wanted to avoid. I'd generally expect async on its own to perform slightly slower than sync (async generally sacrifices one-off speed for better scalability), but I'd definitely expect this to do little better, if at all, than wrapping the whole thing in Task.Run().

I still wouldn't expect it to be quite this bad, but perhaps anti-malware software is reacting to writes to an .exe file.

You would hopefully fare better copying a non-.exe file and using asynchronous streams.
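A minimal sketch of copying with genuinely asynchronous streams, assuming a .NET 5+ console app (top-level statements). Both streams are opened with FileOptions.Asynchronous (the FileOptions equivalent of useAsync: true), so on Windows ReadAsync/WriteAsync can use overlapped I/O instead of thread-pool emulation. The helper name CopyFileAsync and the 80 KB buffer size are illustrative choices, not taken from the question.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: copies one file with both streams opened for asynchronous I/O.
static async Task CopyFileAsync(string sourcePath, string destinationPath)
{
    const int bufferSize = 81920; // illustrative 80 KB buffer
    // FileOptions.Asynchronous plays the same role as useAsync: true.
    using var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
        FileShare.Read, bufferSize, FileOptions.Asynchronous | FileOptions.SequentialScan);
    using var destination = new FileStream(destinationPath, FileMode.Create, FileAccess.Write,
        FileShare.None, bufferSize, FileOptions.Asynchronous | FileOptions.SequentialScan);
    await source.CopyToAsync(destination, bufferSize);
}

// Usage: copy a temporary file and print the copied text.
string src = Path.GetTempFileName();
string dst = Path.GetTempFileName();
File.WriteAllText(src, "hello, async copy");
await CopyFileAsync(src, dst);
Console.WriteLine(File.ReadAllText(dst)); // prints "hello, async copy"
```

Whether this actually beats File.Copy for a single file is not guaranteed; the point is that no thread sits blocked while the I/O is in flight.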

Asynchronous file and folder copy - C#

You're still doing a bunch of I/O on the main thread: Directory.Exists, File.Exists, etc. You probably want to avoid doing the entire thing on the main thread.

So, an easy solution would be to add a new method:

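using System.Threading.Tasks;

private Task copyEverythingAsync(string source, string target)
{
    // Run the synchronous copy on a thread-pool thread. Returning the Task
    // (instead of discarding it) lets the caller await it and observe exceptions.
    return Task.Run(() => copyEverything(source, target));
}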

And then remove the async/await from the copyEverything method.

This moves the operation onto a thread-pool thread, so it no longer blocks your main thread.
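To make the suggestion concrete, here is a minimal sketch of a synchronous copy routine plus the Task.Run wrapper, written as a .NET 5+ console program. The body of CopyEverything is an assumption (the question's own copyEverything isn't shown): it simply recreates the directory tree with Directory.CreateDirectory and File.Copy.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical synchronous recursive copy. Every blocking call in here
// (Directory.CreateDirectory, Directory.GetFiles, File.Copy, ...) runs on
// whichever thread invokes it.
static void CopyEverything(string source, string target)
{
    Directory.CreateDirectory(target); // no-op if the directory already exists
    foreach (string file in Directory.GetFiles(source))
    {
        File.Copy(file, Path.Combine(target, Path.GetFileName(file)), overwrite: true);
    }
    foreach (string dir in Directory.GetDirectories(source))
    {
        CopyEverything(dir, Path.Combine(target, Path.GetFileName(dir)));
    }
}

// Offload the whole blocking operation to the thread pool; awaiting the
// returned Task keeps the caller free and surfaces any exception.
static Task CopyEverythingAsync(string source, string target) =>
    Task.Run(() => CopyEverything(source, target));

// Usage: build a tiny tree in a temp folder and copy it.
string root = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
string src = Path.Combine(root, "src");
string dst = Path.Combine(root, "dst");
Directory.CreateDirectory(Path.Combine(src, "sub"));
File.WriteAllText(Path.Combine(src, "a.txt"), "A");
File.WriteAllText(Path.Combine(src, "sub", "b.txt"), "B");
await CopyEverythingAsync(src, dst);
Console.WriteLine(File.ReadAllText(Path.Combine(dst, "sub", "b.txt"))); // prints "B"
```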

.net 4.0 asynchronous file copy

You can use a BackgroundWorker (System.ComponentModel) for this type of thing:

BackgroundWorker worker = new BackgroundWorker();
worker.DoWork += myWorkDelegate;                  // method, delegate or lambda that does the heavy work
worker.RunWorkerCompleted += myCompletedDelegate; // method, delegate or lambda to run when DoWork has finished

worker.RunWorkerAsync();
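A slightly fuller sketch, assuming the heavy work is a single File.Copy. CopyInBackground and its onCompleted callback are hypothetical names. When RunWorkerAsync is called from a WinForms or WPF UI thread, RunWorkerCompleted is raised back on that thread, so the callback could update controls directly; the console usage below has no UI thread, so it waits on a ManualResetEventSlim instead. The top-level-statement syntax needs C# 9, but the BackgroundWorker calls themselves are the same on .NET Framework 4.0.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Threading;

// Hypothetical helper: runs File.Copy on a BackgroundWorker and reports the outcome.
static void CopyInBackground(string source, string destination, Action<Exception?> onCompleted)
{
    var worker = new BackgroundWorker();
    // DoWork runs on a thread-pool thread.
    worker.DoWork += (sender, e) => File.Copy(source, destination, overwrite: true);
    // RunWorkerCompleted runs on the context captured by RunWorkerAsync
    // (the UI thread in WinForms/WPF, a thread-pool thread in a console app).
    // e.Error holds any exception thrown by DoWork, or null on success.
    worker.RunWorkerCompleted += (sender, e) => onCompleted(e.Error);
    worker.RunWorkerAsync();
}

// Usage in a console app: block until the worker reports completion.
string src = Path.GetTempFileName();
string dst = Path.GetTempFileName();
File.WriteAllText(src, "copied by BackgroundWorker");
Exception? error = null;
using var done = new ManualResetEventSlim(false);
CopyInBackground(src, dst, ex => { error = ex; done.Set(); });
done.Wait();
Console.WriteLine(error?.Message ?? File.ReadAllText(dst));
```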

How is non-blocking I/O for regular files implemented in .NET on Linux?

It's worth pointing out that there are multiple contexts at play here.

The Linux operating system

From Non-Blocking descriptors:

By default, read on any descriptor blocks if there’s no data
available. The same applies to write or send. This applies to
operations on most descriptors except disk files, since writes to disk
never happen directly but via the kernel buffer cache as a proxy. The
only time when writes to disk happen synchronously is when the O_SYNC
flag was specified when opening the disk file.

Any descriptor (pipes, FIFOs, sockets, terminals, pseudo-terminals,
and some other types of devices) can be put in the nonblocking mode.
When a descriptor is set in nonblocking mode, an I/O system call on
that descriptor will return immediately, even if that request can’t be
immediately completed (and will therefore result in the process being
blocked otherwise). The return value can be either of the following:

  • an error: when the operation cannot be completed at all
  • a partial count: when the input or output operation can be partially completed
  • the entire result: when the I/O operation could be fully completed

As explained above, non-blocking descriptors keep I/O on pipes (or sockets, or ...) from blocking indefinitely. They weren't meant to be used with disk files, however: whether you want to read an entire file or just part of it, the data is already there. It isn't going to arrive at some point in the future, so you can start processing it right away.

Quoting your linked post:

Regular files are always readable and they are also always writeable.
This is clearly stated in the relevant POSIX specifications. I cannot
stress this enough. Putting a regular file in non-blocking has
ABSOLUTELY no effects other than changing one bit in the file flags.

Reading from a regular file might take a long time. For instance, if
it is located on a busy disk, the I/O scheduler might take so much
time that the user will notice that the application is frozen.

Nevertheless, non-blocking mode will not fix it. It will simply not
work. Checking a file for readability or writeability always succeeds
immediately. If the system needs time to perform the I/O operation, it
will put the task in non-interruptible sleep from the read or write
system call. In other words, if you can assume that a file descriptor
refers to a regular file, do not waste your time (or worse, other
people's time) in implementing non-blocking I/O.

The only safe way to read data from or write data to a regular file
while not blocking a task... consists of not performing the operation,
not in that particular task anyway. Concretely, you need to create a separate thread (or process), or use asynchronous I/O (functions whose
name starts with aio_). Whether you like it or not, and even if you
think multiple threads suck, there are no other options.

The .NET runtime

It implements the async/await pattern to keep the calling thread (for example, a UI event loop) unblocked while I/O is being performed. As mentioned above:

Concretely, you need to create a separate thread (or process), or use
asynchronous I/O (functions whose name starts with aio_). Whether you
like it or not, and even if you think multiple threads suck, there are
no other options.

The .NET thread pool spawns additional threads as needed (ref: why is .NET spawning multiple processes on Linux). So, ideally, when FileStream.ReadAsync(...) or FileStream.WriteAsync(...) is called, the current thread (from the thread pool) initiates the I/O operation and then gives up control, freeing it to do other work. Before it does, a continuation is attached to the I/O operation, so when the I/O device signals that the operation has finished, the thread-pool scheduler knows the next free thread can pick up the continuation.
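A small experiment that illustrates the continuation flow described above, as a sketch for a .NET 5+ console app: it awaits FileStream.ReadAsync in a loop and prints the managed thread ID before and after. Whether the two IDs differ depends on the platform and on whether individual reads complete synchronously (for example, straight from the OS cache), so treat the printed IDs as illustrative rather than guaranteed.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Create a 1 MiB temporary file to read back.
string path = Path.GetTempFileName();
await File.WriteAllBytesAsync(path, new byte[1 << 20]);

// useAsync: true requests OS-level async I/O where the platform supports it
// (overlapped I/O on Windows); elsewhere the runtime may fall back to the thread pool.
using var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
    FileShare.Read, bufferSize: 4096, useAsync: true);

var buffer = new byte[64 * 1024];
long total = 0;
int threadBefore = Environment.CurrentManagedThreadId;
int read;
// When a read does not complete synchronously, the await frees the current thread
// and the rest of the loop resumes as a continuation on a thread-pool thread.
while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
    total += read;
}
int threadAfter = Environment.CurrentManagedThreadId;
Console.WriteLine($"Read {total} bytes; thread before: {threadBefore}, after: {threadAfter}");
```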

To be clear, this is all about responsiveness: any code that needs the I/O to complete still has to wait for it, but the waiting won't "block" the application.

Back to the OS

On Windows, the thread can genuinely give up control (and eventually be freed up), because the OS supports asynchronous (overlapped) disk I/O:

https://learn.microsoft.com/en-us/troubleshoot/windows/win32/asynchronous-disk-io-synchronous

Asynchronous I/O hasn't been part of Linux for very long; the flow .NET uses there is described at:

https://devblogs.microsoft.com/dotnet/file-io-improvements-in-dotnet-6/#unix

Unix-like systems don’t expose async file IO APIs (except of the new
io_uring which we talk about later). Anytime user asks FileStream to
perform async file IO operation, a synchronous IO operation is being
scheduled to Thread Pool. Once it’s dequeued, the blocking operation
is performed on a dedicated thread.
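To make the quoted flow concrete, here is a deliberately simplified illustration of the idea, not the runtime's actual code: the "async" read is just a synchronous Stream.Read scheduled on the thread pool, so a pool thread stays blocked for the duration of the I/O. EmulatedReadAsync is a hypothetical name; the snippet assumes a .NET 5+ console app.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Simplified illustration only: an "async" read emulated by running a blocking
// Read on a thread-pool thread. The pool thread is blocked while the I/O runs;
// only the caller is freed.
static Task<int> EmulatedReadAsync(Stream stream, byte[] buffer, int offset, int count) =>
    Task.Run(() => stream.Read(buffer, offset, count));

string path = Path.GetTempFileName();
File.WriteAllText(path, "emulated async read");

using var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
var buffer = new byte[1024];
int read = await EmulatedReadAsync(stream, buffer, 0, buffer.Length);
Console.WriteLine($"Read {read} bytes via a thread-pool thread");
```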

A similar flow is suggested for Python's asyncio:

asyncio does not support asynchronous operations on the filesystem.
Even if files are opened with O_NONBLOCK, read and write will block.

...

The Linux kernel provides asynchronous operations on the filesystem
(aio), but it requires a library and it doesn't scale with many
concurrent operations. See aio.

...

For now, the workaround is to use aiofiles that uses threads to handle
files.

Closing thoughts

The concept behind Linux's non-blocking descriptors (and their polling mechanism) is not what makes async I/O tick on Windows.

As mentioned by @Damien_The_Unbeliever, there's a relatively new io_uring Linux kernel interface that allows a continuation flow similar to the one on Windows. However, the following links confirm that .NET 6 does not use it yet:

  • https://devblogs.microsoft.com/dotnet/file-io-improvements-in-dotnet-6/#whats-next
  • https://github.com/dotnet/runtime/issues/12650

Issues with File Copy Asynchronously

I was able to figure it out.

It couldn't set the file time because the call was still inside the write stream's using block, so the destination file was still open.

I simply moved the File.SetLastWriteTime call outside of the write stream, and that resolved the problem.

foreach (var file in dir.EnumerateFiles())
{
    string temppath = Path.Combine(destDirName, file.Name);
    using (FileStream reader = new FileStream(file.FullName, FileMode.Open, FileAccess.Read))
    {
        using (FileStream writer = new FileStream(temppath, FileMode.Create, FileAccess.Write))
        {
            await reader.CopyToAsync(writer);
        }
        // The writer has been disposed (and flushed) at this point, so the
        // destination file is closed and its timestamp can be set.
        File.SetLastWriteTime(temppath, file.LastWriteTime);
    }
}
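A self-contained version of the fix above, as a sketch for a .NET 5+ console app. CopyFilesAsync is a hypothetical wrapper around the loop (dir and destDirName come from the question's surrounding code, which isn't shown). It also opens both streams with useAsync: true, as discussed at the top of this page, and uses the UTC timestamp APIs to avoid local-time conversions.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: copies the files in 'dir' (not its subdirectories) into
// 'destDirName' asynchronously and preserves each file's last-write time.
static async Task CopyFilesAsync(DirectoryInfo dir, string destDirName)
{
    Directory.CreateDirectory(destDirName);
    foreach (FileInfo file in dir.EnumerateFiles())
    {
        string tempPath = Path.Combine(destDirName, file.Name);
        using (var reader = new FileStream(file.FullName, FileMode.Open, FileAccess.Read,
                   FileShare.Read, 81920, useAsync: true))
        using (var writer = new FileStream(tempPath, FileMode.Create, FileAccess.Write,
                   FileShare.None, 81920, useAsync: true))
        {
            await reader.CopyToAsync(writer);
        }
        // Both streams are closed here, so the timestamp is set on a closed file.
        File.SetLastWriteTimeUtc(tempPath, file.LastWriteTimeUtc);
    }
}

// Usage: copy one file with a known timestamp and print the copy's timestamp.
string root = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
var source = Directory.CreateDirectory(Path.Combine(root, "src"));
string target = Path.Combine(root, "dst");
string original = Path.Combine(source.FullName, "data.txt");
File.WriteAllText(original, "payload");
File.SetLastWriteTimeUtc(original, new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc));
await CopyFilesAsync(source, target);
Console.WriteLine(File.GetLastWriteTimeUtc(Path.Combine(target, "data.txt")));
```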

How to track the File.Copy() in C#?

I have an excellent idea: why not just use the Windows copy dialog?
Just add a reference:

Microsoft.VisualBasic

And use the code:

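using System;
using System.Windows.Forms;           // MessageBox
using Microsoft.VisualBasic.FileIO;   // FileSystem, UIOption

try
{
    // Shows the standard Windows copy dialog, including its progress bar.
    FileSystem.CopyFile(source_path, destination_path, UIOption.AllDialogs);
}
catch (Exception ext)
{
    MessageBox.Show(ext.Message);
}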

I guess this will help.
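If you need progress without the Visual Basic dialog (for instance in a console app, or where that Windows shell dialog isn't available), a different technique is to copy in chunks and report progress yourself. A minimal sketch for a .NET 5+ console app; CopyWithProgressAsync is a hypothetical name, and the reported value is the fraction of bytes copied so far.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: copies in fixed-size chunks and reports the fraction copied
// after each chunk. Nothing is reported for an empty file (the loop never runs).
static async Task CopyWithProgressAsync(string sourcePath, string destinationPath,
    IProgress<double> progress, CancellationToken cancellationToken = default)
{
    const int bufferSize = 81920;
    using var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
        FileShare.Read, bufferSize, useAsync: true);
    using var destination = new FileStream(destinationPath, FileMode.Create, FileAccess.Write,
        FileShare.None, bufferSize, useAsync: true);
    long total = source.Length;
    long copied = 0;
    var buffer = new byte[bufferSize];
    int read;
    while ((read = await source.ReadAsync(buffer, 0, buffer.Length, cancellationToken)) > 0)
    {
        await destination.WriteAsync(buffer, 0, read, cancellationToken);
        copied += read;
        progress.Report((double)copied / total);
    }
}

// Usage: copy ~300 KB of random bytes and print progress percentages.
string src = Path.GetTempFileName();
string dst = Path.GetTempFileName();
var data = new byte[300_000];
new Random(42).NextBytes(data);
File.WriteAllBytes(src, data);
// Progress<T> invokes its callback through the captured SynchronizationContext
// (the thread pool in a console app), so reports may print slightly late.
await CopyWithProgressAsync(src, dst, new Progress<double>(p => Console.WriteLine($"{p:P0}")));
```

In a UI app, constructing the Progress<double> on the UI thread means the callback runs there, so it can update a progress bar directly.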


