Why File.ReadAllLinesAsync() Blocks the UI Thread

Why does File.ReadAllLinesAsync() block the UI thread?

Sadly, currently (.NET 5) the built-in asynchronous APIs for accessing the filesystem are not implemented consistently with Microsoft's own recommendations about how asynchronous methods are expected to behave:

An asynchronous method that is based on TAP can do a small amount of work synchronously, such as validating arguments and initiating the asynchronous operation, before it returns the resulting task. Synchronous work should be kept to the minimum so the asynchronous method can return quickly.

Methods like StreamReader.ReadToEndAsync do not behave this way. Instead, they block the current thread for a considerable amount of time before returning an incomplete Task. For example, in an older experiment of mine that read a 6 MB file from my SSD, this method blocked the calling thread for 120 msec and returned a Task that then completed after only 20 msec. My suggestion is to avoid using the asynchronous filesystem APIs from GUI applications, and to use the synchronous APIs wrapped in Task.Run instead:

var lines = await Task.Run(() => File.ReadAllLines(@"D:\temp.txt"));

Update: Here are some experimental results with File.ReadAllLinesAsync:

var stopwatch = Stopwatch.StartNew();
var task = File.ReadAllLinesAsync(@"C:\6MBfile.txt");
var duration1 = stopwatch.ElapsedMilliseconds;
bool isCompleted = task.IsCompleted;
stopwatch.Restart();
var lines = await task;
var duration2 = stopwatch.ElapsedMilliseconds;
Console.WriteLine($"Create: {duration1:#,0} msec, Task.IsCompleted: {isCompleted}");
Console.WriteLine($"Await: {duration2:#,0} msec, Lines: {lines.Length:#,0}");

Output:

Create: 450 msec, Task.IsCompleted: False
Await: 5 msec, Lines: 204,000

The method File.ReadAllLinesAsync blocked the current thread for 450 msec, and the returned task completed after 5 msec. These measurements were consistent across multiple runs.

.NET Core 3.1.3, C# 8, Console App, Release build (no debugger attached), Windows 10, SSD Toshiba OCZ Arc 100 240GB
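You can package the measurement code above as a small reusable helper and check the same split (time spent creating the task vs. time spent awaiting it) for any async API on your own machine. This is a minimal sketch. The `AsyncProbe` class and its `MeasureAsync` method are hypothetical names, not part of .NET, and the timings will depend on hardware and runtime version.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper (not a .NET API) that reproduces the experiment above for any async method.
public static class AsyncProbe
{
    // CreateTime: how long the call blocked before returning a Task (its synchronous part).
    // CompletedOnReturn: whether that Task was already finished when it was returned.
    // AwaitTime: how long the returned Task then took to complete.
    public static async Task<(TimeSpan CreateTime, bool CompletedOnReturn, TimeSpan AwaitTime, T Result)>
        MeasureAsync<T>(Func<Task<T>> operation)
    {
        var stopwatch = Stopwatch.StartNew();
        Task<T> task = operation();            // the synchronous part runs here
        TimeSpan createTime = stopwatch.Elapsed;
        bool completedOnReturn = task.IsCompleted;

        stopwatch.Restart();
        T result = await task;                 // the asynchronous part
        TimeSpan awaitTime = stopwatch.Elapsed;

        return (createTime, completedOnReturn, awaitTime, result);
    }
}
```

For example, `var r = await AsyncProbe.MeasureAsync(() => File.ReadAllLinesAsync(@"C:\6MBfile.txt"));` repeats the experiment above. A large `r.CreateTime` together with `r.CompletedOnReturn == false` means the method did most of its work synchronously before returning.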


.NET 6 update. The same test on the same hardware using .NET 6:

Create: 19 msec, Task.IsCompleted: False
Await: 366 msec, Lines: 204,000

The implementation of the asynchronous filesystem APIs has been improved in .NET 6, but they are still far behind the synchronous APIs (about 2 times slower, and not fully asynchronous). So my suggestion to use the synchronous APIs wrapped in Task.Run still holds.

Why does async IO block in C#?

You are probably targeting a .NET version older than .NET 6. In those older versions the file-system APIs were not implemented efficiently, and were not even truly asynchronous. Things have improved in .NET 6, but the synchronous file-system APIs are still more performant than their asynchronous counterparts. Your problem can be solved simply by switching from this:

var json = await File.ReadAllTextAsync(file.FullName);

to this:

var json = await Task.Run(() => File.ReadAllText(file.FullName));

If you want to get fancy, you could also solve the problem in the UI layer, by using a custom LINQ operator like this:

public static async IAsyncEnumerable<T> OnThreadPool<T>(
    this IAsyncEnumerable<T> source,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    var enumerator = await Task.Run(() => source
        .GetAsyncEnumerator(cancellationToken)).ConfigureAwait(false);
    try
    {
        while (true)
        {
            var (moved, current) = await Task.Run(async () =>
            {
                if (await enumerator.MoveNextAsync())
                    return (true, enumerator.Current);
                else
                    return (false, default);
            }).ConfigureAwait(false);
            if (!moved) break;
            yield return current;
        }
    }
    finally
    {
        await Task.Run(async () => await enumerator
            .DisposeAsync()).ConfigureAwait(false);
    }
}

This operator offloads to the ThreadPool all the operations associated with enumerating an IAsyncEnumerable<T>. It can be used like this:

await foreach (var video in _videoEndpoints.QueryOnTags(query).OnThreadPool())
    QueryResult.Add(_mapperService.Map(video));
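You can see the operator's effect without a real video endpoint. The sketch below feeds it a hypothetical `BlockingSource` that does blocking work inside `MoveNextAsync`, the way a not-truly-asynchronous API might. Each item reports whether that blocking work ran on a ThreadPool thread. The operator is repeated here so the block compiles on its own. `BlockingSource`, `CollectAsync` and the class names are illustrative only.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncEnumerableExtensions
{
    // The OnThreadPool operator from above, repeated so this block is self-contained.
    public static async IAsyncEnumerable<T> OnThreadPool<T>(
        this IAsyncEnumerable<T> source,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var enumerator = await Task.Run(() => source
            .GetAsyncEnumerator(cancellationToken)).ConfigureAwait(false);
        try
        {
            while (true)
            {
                var (moved, current) = await Task.Run(async () =>
                {
                    if (await enumerator.MoveNextAsync())
                        return (true, enumerator.Current);
                    else
                        return (false, default(T));
                }).ConfigureAwait(false);
                if (!moved) break;
                yield return current;
            }
        }
        finally
        {
            await Task.Run(async () => await enumerator
                .DisposeAsync()).ConfigureAwait(false);
        }
    }
}

public static class OnThreadPoolDemo
{
    // Hypothetical source that blocks synchronously inside MoveNextAsync.
    public static async IAsyncEnumerable<bool> BlockingSource(int count)
    {
        for (int i = 0; i < count; i++)
        {
            // Record which kind of thread is about to do the blocking work.
            bool onPool = Thread.CurrentThread.IsThreadPoolThread;
            Thread.Sleep(50);
            yield return onPool;
            await Task.Yield();
        }
    }

    public static async Task<List<bool>> CollectAsync()
    {
        var results = new List<bool>();
        await foreach (bool onPool in BlockingSource(3).OnThreadPool())
            results.Add(onPool);
        return results;
    }
}
```

With `.OnThreadPool()` every reported value should be `true`. Without it, when enumerated from a UI thread, the `Thread.Sleep` calls would run on that UI thread.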

Async task is blocking the UI

This part will definitely block your UI thread:

for (int i = 0; i < 10; i++)
{
    Thread.Sleep(1000);
}

Not sure why it's there, but if you replace it with an async version, it should work:

for (int i = 0; i < 10; i++)
{
    await Task.Delay(1000);
}
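The `Thread.Sleep(1000)` loop may instead stand in for real work that has no asynchronous version, such as CPU-bound computation or a blocking API call. In that case `Task.Delay` is not an option. You can still move the whole loop off the UI thread with `Task.Run`, in the same spirit as the `Task.Run(() => File.ReadAllLines(...))` suggestion earlier on this page. This is a sketch, and `DoBlockingStep` is a hypothetical placeholder for that work.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class OffloadDemo
{
    // Hypothetical placeholder for a step that can only be done synchronously.
    private static int DoBlockingStep(int i)
    {
        Thread.Sleep(100);
        return i * i;
    }

    // Runs the entire blocking loop on a ThreadPool thread. The caller (for example
    // a UI event handler) only awaits the returned Task, so its own thread stays free.
    public static Task<int> RunLoopOffUiThreadAsync(int iterations)
    {
        return Task.Run(() =>
        {
            int sum = 0;
            for (int i = 0; i < iterations; i++)
                sum += DoBlockingStep(i);
            return sum;
        });
    }
}
```

In an `async void` event handler this would be used as `var sum = await OffloadDemo.RunLoopOffUiThreadAsync(10);`. Execution resumes on the UI thread after the await, so `sum` can be used to update controls directly.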

What is the use of async/await method calls?

To expand a bit on Stephen Cleary's comment:

There are two main use cases for asynchronous code.

  1. For UI applications you do not want to freeze your UI while doing some slow IO operation, like reading a large file, or doing something processor-intensive.
  2. For server applications you do not want to consume a thread while waiting for slow IO operations. Many servers are fairly simple front ends for a database. If you have 1000 concurrent queries and are using synchronous code, you might require 1000 threads that are just waiting. Threads use resources, such as memory, and this becomes wasteful when trying to service a large number of concurrent users.
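A rough sketch can illustrate the server-side point by starting 1,000 simulated "queries" at once. `Task.Delay` stands in for a slow database call here. This is an assumption for illustration: like real asynchronous I/O, a pending delay does not hold a thread. `ConcurrencyDemo` is a hypothetical name, and `ThreadPool.ThreadCount` requires .NET Core 3.0 or later.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencyDemo
{
    // Starts `count` simulated I/O waits concurrently and awaits them all.
    public static async Task<(TimeSpan Elapsed, int ThreadsWhilePending)> SimulateQueriesAsync(
        int count, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();
        Task[] pending = Enumerable.Range(0, count)
            .Select(_ => Task.Delay(delayMs))   // hypothetical stand-in for a DB query
            .ToArray();
        int threads = ThreadPool.ThreadCount;  // pool size while all waits are pending
        await Task.WhenAll(pending);
        return (stopwatch.Elapsed, threads);
    }
}
```

On a typical machine the elapsed time should be close to a single delay, and the thread count far below 1,000. A synchronous version, where each query blocks a thread with `Thread.Sleep`, would need roughly one thread per pending query to finish in the same time.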

The older styles of writing asynchronous code were somewhat difficult to use, since you needed to manually keep track of everything that had to be done after the IO operation completed. Async/await largely avoids this by making the compiler do the difficult parts, letting you write code that looks like regular sequential code.

A criticism of async/await is that synchronous methods are sometimes the better choice, so you might be required to write two versions of essentially the same code.

EF Core - Async functions are blocking UI

Unfortunately, the actual implementation of these methods seems to leave something to be desired:

Why would an EF query with ToListAsync hang in a WPF application?

As a workaround, to keep your UI responsive, you could execute the synchronous version on a background thread:

private async void Button_Click(object sender, RoutedEventArgs e)
{
    var a = await Task.Run(() =>
    {
        using var ctx = new eWMSContext();
        return ctx.TJobLines.ToList();
    });
    // Back on the UI thread here: `a` can be used to update the UI.
}

Then you don't have to rely on the implementation of ToListAsync being non-blocking.

Async file I/O overhead in C#

The built-in asynchronous filesystem APIs are currently broken, and you are advised to avoid them. Not only are they much slower than their synchronous counterparts, they are not even truly asynchronous. .NET 6 will come with an improved FileStream implementation, so in a few months this may no longer be an issue.

What you are trying to achieve is called task-parallelism, where two or more heterogeneous operations run concurrently and independently of each other. It's an advanced technique and it requires specialized tools. The most common type of parallelism is the so-called data-parallelism, where the same type of operation runs in parallel on a list of homogeneous data. It is commonly implemented using the Parallel class or the PLINQ library.
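For contrast with task-parallelism, here is what data-parallelism looks like: the same hypothetical CPU-bound `Work` function applied to every element of a homogeneous array, once with PLINQ and once with the `Parallel` class. It is only a sketch. Whether parallelizing pays off depends on how expensive the real per-item work is.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class DataParallelismDemo
{
    // Hypothetical CPU-bound operation, applied identically to every item.
    public static long Work(int n)
    {
        long acc = 0;
        for (int i = 0; i < n * 1000; i++) acc += i % 7;
        return acc;
    }

    // PLINQ: AsOrdered keeps the results in the same order as the input.
    public static long[] ProcessWithPlinq(int[] inputs) =>
        inputs.AsParallel().AsOrdered().Select(Work).ToArray();

    // Parallel class: each index is written by exactly one iteration, so no locking is needed.
    public static long[] ProcessWithParallelFor(int[] inputs)
    {
        var results = new long[inputs.Length];
        Parallel.For(0, inputs.Length, i => results[i] = Work(inputs[i]));
        return results;
    }
}
```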

To achieve task-parallelism, the most readily available tool is the TPL Dataflow library. It is built into the .NET Core / .NET 5 platforms, and you only need to install a package if you are targeting the .NET Framework. This library allows you to create a pipeline of linked components called "blocks" (TransformBlock, ActionBlock, BatchBlock etc), where each block acts as an independent processor with its own input and output queues. You feed the pipeline with data, and the data flows from block to block through the pipeline, being processed along the way. You Complete the first block in the pipeline to signal that no more input data will ever be available. Then you await the Completion of the last block, so that your code waits until all the work has been done. Here is an example:

private async void Button1_Click(object sender, EventArgs e)
{
    Button1.Enabled = false;
    var fileBlock = new TransformManyBlock<string, IList<string>>(filePath =>
    {
        return File.ReadLines(filePath).Buffer(10);
    });

    var deserializeBlock = new TransformBlock<IList<string>, MyObject[]>(lines =>
    {
        return lines.Select(line => Deserialize(line)).ToArray();
    }, new ExecutionDataflowBlockOptions()
    {
        MaxDegreeOfParallelism = 2 // Let's assume that Deserialize is parallelizable
    });

    var persistBlock = new TransformBlock<MyObject[], MyObject[]>(async objects =>
    {
        foreach (MyObject obj in objects) await PersistToDbAsync(obj);
        return objects;
    });

    var displayBlock = new ActionBlock<MyObject[]>(objects =>
    {
        foreach (MyObject obj in objects) TextBox1.AppendText($"{obj}\r\n");
    }, new ExecutionDataflowBlockOptions()
    {
        TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext()
        // Make sure that the delegate will be invoked on the UI thread
    });

    fileBlock.LinkTo(deserializeBlock,
        new DataflowLinkOptions { PropagateCompletion = true });
    deserializeBlock.LinkTo(persistBlock,
        new DataflowLinkOptions { PropagateCompletion = true });
    persistBlock.LinkTo(displayBlock,
        new DataflowLinkOptions { PropagateCompletion = true });

    foreach (var filePath in Directory.GetFiles(@"C:\Data"))
        await fileBlock.SendAsync(filePath);

    fileBlock.Complete();
    await displayBlock.Completion;
    MessageBox.Show("Done");
    Button1.Enabled = true;
}

The data passed through the pipeline should be chunky. If each unit of work is too lightweight, you should batch the units in arrays or lists. Otherwise the overhead of moving lots of tiny pieces of data around will outweigh the benefits of parallelism. That's the reason for using the Buffer LINQ operator (from the System.Interactive package) in the above example. .NET 6 will come with a new Chunk LINQ operator that offers the same functionality.
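On .NET 6 or later, the built-in `Enumerable.Chunk` operator mentioned above groups a sequence into arrays of a fixed maximum size. With it, the System.Interactive dependency may not be needed. A minimal sketch follows, where `ChunkDemo.Batch` is a hypothetical helper name:

```csharp
using System.Linq;

public static class ChunkDemo
{
    // Groups lines into arrays of at most `size` items; the last array may be shorter.
    public static string[][] Batch(string[] lines, int size) =>
        lines.Chunk(size).ToArray();
}
```

Because `string[]` implements `IList<string>`, `File.ReadLines(filePath).Chunk(10)` should also work in place of `.Buffer(10)` inside the `TransformManyBlock<string, IList<string>>` shown above.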

How to Async Files.ReadAllLines and await for results?

UPDATE: Async versions of File.ReadAll[Lines|Bytes|Text], File.AppendAll[Lines|Text] and File.WriteAll[Lines|Bytes|Text] have now been merged into .NET Core and shipped with .NET Core 2.0. They are also included in .NET Standard 2.1.

Using Task.Run, which essentially is a wrapper for Task.Factory.StartNew, for asynchronous wrappers is a code smell.

If you don't want to waste a CPU thread by using a blocking function, you should await a truly asynchronous IO method, StreamReader.ReadToEndAsync, like this:

using (var reader = File.OpenText("Words.txt"))
{
    var fileText = await reader.ReadToEndAsync();
    // Do something with fileText...
}

This will get the whole file as a string instead of a string[]. If you need lines instead, you could easily split the string afterwards, like this:

using (var reader = File.OpenText("Words.txt"))
{
    var fileText = await reader.ReadToEndAsync();
    // Split on all three line-break styles, like File.ReadAllLines does ("\r\n" must come first)
    return fileText.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
}
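Splitting on line-break strings can still differ subtly from `File.ReadAllLines`. For example, a file that ends with a line break yields an extra empty entry at the end. A sketch that mirrors `File.ReadAllLines` more closely reads the already-loaded text with `StringReader.ReadLine`. That method treats `"\r\n"`, `"\r"` and `"\n"` as line terminators and does not report a trailing empty line. `TextLines.SplitLines` is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.IO;

public static class TextLines
{
    // Splits text into lines the same way a StreamReader-based ReadAllLines would:
    // any of "\r\n", "\r" or "\n" ends a line, and a final line break adds no empty entry.
    public static string[] SplitLines(string text)
    {
        var lines = new List<string>();
        using var reader = new StringReader(text);
        string line;
        while ((line = reader.ReadLine()) != null)
            lines.Add(line);
        return lines.ToArray();
    }
}
```

For example, `TextLines.SplitLines(await reader.ReadToEndAsync())` could replace the `Split` call above.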

EDIT: Here are some methods that achieve the same result as File.ReadAllLines, but in a truly asynchronous manner. The code is based on the implementation of File.ReadAllLines itself:

using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class FileEx
{
    /// <summary>
    /// This is the same default buffer size as
    /// <see cref="StreamReader"/> and <see cref="FileStream"/>.
    /// </summary>
    private const int DefaultBufferSize = 4096;

    /// <summary>
    /// Indicates that
    /// 1. The file is to be used for asynchronous reading.
    /// 2. The file is to be accessed sequentially from beginning to end.
    /// </summary>
    private const FileOptions DefaultOptions = FileOptions.Asynchronous | FileOptions.SequentialScan;

    public static Task<string[]> ReadAllLinesAsync(string path)
    {
        return ReadAllLinesAsync(path, Encoding.UTF8);
    }

    public static async Task<string[]> ReadAllLinesAsync(string path, Encoding encoding)
    {
        var lines = new List<string>();

        // Open the FileStream with the same FileMode, FileAccess
        // and FileShare as a call to File.OpenText would've done.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, DefaultBufferSize, DefaultOptions))
        using (var reader = new StreamReader(stream, encoding))
        {
            string line;
            while ((line = await reader.ReadLineAsync()) != null)
            {
                lines.Add(line);
            }
        }

        return lines.ToArray();
    }
}
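.NET Core 3.0 and C# 8 support async streams (`IAsyncEnumerable<T>`, the same type used by the `OnThreadPool` operator earlier on this page). So the same approach can be adapted to yield lines one at a time instead of collecting them all into an array first. This is a sketch under the same assumptions as `FileEx` above (an asynchronous, sequential-scan `FileStream`). The `FileStreamingEx` class and its `ReadLinesAsync` method are hypothetical names. As the measurements at the top of this page show, how asynchronous the underlying file reads really are depends on the runtime version.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class FileStreamingEx
{
    // Streams the lines of a file one by one instead of buffering them into an array.
    public static async IAsyncEnumerable<string> ReadLinesAsync(
        string path,
        Encoding encoding,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Same FileMode/FileAccess/FileShare, buffer size and options as FileEx above.
        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
            4096, FileOptions.Asynchronous | FileOptions.SequentialScan);
        using var reader = new StreamReader(stream, encoding);
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            cancellationToken.ThrowIfCancellationRequested();
            yield return line;
        }
    }
}
```

A consumer would write `await foreach (var line in FileStreamingEx.ReadLinesAsync("Words.txt", Encoding.UTF8)) { ... }` and can stop early without reading the rest of the file.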

How to verify that async has no effect in a specific case?

I agree with all the comments: it's not about what you do with the result and when, it's about what the thread that was executing your code is allowed to go off and do otherwise while the async operation is working. If Stuff is a complex view in the DB, based on a query that takes 5 minutes to run, then Any will block your thread for 5 minutes. AnyAsync could let that thread serve tens of thousands of requests to your web server in that time. If you've blocked one thread, the web server will have to spin up another to serve the other people, and threads are expensive.

Async isn't about "better performance" in the sense of "make it async and it runs faster"; the code executes at the same rate. Async is about "better use of resources": you need fewer threads, and they're busier, spending less time sitting around doing nothing while waiting for, e.g., IO to complete.

If it were an office, it would be analogous to making a coffee while you're on hold on the phone. Imagine you get put on hold by the gas company and your boss shouts that he wants a coffee. If you're async, you'll put the phone on speaker, get up while you're on hold and make the coffee, waiting to be called back by the sound of the hold music stopping and the gas company saying "hello". If you're sync, you'll sit there ignoring the boss's request while someone else makes the coffee (which means the boss has to employ someone else). It's more expensive to have you sitting around doing nothing but waiting, and to hire someone else, than to have you reach a point with job x and then go do something else. If you're async, you'll go and refill the printer while you're waiting for the kettle to boil. If you're sync and on hold, and the office junior is sync and waiting for the kettle to boil, the boss will have to employ yet another person to fill the printer.

Whether it's you or someone else who picks up the call to the gas company when they finally take you off hold depends on two things: whether you're done making the coffee and available, and whether you've used ConfigureAwait to indicate that it has to be you who picks up the call (true) or that anyone in the office can continue it (false).

From the comments: "I'm comparing it to using IEnumerable immediately followed by e.g. Count(), which will iterate through the whole shebang anyway. In that case, we may go T[] right away with no deteriorated performance. What's your thought on that?"

It depends on what else you will do with the result. If you need to repeatedly ask your result for its length and random-access it, then sure, use ToArrayAsync to turn it into an array and then do all your work with it as locally cached data (unless the query returns a result that is two terabytes big).

If you literally only need the count once, then it doesn't make sense to spend all that memory allocating an array just to get its length; just do the CountAsync.

Neither of these seems entirely relevant to the question of "Async or not?" If your IEnumerable is coming over a slow network and is some huge, slow query, it still comes back to "let the thread go off and keep busy doing something else, so you don't have to spin up more threads". Note that "slow" here could mean even tens of milliseconds. We don't have to be talking about minute-long operations to see a benefit from async.

For very fast operations, sure, you can do them synchronously to save the minuscule cost of setting up the state machine. But be certain where the tipping point lies between the cost of setting up the state machine (so the thread can do something else) and the time the thread would otherwise spend waiting; the state machine costs very little. Faced with the choice, I'd generally choose async if it's available, especially if any IO is involved.

From the comments: how can I prove or refute whether it matters?

You'll have to race the horses for every case: how quickly does the operation complete synchronously, and how long does the async state management take? It would probably be quite wearisome to do this for an entire codebase, which is why I tend to proceed on an "if async is available and isn't just available for async's sake, then probably someone has reasoned that using async is sensible, so we should use it" basis. Async spreading "all the way up" through a codebase is perhaps a good thing if you treat its presence in a library as a sign that you should leverage it in your code (which in turn signals to users of your code that they should too).
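A crude way to "race the horses" for one specific case is to time many calls of a synchronous and an asynchronous version of the same tiny operation. The sketch below does this with `Stopwatch`. `AddSync` and `AddAsync` are hypothetical stand-ins. `AddAsync` uses `Task.Yield()` so that every call really goes through the asynchronous path; a method that happens to complete synchronously would be cheaper. For serious measurements, a benchmarking tool such as BenchmarkDotNet is more reliable than a hand-rolled loop.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class RaceTheHorses
{
    // Hypothetical very fast operation, in a synchronous and an asynchronous flavour.
    private static int AddSync(int a, int b) => a + b;

    private static async Task<int> AddAsync(int a, int b)
    {
        await Task.Yield(); // forces a real asynchronous continuation on every call
        return a + b;
    }

    // Times `iterations` calls of each version. The checksum keeps both loops observable,
    // so the JIT cannot discard them as dead code.
    public static async Task<(TimeSpan SyncTime, TimeSpan AsyncTime, long Checksum)> CompareAsync(int iterations)
    {
        long sum = 0;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sum += AddSync(i, 1);
        TimeSpan syncTime = stopwatch.Elapsed;

        stopwatch.Restart();
        for (int i = 0; i < iterations; i++) sum += await AddAsync(i, 1);
        TimeSpan asyncTime = stopwatch.Elapsed;

        return (syncTime, asyncTime, sum);
    }
}
```

Dividing `AsyncTime - SyncTime` by the iteration count gives a rough per-call overhead. You can compare that figure with how long the real operation blocks when it is done synchronously.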


