Parallel.ForEach vs Task.Run and Task.WhenAll
In this case, the second method will asynchronously wait for the tasks to complete instead of blocking.
However, there is a disadvantage to using Task.Run in a loop. With Parallel.ForEach, a Partitioner is created to avoid making more tasks than necessary. Task.Run will always make a single task per item (since you're doing this), but the Parallel class batches work so you create fewer tasks than total work items. This can provide significantly better overall performance, especially if the loop body has a small amount of work per item.
If this is the case, you can combine both options by writing:
await Task.Run(() => Parallel.ForEach(strings, s =>
{
    DoSomething(s);
}));
Note that this can also be written in this shorter form:
await Task.Run(() => Parallel.ForEach(strings, DoSomething));
Parallel.ForEach or Task.WhenAll when involving async operations?
I don't think the main consideration here is performance. (It always is :-) but read on - using the correct construct in the correct case will guarantee you the best performance)
Think of Parallel.ForEach as a special ForEach which parallelizes the individual (synchronous) tasks. While you could shove already asynchronous operations into it (by blocking), that seems contrived and misused: you will lose the async/await benefits of each task by doing so. The only "benefit" that you get out of it is that its behavior, from the standpoint of your code flow, is synchronous: it will not complete until all threads it spawned return.
Since your individual tasks are already async, it is that last feature of Parallel.ForEach (not completing until all the work has finished) that Task.WhenAll gives you.
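As a minimal sketch of that point, the following starts several already-asynchronous operations and waits for all of them with Task.WhenAll. The names WhenAllSketch, GetLengthAsync and GetAllLengthsAsync are hypothetical, and Task.Delay merely stands in for real asynchronous I/O.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical async operation; Task.Delay stands in for real I/O.
    public static async Task<int> GetLengthAsync(string s)
    {
        await Task.Delay(50);
        return s.Length;
    }

    public static async Task<int[]> GetAllLengthsAsync(string[] strings)
    {
        // ToArray forces the Select to run, which starts every operation.
        // No thread is blocked while the operations are in flight.
        Task<int>[] tasks = strings.Select(s => GetLengthAsync(s)).ToArray();

        // Completes once every task has completed; results are in input order.
        return await Task.WhenAll(tasks);
    }
}
```

The caller gets the "does not continue until everything is done" behavior without tying up a thread per item.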
Parallel.ForEach vs Task.Factory.StartNew
The first is a much better option.
Parallel.ForEach, internally, uses a Partitioner<T> to distribute your collection into work items. It will not do one task per item, but rather batch this to lower the overhead involved.
The second option will schedule a single Task per item in your collection. While the results will be (nearly) the same, this will introduce far more overhead than necessary, especially for large collections, and cause the overall runtimes to be slower.
FYI - The Partitioner used can be controlled by using the appropriate overloads to Parallel.ForEach, if so desired. For details, see Custom Partitioners on MSDN.
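One way this could look, as a sketch rather than a recommendation: Partitioner.Create(fromInclusive, toExclusive, rangeSize) hands Parallel.ForEach index ranges instead of single elements, so the delegate is invoked once per range. The class name, method name and the sum-of-squares workload are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangePartitionSketch
{
    public static long SumOfSquares(int[] data, int rangeSize)
    {
        // Partitioner.Create rejects an empty range, so handle it up front.
        if (data.Length == 0) return 0;

        long total = 0;

        // Each partition is a Tuple<int, int> describing [Item1, Item2).
        var ranges = Partitioner.Create(0, data.Length, rangeSize);

        Parallel.ForEach(ranges, range =>
        {
            // Plain sequential loop inside the range: one delegate call per range.
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
                local += (long)data[i] * data[i];

            // Publish the partial sum in a thread-safe way.
            Interlocked.Add(ref total, local);
        });

        return total;
    }
}
```

A larger rangeSize means fewer delegate invocations but coarser load balancing, so the right value depends on how much work each item needs.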
The main difference, at runtime, is that the second will act asynchronously. This can be duplicated using Parallel.ForEach by doing:
Task.Factory.StartNew( () => Parallel.ForEach<Item>(items, item => DoSomething(item)));
By doing this, you still take advantage of the partitioners, but don't block until the operation is complete.
C# Parallel.ForEach and Task.WhenAll sometimes returning fewer values than expected
And is there a way to guarantee that all tasks are always returned?
Several people in the comments are pointing out you should just do this, on the assumption that numbers is a non-thread-safe List:
foreach (var number in numbers)
{
    var value = Regex.Replace(number, @"\s+", "%20");
    tasks.Add(client.GetAsync(url + value));
}
await Task.WhenAll(tasks).ConfigureAwait(false);
foreach (var task in tasks)
{
    ...
}
There doesn't seem to be any considerable benefit in parallelizing the creation of the tasks that do the download; this happens very quickly. The waiting for the downloads to complete is done in the WhenAll.
PS: there are a variety of more involved ways of escaping data for a URL, but if you're specifically looking to convert any kind of whitespace to %20, I guess it makes sense to do it with regex.
Edit: you asked when to use a Parallel.ForEach, and I'm going to say "don't, generally, because you have to be more careful about the contexts within which you use it", but if you made the Parallel.ForEach do more synchronous work, it might make sense:
Parallel.ForEach(numbers, number =>
{
    var value = Regex.Replace(number, @"\s+", "%20");
    // Get stands for whatever synchronous (blocking) call your client offers
    var r = client.Get(url + value);
    // do something meaningful with r here, i.e. whatever ... is in your foreach (var task in tasks)
});
But be mindful: if you're updating some shared thing from within the body, for coordination purposes, then it will need to be thread-safe.
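To illustrate the thread-safety point, here is a small sketch that collects results from inside a Parallel.ForEach body into a ConcurrentBag<T> instead of a List<T>. The class and method names are hypothetical, and the body does only the regex replacement from the example above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class SharedStateSketch
{
    public static List<string> EncodeAll(IEnumerable<string> numbers)
    {
        // List<T>.Add is not safe to call from several threads at once;
        // ConcurrentBag<T>.Add is.
        var results = new ConcurrentBag<string>();

        Parallel.ForEach(numbers, number =>
        {
            results.Add(Regex.Replace(number, @"\s+", "%20"));
        });

        // ConcurrentBag does not preserve insertion order, so sort
        // to get a deterministic result.
        return results.OrderBy(s => s, StringComparer.Ordinal).ToList();
    }
}
```

With a plain List<T> in place of the bag, concurrent Add calls can lose items, which is one way to end up with fewer values than expected.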
Parallel.ForEach faster than Task.WaitAll for I/O bound tasks?
A possible reason why Parallel.ForEach may run faster is that it creates the side effect of throttling. Initially x threads process the first x elements (where x is the number of available cores), and progressively more threads may be added depending on internal heuristics. Throttling IO operations is a good thing because it protects the network and the server that handles the requests from becoming overburdened. Your alternative improvised method of throttling, by making requests in batches of 100, is far from ideal for many reasons, one of them being that 100 concurrent requests are a lot of requests! Another one is that a single long-running operation may delay the completion of the batch until long after the completion of the other 99 operations.
Note that Parallel.ForEach is also not ideal for parallelizing IO operations. It just happened to perform better than the alternative, wasting memory all along. For better approaches look here: How to limit the amount of concurrent async I/O operations?
Best way to start several async tasks in parallel?
Parallel is not an option because you have asynchronous actions.
Your options are:
- Start all of the tasks simultaneously, and then use await Task.WhenAll to wait for them all to complete. You can use SemaphoreSlim if you find you need to throttle the number of active tasks.
- Use an ActionBlock<T> (from TPL Dataflow) to queue up the work individually. You can use ExecutionDataflowBlockOptions.MaxDegreeOfParallelism if you want to process more than one simultaneously.
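The first option, with throttling, might be sketched like this. RunThrottledAsync is a hypothetical helper, not a framework API; it assumes the caller supplies the asynchronous operation and a concurrency limit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleSketch
{
    public static async Task<TResult[]> RunThrottledAsync<T, TResult>(
        IEnumerable<T> items, Func<T, Task<TResult>> operation, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList starts every wrapper task right away; each one waits
            // at the semaphore before running the real operation.
            List<Task<TResult>> tasks = items.Select(async item =>
            {
                await gate.WaitAsync();
                try
                {
                    return await operation(item);
                }
                finally
                {
                    gate.Release(); // free the slot even if the operation throws
                }
            }).ToList();

            // Results come back in input order once all tasks have completed.
            return await Task.WhenAll(tasks);
        }
    }
}
```

At most maxConcurrency operations are in flight at any moment, while the rest wait asynchronously instead of blocking threads.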
The ActionBlock<T> approach would be better if you don't know about all the tasks at the time they're started (i.e., if more can arrive while you're processing), or if other nearby parts of your code will fit into a "pipeline" kind of design.
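A rough sketch of the ActionBlock<T> option follows. It assumes the System.Threading.Tasks.Dataflow library is available (depending on the target framework this may mean adding the NuGet package); the class name, method name and the Task.Delay stand-in for real work are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ActionBlockSketch
{
    public static async Task<int> ProcessAsync(IEnumerable<string> items, int maxParallel)
    {
        int processed = 0;

        var block = new ActionBlock<string>(async item =>
        {
            await Task.Delay(10); // stand-in for the real asynchronous work
            Interlocked.Increment(ref processed);
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        // Items can be posted at any time before Complete is called,
        // including while earlier items are still being processed.
        foreach (var item in items)
            block.Post(item);

        block.Complete();       // signal that no more items will arrive
        await block.Completion; // wait until everything queued has finished

        return processed;
    }
}
```

Because the block owns its queue, producers only need a reference to it and a call to Post, which is what makes it a natural fit for work that arrives over time.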
Task.WhenAll is nice because it doesn't require a separate library with its own design philosophy and learning curve.
Either Task.WhenAll or ActionBlock<T> would work well for your use case.