Parallel.ForEach vs Task.Run and Task.WhenAll

In this case, the second method (Task.Run with await Task.WhenAll) will asynchronously wait for the tasks to complete instead of blocking the calling thread.

However, there is a disadvantage to using Task.Run in a loop. Parallel.ForEach creates a Partitioner to avoid making more tasks than necessary. Task.Run in a loop always creates one task per item (because that is what you are asking it to do), whereas the Parallel class batches the work so that it creates fewer tasks than there are work items. This can provide significantly better overall performance, especially if the loop body does only a small amount of work per item.

If this is the case, you can combine both options by writing:

await Task.Run(() => Parallel.ForEach(strings, s =>
{
    DoSomething(s);
}));

Note that this can also be written in this shorter form:

await Task.Run(() => Parallel.ForEach(strings, DoSomething));
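For contrast, here is a minimal sketch of the two approaches side by side. The class and method names are hypothetical, and summing string lengths merely stands in for DoSomething; the point is only where the tasks get created.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LoopStyles
{
    // One Task per item: simple, but every item pays the task-scheduling overhead.
    public static async Task<int> OneTaskPerItemAsync(IEnumerable<string> strings)
    {
        int totalLength = 0;
        Task[] tasks = strings
            .Select(s => Task.Run(() => { Interlocked.Add(ref totalLength, s.Length); }))
            .ToArray();
        await Task.WhenAll(tasks);
        return totalLength;
    }

    // Parallel.ForEach partitions the input into batches; the outer Task.Run
    // only moves the blocking loop off the calling thread so it can be awaited.
    public static async Task<int> BatchedAsync(IEnumerable<string> strings)
    {
        int totalLength = 0;
        await Task.Run(() => Parallel.ForEach(strings, s =>
        {
            Interlocked.Add(ref totalLength, s.Length);
        }));
        return totalLength;
    }
}
```

Both methods produce the same total; they differ only in how many tasks are scheduled to get there.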

Parallel.ForEach or Task.WhenAll when involving async operations?

I don't think the main consideration here is performance. (It always is :-) but read on - using the correct construct in the correct case will guarantee you the best performance)

Think of Parallel.ForEach as a special ForEach that parallelizes individual (synchronous) work items. While you could shove already asynchronous operations into it (by blocking on them), that seems contrived and is a misuse: you lose the async/await benefits of each task by doing so. The only "benefit" you get out of it is that, from the standpoint of your code flow, its behavior is synchronous: it will not complete until all the threads it spawned return.

Since your individual tasks are already async, Task.WhenAll gives you that last feature of Parallel.ForEach: it does not complete until all of the tasks have completed.
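As a rough illustration of that point, the sketch below starts every asynchronous operation and then awaits them together. GetLengthAsync is a made-up operation, with Task.Delay standing in for real I/O.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncFanOut
{
    // Hypothetical async operation; Task.Delay stands in for real I/O.
    public static async Task<int> GetLengthAsync(string s)
    {
        await Task.Delay(10);
        return s.Length;
    }

    // Start every operation, then await them together.
    // No thread is blocked while the operations are in flight.
    public static async Task<int[]> GetAllLengthsAsync(IEnumerable<string> items)
    {
        Task<int>[] tasks = items.Select(s => GetLengthAsync(s)).ToArray();
        return await Task.WhenAll(tasks);
    }
}
```

The array returned by Task.WhenAll holds the results in the same order as the tasks that were passed in, regardless of the order in which they finished.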

Parallel.ForEach vs Task.Factory.StartNew

The first (Parallel.ForEach) is a much better option.

Parallel.ForEach, internally, uses a Partitioner<T> to distribute your collection into work items. It will not do one task per item, but rather batch this to lower the overhead involved.

The second option will schedule a single Task per item in your collection. While the results will be (nearly) the same, this will introduce far more overhead than necessary, especially for large collections, and cause the overall runtimes to be slower.

FYI - The Partitioner used can be controlled by using the appropriate overloads to Parallel.ForEach, if so desired. For details, see Custom Partitioners on MSDN.

The main difference, at runtime, is that the second will act asynchronously. This can be duplicated using Parallel.ForEach by doing:

Task.Factory.StartNew(() => Parallel.ForEach<Item>(items, item => DoSomething(item)));

By doing this, you still take advantage of the partitioners, but don't block until the operation is complete.
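To make the partitioner overloads mentioned above a little more concrete, here is a small sketch that uses Partitioner.Create to hand the loop body index ranges instead of single elements. The class name, method name, and the choice of workload (a sum of squares) are illustrative assumptions only.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangePartitioning
{
    // Sums the squares of `values`, processing `rangeSize` elements per work item.
    // Note: Partitioner.Create throws if the range is empty, so `values`
    // is assumed to contain at least one element.
    public static long SumOfSquares(int[] values, int rangeSize)
    {
        long total = 0;

        // Each work item is a [from, to) index range, not a single element.
        var ranges = Partitioner.Create(0, values.Length, rangeSize);

        Parallel.ForEach(ranges, range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
            {
                local += (long)values[i] * values[i];
            }

            // One synchronized update per range rather than one per element.
            Interlocked.Add(ref total, local);
        });

        return total;
    }
}
```

Choosing the range size by hand like this is mostly useful when the per-element work is tiny; otherwise the default partitioning is usually a reasonable starting point.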

C# Parallel.ForEach and Task.WhenAll sometimes returning fewer values than expected

And is there a way to guarantee that all tasks are always returned?

Several people in the comments are pointing out you should just do this, on the assumption that numbers is a non-threadsafe List:

    foreach (var number in numbers)
    {
        var value = Regex.Replace(number, @"\s+", "%20");

        tasks.Add(client.GetAsync(url + value));
    }

    await Task.WhenAll(tasks).ConfigureAwait(false);

    foreach (var task in tasks)
    {
        ...
    }

There doesn't seem to be any considerable benefit in parallelizing the creation of the tasks that do the download; this happens very quickly. The waiting for the downloads to complete is done in the WhenAll.

PS: there are a variety of more involved ways to escape data for a URL, but if you're specifically looking to convert any kind of whitespace to %20, I guess it makes sense to do it with a regex.

Edit: you asked when to use a Parallel.ForEach, and I'm going to say "don't, generally, because you have to be more careful about the contexts within which you use it", but if you made the Parallel.ForEach do more synchronous work, it might make sense:

    Parallel.ForEach(numbers, number =>
    {
        var value = Regex.Replace(number, @"\s+", "%20");

        // client.Get stands for a synchronous (blocking) request
        var r = client.Get(url + value);

        // do something meaningful with r here, i.e. whatever ... is in your foreach (var task in tasks)
    });

But be mindful: if you update some shared thing from within the loop body (for coordination purposes, say), that update needs to be thread-safe.
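One way to keep such a shared update thread-safe is to collect results in a concurrent collection rather than a plain List. The following is only a sketch under that assumption; the class and method names are invented, and the body just performs the whitespace replacement from the example above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class ThreadSafeCollection
{
    public static List<string> EncodeAll(IEnumerable<string> numbers)
    {
        // ConcurrentBag<T> tolerates concurrent Add calls; List<T> does not,
        // and can silently lose items when written to from several threads.
        var results = new ConcurrentBag<string>();

        Parallel.ForEach(numbers, number =>
        {
            results.Add(Regex.Replace(number, @"\s+", "%20"));
        });

        // A bag has no defined order, so sort before handing the results back.
        return results.OrderBy(r => r, StringComparer.Ordinal).ToList();
    }
}
```

If the original order matters, an alternative is to write each result into a pre-sized array slot by index, which avoids sharing a collection at all.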

Parallel.ForEach faster than Task.WaitAll for I/O bound tasks?

A possible reason why Parallel.ForEach may run faster is that it has the side effect of throttling. Initially x threads process the first x elements (where x is the number of available cores), and progressively more threads may be added depending on internal heuristics. Throttling I/O operations is a good thing because it protects the network, and the server that handles the requests, from becoming overburdened. Your alternative improvised method of throttling, making requests in batches of 100, is far from ideal for many reasons. One is that 100 concurrent requests are a lot of requests! Another is that a single long-running operation may delay the completion of the batch until long after the other 99 operations have completed.

Note that Parallel.ForEach is also not ideal for parallelizing IO operations. It just happened to perform better than the alternative, wasting memory all along. For better approaches look here: How to limit the amount of concurrent async I/O operations?

Best way to start several async tasks in parallel?

Parallel is not an option because you have asynchronous actions.

Your options are:

  • Start all of the tasks simultaneously, and then use await Task.WhenAll for them all to complete. You can use SemaphoreSlim if you find you need to throttle the number of active tasks.
  • Use an ActionBlock<T> (from TPL Dataflow) to queue up the work individually. You can use ExecutionDataflowBlockOptions.MaxDegreeOfParallelism if you want to process more than one simultaneously.

The ActionBlock<T> approach would be better if you don't know about all the tasks at the time they're started (i.e., if more can arrive while you're processing), or if other nearby parts of your code will fit into a "pipeline" kind of design.

Task.WhenAll is nice because it doesn't require a separate library with its own design philosophy and learning curve.
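A minimal sketch of the Task.WhenAll option with SemaphoreSlim throttling might look like the following. The helper name ForEachAsync and its parameters are hypothetical, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledWhenAll
{
    // Runs `operation` for every item, with at most `maxConcurrency` in flight at once.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> operation)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            Task[] tasks = items.Select(async item =>
            {
                // Wait asynchronously for a free slot; no thread is blocked here.
                await gate.WaitAsync();
                try
                {
                    await operation(item);
                }
                finally
                {
                    gate.Release();
                }
            }).ToArray();

            await Task.WhenAll(tasks);
        }
    }
}
```

All the tasks are still created up front; the semaphore only limits how many of them are past the gate at any moment.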

Either Task.WhenAll or ActionBlock<T> would work well for your use case.
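For comparison, the ActionBlock<T> option could be sketched roughly as below. This assumes the System.Threading.Tasks.Dataflow library is available (on some targets it is a separate NuGet package); the class and method names are invented, and Task.Delay stands in for the real asynchronous work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowQueue
{
    // Queues every item into an ActionBlock and returns how many were processed.
    public static async Task<int> ProcessAsync(IEnumerable<string> items, int maxDegreeOfParallelism)
    {
        int processed = 0;

        var block = new ActionBlock<string>(async item =>
        {
            await Task.Delay(5); // stand-in for real asynchronous work
            Interlocked.Increment(ref processed);
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism });

        // Items can be posted at any time before Complete is called.
        foreach (var item in items)
        {
            block.Post(item);
        }

        block.Complete();        // signal that no more items will arrive
        await block.Completion;  // wait for the queued items to finish
        return processed;
    }
}
```

Because items are posted one at a time, this shape suits the case where more work can arrive while earlier items are still being processed.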


