Problems Using Foreach Parallelization

Problems using foreach parallelization

To start with, you could write your foreach code a bit more concise :

FECltSim <- function(nSims=1000, size=100, mu=0, sigma=1) {
foreach(i=1:nSims, .combine=c) %dopar% {
mean(rnorm(n=size, mean=mu, sd=sigma))
}
}

This gives you a vector, no need to explicitly make it within the loop. Also no need to use cbind, as your result is every time just a single number. So .combine=c will do

The thing with foreach is that it creates quite a lot of overhead to communicate between the cores and get the results of the different cores fit together. A quick look at the profile shows this pretty clearly :

$by.self
self.time self.pct total.time total.pct
$ 5.46 41.30 5.46 41.30
$<- 0.76 5.75 0.76 5.75
.Call 0.76 5.75 0.76 5.75
...

More than 40% of the time it is busy selecting things. It also uses a lot of other functions for the whole operation. Actually, foreach is only advisable if you have relatively few rounds through very time consuming functions.

The other two solutions are built on a different technology, and do far less in R. On a sidenode, snow is actually initially developed to work on clusters more than on single workstations, like multicore is.

Parallel.ForEach loop is not working it skips some and double do others

Thanks to sedat-kapanoglu, I found the problem is really about thread safety. The solution was to change every List<T> to ConcurrentBag<T>.

For everyone, like me, The solution of "parallel not working with collections" is to change from System.Collections.Generic to System.Collections.Concurrent

Problem with Invoke to parallelize foreach

Your problem is that Invoke (and queueing a Task to the UI TaskScheduler) both require the UI thread to be processing its message loop. However, it is not. It is still waiting for the Parallel.ForEach loop to complete. This is why you see a deadlock.

If you want the Parallel.ForEach to run without blocking the UI thread, wrap it into a Task, as such:

private TaskScheduler ui;
private void button1_Click(object sender, EventArgs e)
{
ui = TaskScheduler.FromCurrentSynchronizationContext();
DateTime start = DateTime.Now;
Task.Factory.StartNew(pforeach)
.ContinueWith(task =>
{
task.Wait(); // Ensure errors are propogated to the UI thread.
Text = (DateTime.Now - start).ToString();
}, ui);
}

private void pforeach()
{
int[] intArray = new int[60];
int totalcount = intArray.Length;
object lck = new object();
System.Threading.Tasks.Parallel.ForEach<int, int>(intArray,
() => 0,
(x, loop, count) =>
{
int value = 0;
System.Threading.Thread.Sleep(100);
count++;
value = (int)(100f / (float)totalcount * (float)count);

Task.Factory.StartNew(
() => Set(value),
CancellationToken.None,
TaskCreationOptions.None,
ui).Wait();
return count;
},
(x) =>
{

});
}

private void Set(int i)
{
progressBar1.Value = i;
}

Parallel.ForEach MaxDegreeOfParallelism Strange Behavior with Increasing Chunking

You can't use Parallel methods with async delegates - at least, not yet.

Since you already have a "pipeline" style of architecture, I recommend looking into TPL Dataflow. A single ActionBlock may be all that you need, and once you have that working, other blocks in TPL Dataflow may replace other parts of your pipeline.

If you prefer to stick with your existing buffer, then you should use asynchronous concurrency instead of Parallel:

private void Process() {
var throttler = new SemaphoreSlim(8);
var tasks = _buffer.GetConsumingEnumerable()
.Select(async report =>
{
await throttler.WaitAsync();
try {
await _handler.ProcessAsync(report).ConfigureAwait(false);
} catch (Exception e) {
if (_config.IsDevelopment) {
throw;
}

_logger.LogError(e, "GPS Report Service");
}
finally {
throttler.Release();
}
})
.ToList();
await Task.WhenAll(tasks);
}

Should I always use Parallel.Foreach because more threads MUST speed up everything?

No, it doesn't make sense for every foreach. Some reasons:

  • Your code may not actually be parallelizable. For example, if you're using the "results so far" for the next iteration and the order is important)
  • If you're aggregating (e.g. summing values) then there are ways of using Parallel.ForEach for this, but you shouldn't just do it blindly
  • If your work will complete very fast anyway, there's no benefit, and it may well slow things down

Basically nothing in threading should be done blindly. Think about where it actually makes sense to parallelize. Oh, and measure the impact to make sure the benefit is worth the added complexity. (It will be harder for things like debugging.) TPL is great, but it's no free lunch.



Related Topics



Leave a reply



Submit