Why is Parallel.ForEach much faster than AsParallel().ForAll() even though MSDN suggests otherwise?

This problem is pretty debuggable, an uncommon luxury when you have problems with threads. Your basic tool here is the Debug > Windows > Threads debugger window. It shows you the active threads and gives you a peek at their stack traces. You'll easily see that, once it gets slow, you have dozens of active threads that are all stuck. Their stack traces all look the same:

    mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext) + 0x16 bytes
    mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout) + 0x7 bytes
    mscorlib.dll!System.Threading.ManualResetEventSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) + 0x182 bytes
    mscorlib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) + 0x93 bytes
    mscorlib.dll!System.Threading.Tasks.Task.InternalRunSynchronously(System.Threading.Tasks.TaskScheduler scheduler, bool waitForCompletion) + 0xba bytes
    mscorlib.dll!System.Threading.Tasks.Task.RunSynchronously(System.Threading.Tasks.TaskScheduler scheduler) + 0x13 bytes
    System.Core.dll!System.Linq.Parallel.SpoolingTask.SpoolForAll<ConsoleApplication1.DirWithSubDirs,int>(System.Linq.Parallel.QueryTaskGroupState groupState, System.Linq.Parallel.PartitionedStream<ConsoleApplication1.DirWithSubDirs,int> partitions, System.Threading.Tasks.TaskScheduler taskScheduler) Line 172 C#
    // etc..

Whenever you see something like this, you should immediately think fire-hose problem. It is probably the third-most common bug with threads, after races and deadlocks.

Now that you know the cause, you can reason out the problem with the code: every thread that completes adds N more threads, where N is the average number of sub-directories in a directory. In effect, the number of threads grows exponentially, and that is always bad. It only stays under control if N = 1, which of course never happens on a typical disk.
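The problematic shape looks roughly like this (a reconstruction based on the DirWithSubDirs type visible in the stack trace, not the asker's exact code):

    // Reconstruction of the fire-hose pattern. Every directory spawns a
    // parallel query over its sub-directories, and each worker blocks until
    // its children complete -- so a tree with an average branching factor N
    // needs on the order of N^depth threadpool threads alive at once.
    class DirWithSubDirs
    {
        public List<DirWithSubDirs> SubDirs = new List<DirWithSubDirs>();

        public void Traverse()
        {
            // Each call schedules N more threads and then blocks on them:
            this.SubDirs.AsParallel().ForAll(dir => dir.Traverse());
        }
    }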

Do beware that, like almost any threading problem, this misbehavior tends to reproduce poorly. The SSD in your machine tends to hide it. So does the RAM: the program might well complete quickly and trouble-free the second time you run it, since you'll now read from the file system cache instead of the disk, which is very fast. Tinkering with ThreadPool.SetMinThreads() hides it as well, but it cannot fix it. It never fixes any problem, it only hides it: no matter what happens, the exponential thread count will always overwhelm the set minimum number of threads. You can only hope the program finishes iterating the drive before that happens. An idle hope for a user with a big drive.

The difference between ParallelEnumerable.ForAll() and Parallel.ForEach() is now perhaps also easily explained. You can tell from the stack trace that ForAll() does something naughty: the RunSynchronously() method blocks until all the threads are completed. Blocking is something threadpool threads should not do; it gums up the thread pool and keeps it from scheduling the processor for another job. And it has the effect you observed: the thread pool is quickly overwhelmed with threads that are waiting on the N other threads to complete. Which isn't happening; those threads are waiting in the pool and not getting scheduled because there are already so many active ones.

This is a deadlock scenario, a pretty common one, but the threadpool manager has a workaround for it. It watches the active threadpool threads and steps in when they don't complete in a timely manner. It then allows an extra thread to start, one more than the minimum set by SetMinThreads(). But not more than the maximum set by SetMaxThreads(); having too many active threadpool threads is risky and likely to trigger OOM. This does resolve the deadlock: it gets one of the ForAll() calls to complete. But it happens at a very slow rate; the threadpool only does this twice a second. You'll run out of patience before it catches up.

Parallel.ForEach() doesn't have this problem, it doesn't block so doesn't gum up the pool.

That seems to be the solution, but do keep in mind that your program is still fire-hosing the memory of your machine, adding ever more waiting threadpool threads to the pool. This can crash your program as well; it just isn't as likely, because you have a lot of memory and the threadpool doesn't use much of it to keep track of a request. Some programmers manage to accomplish that as well, however.

The solution is very simple: just don't use threading. It is harmful; there is no concurrency when you have only one disk, and the disk does not like being commandeered by multiple threads. This is especially bad on a spindle drive, where head seeks are very, very slow. SSDs handle it a lot better, but a seek still takes an easy 50 microseconds, overhead that you just don't want or need. The ideal number of threads to access a disk that you can't otherwise expect to be cached well is always one.
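A plain single-threaded traversal is all one disk can usefully service; a minimal sketch using the standard Directory API (the root path is illustrative):

    using System;
    using System.IO;

    class SingleThreadedScan
    {
        static void Main()
        {
            // One thread, one disk: enumerate lazily and let the OS
            // read-ahead and file system cache do the optimizing.
            // Note: EnumerateDirectories throws on access-denied directories;
            // real code needs manual recursion with a try/catch per directory.
            foreach (var dir in Directory.EnumerateDirectories(
                         @"C:\example", "*", SearchOption.AllDirectories))
            {
                Console.WriteLine(dir);
            }
        }
    }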

Parallelizing a task using .AsParallel().ForAll or Parallel.ForEach performance issue

You are wasting threads and CPU time. In this case you would have 12 threads; each thread would process only one url at a time, so you would process only 12 urls at a time. Moreover, most of the time these threads would do nothing (they would just wait for a free proxy or for a page to load) while they could be used for more useful tasks.

To avoid this, you should use non-blocking IO operations. So instead of using htmlDocumentLoader.LoadDocument, you should consider using one of its asynchronous interfaces (htmlDocumentLoader.BeginLoadDocument / htmlDocumentLoader.EndLoadDocument or htmlDocumentLoader.LoadDocumentAsync / htmlDocumentLoader.LoadDocumentCompleted). In this case, if you have 100 urls, all of them will be loaded simultaneously without creating extra threads or wasting CPU time. Only when a page is loaded will a new thread be created (actually taken from the ThreadPool) to handle it.
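A sketch of that shape, assuming htmlDocumentLoader exposes a Task-returning LoadDocumentAsync (the exact signature depends on the asker's library; HandleDocument is an illustrative placeholder):

    // All downloads are in flight at once; no thread blocks while waiting.
    // A pool thread is only used briefly, once each page has arrived.
    async Task ProcessAllAsync(IEnumerable<string> urls)
    {
        var tasks = urls.Select(async url =>
        {
            var document = await htmlDocumentLoader.LoadDocumentAsync(url);
            HandleDocument(document);
        });
        await Task.WhenAll(tasks);
    }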

The way you wait for a free proxy is wasteful too. Instead of using while (ProxyList.Count == 0), which freezes the thread when there is no free proxy, consider using a timer that wakes up every second and checks whether a free proxy is available. That is not the best solution, but at least it would not waste threads. The better solution is to add an event to ProxyHandler which notifies when a proxy becomes available.
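One way to turn the busy-wait into a notification, sketched with a SemaphoreSlim standing in for the suggested event (the ProxyPool and Proxy names are illustrative, not the asker's types):

    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;

    // A SemaphoreSlim counts available proxies, so a consumer sleeps until
    // ReleaseProxy signals instead of spinning on a Count check.
    class ProxyPool
    {
        private readonly ConcurrentQueue<Proxy> _proxies = new ConcurrentQueue<Proxy>();
        private readonly SemaphoreSlim _available = new SemaphoreSlim(0);

        public void ReleaseProxy(Proxy proxy)
        {
            _proxies.Enqueue(proxy);
            _available.Release();           // wakes exactly one waiter
        }

        public async Task<Proxy> AcquireProxyAsync()
        {
            await _available.WaitAsync();   // no CPU burned while the pool is empty
            _proxies.TryDequeue(out var proxy);
            return proxy;
        }
    }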

Parallel.Invoke vs Parallel.Foreach for running parallel process on large list

The first option is a form of task-parallelism, in which you divide your task into a group of sub-tasks and execute them in parallel. As is obvious from the code you provided, you are responsible for choosing the level of granularity [chunks] when creating the sub-tasks. The selected granularity might be too coarse or too fine if one does not rely on appropriate heuristics, and the resulting performance gain might not be significant. Task-parallelism is used in scenarios where the operation to be performed takes a similar amount of time for all input values.

The second option is a form of data-parallelism, in which the input data is divided into smaller chunks based on the number of hardware threads/cores/processors available, and each individual chunk is then processed in isolation. In this case, the .NET library chooses the right level of granularity for you and ensures better CPU utilization. Conventionally, data-parallelism is used in scenarios where the time taken by the operation can vary depending on the input value.

In conclusion, if your operation is more or less uniform over the range of input values and you know the right granularity [chunk size], go ahead with the first option. If that's not the case, or if you are unsure, go with the second option, which usually pans out better in most scenarios.
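The two shapes can be sketched side by side (chunk boundaries and the ProcessRange/ProcessItem helpers are illustrative):

    // Task-parallelism: you pick the chunk boundaries yourself.
    Parallel.Invoke(
        () => ProcessRange(items, 0, items.Count / 2),
        () => ProcessRange(items, items.Count / 2, items.Count));

    // Data-parallelism: the partitioner picks the granularity for you.
    Parallel.ForEach(items, item => ProcessItem(item));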

NOTE: If this is a very performance-critical component of your application, I would advise benchmarking both approaches in a production-like environment to get more data, in addition to the above recommendations.

AsParallel () and Any()?

AsParallel() returns a ParallelQuery, so if you call AsParallel().Any(...) you're not calling Enumerable.Any, but ParallelEnumerable.Any.

The reference source code for ParallelEnumerable.Any can be found in the .NET Reference Source.

When you dig into, for example, the AnyAllSearchOperatorEnumerator class, you see that a flag called resultFoundFlag is used to tell the other workers that a result has been found so they can stop searching.
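The early-exit behavior means a parallel Any() does not need to enumerate the whole source; a minimal illustration:

    using System.Linq;

    // ParallelEnumerable.Any returns as soon as any worker finds a match,
    // signalling the other workers through the shared flag described above.
    bool found = Enumerable.Range(0, 1_000_000)
        .AsParallel()
        .Any(n => n == 42);   // true, without every partition running to completion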

Is AsParallel().ForAll reliable

Instead of adding stuff to a collection in a parallel loop, it would probably be better to let the framework handle building the collection.

Example:

    var processedCollection = collection
        .Where(h => h.Id == id)
        .AsParallel()
        .Select(h => ProcessItem(h))
        .ToArray();

The processing would occur in ProcessItem. Hand-building the result collection from inside a parallel loop seems unnecessary when ToArray() can do it for you.

Inconsistency in Parallel.ForEach

One issue I see here is that you are modifying variables that are not declared inside the Parallel.ForEach code block, in which case you should lock around them.

    // lockerObject is a shared instance visible to all iterations,
    // e.g.: static readonly object lockerObject = new object();
    Parallel.ForEach(indexes, (index) =>
    {
        // parent creation with the index object
        lock (lockerObject)
        {
            parent = create(index);

            // call the functions that create the children
            createChildrenOfType1(parent);
            createChildrenOfType2(parent);
        }
    });

Using lock ensures the threads synchronize properly. (Note that locking the entire body also serializes the work, so the loop gains little from running in parallel; a better fix is to avoid the shared variables altogether.)
