Does Using Tasks (Tpl) Library Make an Application Multithreaded

Does using Tasks (TPL) library make an application multithreaded?

Tasks can be used to represent operations taking place on multiple threads, but they don't have to. One can write complex TPL applications that only ever execute in a single thread. When you have a task that, for example, represents a network request for some data, that task is not going to create additional threads to accomplish that goal. Such a program is (hopefully) asynchronous, but not necessarily multithreaded.

Parallelism is doing more than one thing at the same time. This may or may not be the result of multiple threads.

Let's go with an analogy here.


Here is how Bob cooks dinner:

  1. He fills a pot of water, and boils it.
  2. He then puts pasta in the water.
  3. He drains the pasta when it's done.
  4. He prepares the ingredients for his sauce.
  5. He puts all of the ingredients for his sauce in a saucepan.
  6. He cooks his sauce.
  7. He puts his sauce on his pasta.
  8. He eats dinner.

Bob has cooked entirely synchronously with no multithreading, asynchrony, or parallelism when cooking his dinner.


Here is how Jane cooks dinner:

  1. She fills a pot of water and starts boiling it.
  2. She prepares the ingredients for her sauce.
  3. She puts the pasta in the boiling water.
  4. She puts the ingredients in the saucepan.
  5. She drains her pasta.
  6. She puts the sauce on her pasta.
  7. She eats her dinner.

Jane leveraged asynchronous cooking (without any multithreading) to achieve parallelism when cooking her dinner.


Here is how Servy cooks dinner:

  1. He tells Bob to boil a pot of water, put in the pasta when ready, and serve the pasta.
  2. He tells Jane to prepare the ingredients for the sauce, cook it, and then serve it over the pasta when done.
  3. He waits for Bob and Jane to finish.
  4. He eats his dinner.

Servy leveraged multiple threads (workers) who each individually did their work synchronously, but who worked asynchronously with respect to each other to achieve parallelism.

Of course, this all becomes more interesting if we consider, for example, whether our stove has two burners or just one. If our stove has two burners then our two threads, Bob and Jane, are both able to do their work without getting in each other's way much. They might bump shoulders a bit, or each try to grab something from the same cabinet every now and then, so they'll each be slowed down a little, but not much. If they each need to share a single stove burner, though, then they won't be able to get much done whenever the other person is doing work. In that case, the work won't actually get done any faster than having one person do the cooking entirely synchronously, as Bob does when he's on his own. Here we are cooking with multiple threads, but our cooking isn't parallelized. Not all multithreaded work is actually parallel work. This is what happens when you run multiple threads on a machine with one CPU: you don't actually get work done any faster than with one thread, because the threads are just taking turns doing work. (That doesn't mean multithreaded programs are pointless on single-core CPUs; they're not. It's just that the reason for using them isn't to improve speed.)


We can even consider how these cooks would do their work using the Task Parallel Library, to see what uses of the TPL correspond to each of these types of cooks:

So first we have Bob, just writing normal non-TPL code and doing everything synchronously:

public class Bob : ICook
{
    public IMeal Cook()
    {
        Pasta pasta = PastaCookingOperations.MakePasta();
        Sauce sauce = PastaCookingOperations.MakeSauce();
        return PastaCookingOperations.Combine(pasta, sauce);
    }
}

Then we have Jane, who starts two different asynchronous operations and then waits for both of them to complete before computing her result.

public class Jane : ICook
{
    public IMeal Cook()
    {
        Task<Pasta> pastaTask = PastaCookingOperations.MakePastaAsync();
        Task<Sauce> sauceTask = PastaCookingOperations.MakeSauceAsync();
        return PastaCookingOperations.Combine(pastaTask.Result, sauceTask.Result);
    }
}

As a reminder here, Jane is using the TPL, and she's doing much of her work in parallel, but she's only using a single thread to do her work.

Then we have Servy, who uses Task.Run to create a task that represents doing work in another thread. He starts two different workers, has each of them synchronously do some work, and then waits for both workers to finish.

public class Servy : ICook
{
    public IMeal Cook()
    {
        var bobsWork = Task.Run(() => PastaCookingOperations.MakePasta());
        var janesWork = Task.Run(() => PastaCookingOperations.MakeSauce());
        return PastaCookingOperations.Combine(bobsWork.Result, janesWork.Result);
    }
}

Multithreading or task parallel library

I believe TPL will usually use one thread per core unless you specifically tell it to use more. It's possible that it will detect when that's not enough - e.g. in your case, where your tasks are going to spend most of their time waiting for data.

Is there any reason you can't use asynchronous web fetching? I suspect there's no need to have a thread per task or even a thread per core here. TPL makes various aspects of asynchronous programming easier, with things like continuations.
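As a minimal sketch of what asynchronous fetching looks like without a thread per task: all requests are started up front, and a single await gathers the results. SimulatedFetchAsync here is a hypothetical stand-in for a real network call such as HttpClient.GetStringAsync, so the example stays self-contained.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class AsyncFetchSketch
{
    // Hypothetical stand-in for a real network call (e.g. HttpClient.GetStringAsync).
    // The await yields control while the "I/O" is pending; no thread is blocked.
    static async Task<string> SimulatedFetchAsync(string url)
    {
        await Task.Delay(50); // pretend network latency
        return $"<body of {url}>";
    }

    public static async Task<string[]> FetchAllAsync(string[] urls)
    {
        // Start every request before waiting on any of them,
        // so they all overlap without a thread per task.
        Task<string>[] pending = urls.Select(SimulatedFetchAsync).ToArray();
        return await Task.WhenAll(pending);
    }
}
```

The point is the shape, not the details: the requests run concurrently because they are all in flight at once, not because extra threads were created for them.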

In terms of efficiency, is your application actually CPU bound? It sounds like you need to be getting the maximum appropriate level of parallelism at the network side - that's the bit to concentrate on, unless the calculations are really heavyweight.



UPDATES - NOT FROM ORIGINAL AUTHOR

The answer above is great as always, but it could be misleading because it does not reflect some important changes in the .NET 4.0 CLR.

As Andras says, the current TPL implementation uses the thread pool, so it will use as many threads as required (the number of cores is no longer the deciding factor):

The Task Parallel Library (TPL) is a collection of new classes specifically designed to make it easier and more efficient to execute very fine-grained parallel workloads on modern hardware. TPL has been available separately as a CTP for some time now, and was included in the Visual Studio 2010 CTP, but in those releases it was built on its own dedicated work scheduler. For Beta 1 of CLR 4.0, the default scheduler for TPL will be the CLR thread pool, which allows TPL-style workloads to “play nice” with existing, QUWI-based code, and allows us to reuse much of the underlying technology in the thread pool - in particular, the thread-injection algorithm, which we will discuss in a future post.

From: Link

Is there any point to using Task Parallel Library

Unless your application is actively taking advantage of parallel processing, neither the OS nor the CPU will do this for you automatically. The OS and CPU may switch execution of your application between multiple cores, but that does not make it execute simultaneously on the different cores. For that you need to make your application capable of executing at least parts in parallel.

According to MSDN Parallel Processing and Concurrency in the .NET Framework there are basically three ways to do parallel processing in .NET:

  1. Managed threading where you handle the threads and their synchronization yourself.
  2. Various asynchronous programming patterns.
  3. Parallel Programming in the .NET Framework of which both the Task Parallel Library and PLINQ are a part.
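As a tiny illustration of the third option, PLINQ can parallelize an ordinary LINQ query with a single AsParallel() call (the data and query here are made up for the example):

```csharp
using System;
using System.Linq;

class PlinqSketch
{
    // PLINQ partitions the input across cores; Sum() merges the partial results.
    public static long SumOfSquares(int[] data) =>
        data.AsParallel().Select(x => (long)x * x).Sum();
}
```

Nothing about threads or partitioning appears in the code; the runtime decides how much parallelism to use.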

Reasons for using the TPL include that it and the accompanying tools according to the MSDN article

simplify parallel development so that you can write efficient, fine-grained, and scalable parallel code in a natural idiom without having to work directly with threads or the thread pool.

Threads vs. Tasks has some help for deciding between threads and the TPL
with the conclusion:

The bottom line is that Task is almost always the best option; it provides a much more powerful API and avoids wasting OS threads.

The only reasons to explicitly create your own Threads in modern code are setting per-thread options, or maintaining a persistent thread that needs to maintain its own identity.
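To make the contrast concrete, here is a small sketch of the two styles side by side; the work itself (21 * 2) is just a placeholder:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ThreadVsTask
{
    public static int ComputeWithThread()
    {
        int result = 0;
        // Manual thread: you get per-thread options (background status, name,
        // priority), but no built-in way to return a value or compose continuations.
        var thread = new Thread(() => result = 21 * 2) { IsBackground = true };
        thread.Start();
        thread.Join();
        return result;
    }

    public static int ComputeWithTask()
    {
        // Task: runs on the thread pool, carries its own result, and composes
        // with await/ContinueWith without dedicating an OS thread to it.
        Task<int> task = Task.Run(() => 21 * 2);
        return task.Result;
    }
}
```

Both produce the same answer; the difference is in what you have to manage yourself versus what the Task API handles for you.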

What is so great about TPL

Although you could do everything equivalently with the TPL or the thread pool, the TPL is preferred for its better abstractions and scalability patterns. It is still up to the programmer, though: if you know exactly what you are doing, and how the scheduling and synchronization requirements play out in your specific application, you may be able to use the thread pool more effectively. There are some things you get for free with the TPL that you would have to code up yourself when using the thread pool, such as the following:

  • work stealing
  • worker thread local pool
  • scheduling groups of actions like Parallel.For
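A short sketch of the last point: Parallel.For hands range partitions to pooled workers, with work stealing balancing uneven partitions behind the scenes, none of which you have to write yourself:

```csharp
using System;
using System.Threading.Tasks;

class ParallelForSketch
{
    public static long[] Squares(int n)
    {
        var results = new long[n];
        // Each index is written by exactly one iteration, so no locking is needed;
        // the TPL decides how to split the range across worker threads.
        Parallel.For(0, n, i => results[i] = (long)i * i);
        return results;
    }
}
```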

TPL Tasks, Threads, etc

The ideal state of a system is to have 1 actively running thread per CPU core. By defining work in more general terms of "tasks", the TPL can dynamically decide how many threads to use and which tasks to do on each one in order to come closest to achieving that ideal state. These are decisions that are almost always best made dynamically at runtime because when writing the code you can't know for sure how many CPU cores will be available to your application, how busy they are with other work, etc.
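One small runnable illustration: the inputs to those runtime decisions are visible through Environment.ProcessorCount and the thread pool's minimum worker count. (The actual thread-injection heuristics are internal and vary by runtime version, so this only shows the starting point, not the algorithm.)

```csharp
using System;
using System.Threading;

class CoreInfo
{
    public static (int cores, int minWorkers) Snapshot()
    {
        // The pool starts with roughly one worker per logical core and then
        // injects or retires threads at runtime based on observed throughput.
        ThreadPool.GetMinThreads(out int workers, out _);
        return (Environment.ProcessorCount, workers);
    }
}
```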

Using task parallel library (TPL) for polling

In your sample, you start a new asynchronous task, but your application continues execution to the end of the Main method and exits before the new task even has a chance to execute its body (in your case, the while loop).

You need to wait for your task to complete (or in your case, execute until you kill it). Try structuring your code like this:

static void Main(string[] args)
{
    // Your initialization code

    if (devfound)
    {
        // Device found, prepare for task
        var t1 = Task.Factory.StartNew(() =>
        {
            // Task body
        });

        t1.Wait();
    }
}

C# TPL Tasks - How many at one time

This seems like a problem better suited for worker threads (a separate thread for each file) managed with the ThreadPool rather than the TPL. TPL is great when you can divide and conquer a single item of data, but your zip files are processed individually.

Disk I/O is going to be your bottleneck, so I think you will need to throttle the number of jobs running simultaneously. It's simple to manage this with worker threads, but I'm not sure how much control you have (if any) over how much parallelism Parallel.For and ForEach allow at once, which could choke your process and actually slow it down.
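For what it's worth, the TPL does expose a throttle: ParallelOptions.MaxDegreeOfParallelism caps how many items are processed at once. A minimal sketch, where processFile is a hypothetical stand-in for the per-zip work:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ThrottledProcessing
{
    public static ConcurrentBag<string> ProcessAll(string[] files, Action<string> processFile)
    {
        var done = new ConcurrentBag<string>();
        // At most two files are processed concurrently, so disk I/O isn't swamped.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
        Parallel.ForEach(files, options, file =>
        {
            processFile(file);
            done.Add(file);
        });
        return done;
    }
}
```

Whether 2 is the right cap depends on the disk; the point is only that the limit is configurable rather than left to the scheduler.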


