C# - ThreadPool vs Tasks

The objective of the System.Threading.Tasks namespace is to provide a pluggable architecture that makes multi-tasking applications easier to write and more flexible.

The implementation uses a TaskScheduler object to control how tasks are handled. TaskScheduler has abstract and virtual members that you can override to create your own task handling, for instance:

protected internal abstract void QueueTask(Task task)
public virtual int MaximumConcurrencyLevel { get; }

There will be a tiny overhead to using the default implementation as there's a wrapper around the .NET threads implementation, but I'd not expect it to be huge.

There is a (draft) implementation of a custom TaskScheduler that implements multiple tasks on a single thread here.
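For illustration, here is a minimal sketch (my own, not the linked draft) of a TaskScheduler that runs every queued task on one dedicated thread:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Minimal single-threaded scheduler: every queued task executes on one dedicated thread.
sealed class SingleThreadTaskScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _tasks = new BlockingCollection<Task>();

    public SingleThreadTaskScheduler()
    {
        var thread = new Thread(() =>
        {
            foreach (var task in _tasks.GetConsumingEnumerable())
                TryExecuteTask(task);           // run each queued task on this thread
        });
        thread.IsBackground = true;
        thread.Start();
    }

    public override int MaximumConcurrencyLevel => 1;

    protected override void QueueTask(Task task) => _tasks.Add(task);

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
        => false;                               // never inline; always use the dedicated thread

    protected override IEnumerable<Task> GetScheduledTasks() => _tasks.ToArray();
}

A task can then be pointed at it with, for example, Task.Factory.StartNew(work, CancellationToken.None, TaskCreationOptions.None, new SingleThreadTaskScheduler()).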

Tasks vs ThreadPool

Please take a look at ParallelOptions.MaxDegreeOfParallelism for Tasks.
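For example, here is a rough sketch of capping a parallel loop at four workers (the bounds and loop body are purely illustrative):

using System;
using System.Threading.Tasks;

class ParallelOptionsDemo
{
    static void Main()
    {
        // Limit the loop to at most 4 concurrent workers.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        Parallel.For(0, 20, options, i =>
        {
            Console.WriteLine($"Item {i} on thread {Environment.CurrentManagedThreadId}");
        });
    }
}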

I would advise you to use Tasks, because they provide a higher level abstraction than the ThreadPool.

A very good read on the topic can be found here. Really, a must-have book and it's free on top of that :)

CPU benchmark test: Tasks vs ThreadPool vs Thread

The benchmark was posted as some form of "counter" to the argument for async/await/Tasks.

The posted code has absolutely nothing to do with async or await. It's comparing three different kinds of parallelism:

  1. Dynamic Task Parallelism.
  2. Direct threadpool access.
  3. Manual multithreading with manual partitioning.

The first two are somewhat comparable. Of course, direct threadpool access will be faster than Dynamic Task Parallelism. But what these tests don't show is that direct threadpool access is much harder to do correctly. In particular, when you are running real-world code and need to handle exceptions and return values, you have to add in boilerplate code and object instances to the direct threadpool access code that slows it down.
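As a rough sketch of that boilerplate (the work and names here are purely illustrative), compare what it takes to get a result and an exception back out of QueueUserWorkItem with the Task equivalent:

using System;
using System.Threading;
using System.Threading.Tasks;

class Plumbing
{
    static void Main()
    {
        // Direct threadpool access: the result, the error and the completion signal
        // all have to be marshalled by hand.
        int result = 0;
        Exception error = null;
        using (var done = new ManualResetEventSlim())
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try { result = 6 * 7; }
                catch (Exception ex) { error = ex; }
                finally { done.Set(); }
            });
            done.Wait();
        }
        if (error != null) throw error;
        Console.WriteLine(result);

        // Task: the result and any exception travel with the task itself.
        Task<int> task = Task.Run(() => 6 * 7);
        Console.WriteLine(task.Result);   // faults surface here as an AggregateException
    }
}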

The third one is not comparable at all. It just uses 10 manual threads. Again, this example ignores the additional complexity necessary in real-world code; specifically, the need to handle exceptions and return values. It also assumes a partition size, which is problematic; real-world code does not have that luxury. If you're managing your own set of threads, then you have to decide things like how quickly you should increase the number of threads when the queue has many items, and how quickly you should end threads when the queue is empty. These are all difficult questions that add lots of code to the #3 test before you're really comparing the same thing.

And that's not even to say anything about the cost of maintenance. In my experience (i.e., as an application developer), micro-optimizations are just not worth it. Even if you took the "worst" (#1) approach, you're losing about 7 microseconds per item. That is an unimaginably small amount of savings. As a general rule, developer time is far more valuable to your company than user time. If your users have to process a hundred thousand items, the difference would barely be perceptible. If you were to adopt the "best" (#3) approach, the code would be much less maintainable, particularly considering the boilerplate and thread management code necessary in production code and not shown here. Going with #3 would probably cost your company far more in terms of developer time just writing or reading the code than it would ever save in terms of user time.

Oh, and the funniest part of all this is that, of all these different kinds of parallelism compared, the benchmark didn't even include the one most suitable for this test: PLINQ.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        TaskParallelLibrary();
        ManualThreads();
        Console.ReadKey();
    }

    static void ManualThreads()
    {
        // Build the work queue.
        var queue = new List<string>();
        for (int i = 0; i != 1000000; ++i)
            queue.Add("string" + i);
        var resultList = new List<string>();
        var stopwatch = Stopwatch.StartNew();
        var counter = 0;

        // Ten manually created threads pull items off the shared queue.
        for (int i = 0; i != 10; ++i)
        {
            new Thread(() =>
            {
                while (true)
                {
                    var t = "";
                    lock (queue)
                    {
                        if (counter >= queue.Count)
                            break;
                        t = queue[counter];
                        ++counter;
                    }
                    t = t.Substring(0, 5);
                    string t2 = t.Substring(0, 2) + t;
                    lock (resultList)
                        resultList.Add(t2);
                }
            }).Start();
        }

        // Poll until every item has been processed.
        while (resultList.Count < queue.Count)
            Thread.Sleep(1);
        stopwatch.Stop();
        Console.WriteLine($"Manual threads: Processed {resultList.Count} in {stopwatch.Elapsed}");
    }

    static void TaskParallelLibrary()
    {
        // Build the work queue.
        var queue = new List<string>();
        for (int i = 0; i != 1000000; ++i)
            queue.Add("string" + i);
        var stopwatch = Stopwatch.StartNew();

        // PLINQ partitions the work and collects results (and exceptions) for us.
        var resultList = queue.AsParallel().Select(t =>
        {
            t = t.Substring(0, 5);
            return t.Substring(0, 2) + t;
        }).ToList();
        stopwatch.Stop();
        Console.WriteLine($"Parallel: Processed {resultList.Count} in {stopwatch.Elapsed}");
    }
}

On my machine, after running this code several times, I find that the PLINQ code outperforms the Manual Threads by about 30%. Sample output on .NET Core 3.0 preview5-27626-15, built for Release, run standalone:

Parallel: Processed 1000000 in 00:00:00.3629408
Manual threads: Processed 1000000 in 00:00:00.5119985

And, of course, the PLINQ code is:

  • Shorter
  • More maintainable
  • More robust (handles exceptions and return types)
  • Less awkward (no need to poll for completion)
  • More portable (partitions based on number of processors)
  • More flexible (automatically adjusts the thread pool as necessary based on amount of work)

ThreadPool vs Task vs Async

It seems a reasonable question but the context makes it hard to give a good answer.

You are using a console program and don't care what SendEmail returns. That is not the normal case.

async/await uses Tasks that run on top of the ThreadPool, so your 'vs' doesn't quite hold up. And normally you would at least care about errors that occurred.

When you really don't care about errors or results, QueueUserWorkItem() is the most basic approach.

In most contexts, however, you would aim for an awaitable Task. SmtpClient.SendAsync() is not awaitable, so a Task that runs the synchronous Send() seems most appropriate.

And when it is really about sending (bulk) mails you would have a few other issues to tackle, like throttling the number of parallel calls.
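As a rough sketch of both points (the host name and message source are illustrative), you could wrap the synchronous Send() in a Task and throttle the number of parallel sends with a SemaphoreSlim:

using System.Collections.Generic;
using System.Linq;
using System.Net.Mail;
using System.Threading;
using System.Threading.Tasks;

class MailSender
{
    static async Task SendAllAsync(IEnumerable<MailMessage> messages)
    {
        using (var throttle = new SemaphoreSlim(4))                       // at most 4 sends in flight
        {
            var sends = messages.Select(async message =>
            {
                await throttle.WaitAsync();
                try
                {
                    using (var client = new SmtpClient("smtp.example.com"))
                        await Task.Run(() => client.Send(message));       // awaitable wrapper around Send()
                }
                finally
                {
                    throttle.Release();
                }
            });
            await Task.WhenAll(sends);
        }
    }
}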

Would a ThreadPool or a Task be the correct thing to use for a server?

By default, the TPL will use the Thread Pool. So, either way you are using the Thread Pool. The question is just which programming model you use to access the pool. I strongly suggest TPL, as it provides a superior programming abstraction.

The threads in your example are actually not spinning (burning CPU cycles), but rather blocking on a wait handle. That is quite efficient and does not consume a thread while blocked.

UPDATE

The TaskFactory.FromAsync(...).ContinueWith(...) pattern is appropriate. For a great list of reasons, see this question.

If you are using C# 5 / .NET 4.5, you can use async/await to express your code pattern even more compactly.

http://mtaulty.com/CommunityServer/blogs/mike_taultys_blog/archive/2010/11/22/c-5-0-rise-of-the-task.aspx
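For illustration, here is a sketch of both shapes, using Stream.BeginRead/EndRead as a stand-in for whatever APM-style API is involved (the stream and buffer are assumed to come from elsewhere):

using System;
using System.IO;
using System.Threading.Tasks;

class Patterns
{
    // The FromAsync(...).ContinueWith(...) pattern.
    static Task<int> ReadWithFromAsync(Stream stream, byte[] buffer)
    {
        return Task<int>.Factory
            .FromAsync(stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null)
            .ContinueWith(t =>
            {
                Console.WriteLine($"Read {t.Result} bytes");
                return t.Result;
            });
    }

    // The same thing expressed with async/await (C# 5 / .NET 4.5).
    static async Task<int> ReadWithAwait(Stream stream, byte[] buffer)
    {
        int bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length);
        Console.WriteLine($"Read {bytesRead} bytes");
        return bytesRead;
    }
}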

ThreadPool.QueueUserWorkItem vs Task.Factory.StartNew

If you're going to start a long-running task with TPL, you should specify TaskCreationOptions.LongRunning, which will mean it doesn't schedule it on the thread-pool. (EDIT: As noted in comments, this is a scheduler-specific decision, and isn't a hard and fast guarantee, but I'd hope that any sensible production scheduler would avoid scheduling long-running tasks on a thread pool.)
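A minimal sketch of passing that hint (the loop body is purely illustrative):

using System;
using System.Threading;
using System.Threading.Tasks;

class LongRunningDemo
{
    static void Main()
    {
        // LongRunning tells the default scheduler not to use a thread-pool thread;
        // it typically creates a dedicated thread instead.
        Task worker = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 5; i++)
            {
                Thread.Sleep(1000);             // stand-in for long-lived work
                Console.WriteLine("still working...");
            }
        }, TaskCreationOptions.LongRunning);

        worker.Wait();
    }
}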

You definitely shouldn't schedule a large number of long-running tasks on the thread pool yourself. I believe that these days the default size of the thread pool is pretty large (because it's often abused in this way) but fundamentally it shouldn't be used like this.

The point of the thread pool is to avoid short tasks taking a large hit from creating a new thread, compared with the time they're actually running. If the task will be running for a long time, the impact of creating a new thread will be relatively small anyway - and you don't want to end up potentially running out of thread pool threads. (It's less likely now, but I did experience it on earlier versions of .NET.)

Personally if I had the option, I'd definitely use TPL on the grounds that the Task API is pretty nice - but do remember to tell TPL that you expect the task to run for a long time.

EDIT: As noted in comments, see also the PFX team's blog post on choosing between the TPL and the thread pool:

In conclusion, I’ll reiterate what the CLR team’s ThreadPool developer has already stated:

Task is now the preferred way to queue work to the thread pool.

EDIT: Also from comments, don't forget that TPL allows you to use custom schedulers, if you really want to...

Task vs Thread differences

Thread is a lower-level concept: if you're directly starting a thread, you know it will be a separate thread, rather than executing on the thread pool etc.

Task is more than just an abstraction of "where to run some code" though - it's really just "the promise of a result in the future". So as some different examples:

  • Task.Delay doesn't need any actual CPU time; it's just like setting a timer to go off in the future
  • A task returned by WebClient.DownloadStringTaskAsync won't take much CPU time locally; it's representing a result which is likely to spend most of its time in network latency or remote work (at the web server)
  • A task returned by Task.Run() really is saying "I want you to execute this code separately"; the exact thread on which that code executes depends on a number of factors.
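As a small sketch of those three flavours (the URL is just an example):

using System;
using System.Net;
using System.Threading.Tasks;

class TaskFlavours
{
    static async Task Main()
    {
        // 1. Timer-like: no thread is blocked while the delay is pending.
        await Task.Delay(1000);

        // 2. I/O-bound: most of the elapsed time is network latency, not local CPU.
        using (var client = new WebClient())
        {
            string page = await client.DownloadStringTaskAsync("https://example.com/");
            Console.WriteLine($"Downloaded {page.Length} characters");
        }

        // 3. CPU-bound: Task.Run queues the work to execute on a thread-pool thread.
        int sum = await Task.Run(() =>
        {
            int total = 0;
            for (int i = 0; i < 1_000_000; i++)
                total += i % 10;
            return total;
        });
        Console.WriteLine(sum);
    }
}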

Note that the Task<T> abstraction is pivotal to the async support in C# 5.

In general, I'd recommend that you use the higher level abstraction wherever you can: in modern C# code you should rarely need to explicitly start your own thread.

Is it fine to use tasks and thread-pool together?

Yes, that is fine. There actually isn't much in the way of memory or performance inefficiency when mixing approaches; by default, Tasks run on the same thread pool that ThreadPool work items use.

The only significant disadvantage of mixing both is a lack of consistency in your codebase. If you were to pick one, I would use the TPL, since it has a rich API for handling many aspects of multi-threading and takes advantage of the async/await language features.

Since your usage is divided down module lines, you don't have much to worry about.


