Number of Threads Created by GCD

Number of threads created by GCD?

First, 66 == 64 (the maximum GCD thread pool size) + the main thread + some other random non-GCD thread.

Second, GCD is not magic. It is optimized for keeping the CPU busy with code that is mostly CPU bound. The "magic" of GCD is that it dynamically creates more threads than CPUs when work items unintentionally and briefly wait for operations to complete.

Having said that, code can confuse the GCD scheduler by intentionally sleeping or waiting for events instead of using dispatch sources to wait for events. In these scenarios, the block of work is effectively implementing its own scheduler and therefore GCD must assume that the thread has been co-opted from the thread pool.

In short, the thread pool will operate optimally if your code prefers dispatch_after() over sleep()-like APIs, and dispatch sources over handcrafted event loops (Unix select()/poll(), Cocoa run loops, or POSIX condition variables).
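For example, here is a minimal sketch of replacing a sleep-based delay with dispatch_after() (the pollServer method is hypothetical):

// Instead of tying up a worker thread for two seconds...
dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
    sleep(2);          // the thread blocks, so GCD may spin up another one
    [self pollServer];
});

// ...let GCD schedule the follow-up work without holding a thread:
dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(2 * NSEC_PER_SEC)),
               dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
    [self pollServer]; // no worker thread is occupied during the delay
});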

Is there a way to limit the number of threads spawned by GCD in my application?

From my experience and work with GCD under various circumstances, I believe this is not possible.

That said, it is important to understand that by using GCD you create queues, not threads. Whenever your code creates a queue, the GCD subsystem checks current system conditions and looks for available resources. Threads are then created under the hood based on those conditions, in an order and with resources that you do not control. This is explained in the official documentation:

When it comes to adding concurrency to an application, dispatch queues provide several advantages over threads. The most direct advantage is the simplicity of the work-queue programming model. With threads, you have to write code both for the work you want to perform and for the creation and management of the threads themselves. Dispatch queues let you focus on the work you actually want to perform without having to worry about the thread creation and management. Instead, the system handles all of the thread creation and management for you. The advantage is that the system is able to manage threads much more efficiently than any single application ever could. The system can scale the number of threads dynamically based on the available resources and current system conditions. In addition, the system is usually able to start running your task more quickly than you could if you created the thread yourself.

Source: Dispatch Queues

There is no way to control resource consumption with GCD, such as by setting some kind of threshold. GCD is a high-level abstraction over low-level constructs such as threads, and it manages them for you.

The only way you can influence how many resources a particular task within your application takes is by setting its QoS (Quality of Service) class (formerly known simply as priority, since extended into a more complex concept). In brief, you classify the tasks within your application by importance, which helps GCD and your application be more resource- and battery-efficient. Its use is highly encouraged in complex applications that make heavy use of concurrency.
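For illustration, a minimal sketch of assigning QoS classes (the queue label and the work inside the blocks are hypothetical):

// Fetch a global queue at a specific QoS class instead of a raw priority
dispatch_queue_t utilityQueue = dispatch_get_global_queue(QOS_CLASS_UTILITY, 0);
dispatch_async(utilityQueue, ^{
    // long-running work that the user is not waiting on directly
});

// Or bake the QoS into a queue of your own at creation time
dispatch_queue_attr_t attr = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_BACKGROUND, 0);
dispatch_queue_t maintenanceQueue = dispatch_queue_create("com.example.maintenance", attr);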
Even so, this kind of regulation from the developer's end has its limits and ultimately does not address the goal of controlling thread creation:

Apps and operations compete to use finite resources—CPU, memory, network interfaces, and so on. In order to remain responsive and efficient, the system needs to prioritize tasks and make intelligent decisions about when to execute them.

Work that directly impacts the user, such as UI updates, is extremely important and takes precedence over other work that may be occurring in the background. This higher priority work often uses more energy, as it may require substantial and immediate access to system resources.

As a developer, you can help the system prioritize more effectively by categorizing your app’s work, based on importance. Even if you’ve implemented other efficiency measures, such as deferring work until an optimal time, the system still needs to perform some level of prioritization. Therefore, it is still important to categorize the work your app performs.

Source: Prioritize Work with Quality of Service Classes

To conclude, if you are deliberate in your intent to control threads, don't use GCD. Use low-level programming techniques and manage them yourself. If you use GCD, then you agree to leave this kind of responsibility to GCD.

How many threads should Grand Central Dispatch be creating?

One situation where GCD will increase the thread pool by adding more threads is I/O contention. If a dispatched block waits for filesystem or networking I/O, it doesn’t use the CPU, hence GCD thinks the CPU is idle and able to process more threads.

In fact, depending on the nature of the dispatched blocks, this can increase I/O contention further and reach the limit of 512 worker threads. Mike Ash has written a blog post about this situation.
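As a rough illustration (not something to ship), dispatching many blocking downloads onto a global concurrent queue is enough to watch the pool grow, since each block parks its worker thread while waiting on I/O (someSlowURL is hypothetical):

for (int i = 0; i < 200; i++) {
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
        // The blocking read parks this worker thread; GCD sees idle CPUs
        // and keeps adding threads to service the remaining blocks.
        NSData *data = [NSData dataWithContentsOfURL:someSlowURL];
        // ... process data ...
    });
}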

GCD in iOS always creating new threads

Calling dispatch_get_global_queue doesn't necessarily create new threads. It pulls threads from a limited pool of "worker" threads that GCD manages for you. When it's done running your dispatched task, it returns that thread to the pool of worker threads.

When you dispatch something to a GCD queue, it will grab an available worker thread from this pool. You have no assurances as to which one it uses from one invocation to the next. But you simply don't need to worry about whether it's a different thread, as GCD is managing this pool of threads to ensure that threads are not created and destroyed unnecessarily. It's one of the main reasons we use GCD instead of doing our own NSThread programming. It's a lot more efficient.

The only thing you need to worry about is the degree of concurrency that you employ in your app so that you don't exhaust this pool of worker threads (having unintended impact on other background tasks that might be drawing on the same pool of worker threads).

The most draconian way of limiting the degree of concurrency is to employ a shared serial queue that you create yourself. That means that only one thing will run on that serial queue at a time. (Note, even in this situation you don't have assurances that it will use the same thread every time; only that you'll only be using one background worker thread at a time.)
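A minimal sketch of that shared serial queue (the label is arbitrary):

// One shared serial queue: at most one of these tasks runs at a time,
// so at most one worker thread is tied up on their behalf.
dispatch_queue_t sharedQueue = dispatch_queue_create("com.example.shared", DISPATCH_QUEUE_SERIAL);

dispatch_async(sharedQueue, ^{
    // task 1
});
dispatch_async(sharedQueue, ^{
    // task 2 (starts only after task 1 finishes)
});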

A slightly more refined way to constrain the degree of concurrency in your app is to use NSOperationQueue (a layer above GCD) and set its maxConcurrentOperationCount. With this, you can constrain the degree of concurrency to something greater than 1, but still small enough to not exhaust the worker threads. E.g. for network queues, it's not unusual to specify a maxConcurrentOperationCount of 4 or 5.


In your revised question, you show us a code snippet. So, a couple of thoughts:

  1. Don't worry about what [NSThread currentThread] reports. GCD will manage the threads for you.

  2. Is this stitching process slow and potentially using a fair degree of memory?

    If so, I would suggest neither a serial queue (allowing only one at a time might be too constraining), nor a global queue (because you could have enough of these running concurrently to use up all the available worker threads), nor a GCD concurrent queue (again, the degree of concurrency is unbounded), but instead an NSOperationQueue with some reasonable, limited degree of concurrency:

    @property (nonatomic, strong) NSOperationQueue *stitchQueue;

    And

    self.stitchQueue = [[NSOperationQueue alloc] init];
    self.stitchQueue.name = @"com.domain.app.stitch";
    self.stitchQueue.maxConcurrentOperationCount = 4;

    And

    - (void)sdImageWith:(NSString *)urlString saveIn:(NSString *)savePath completion:(completionSuccess)successCompletion failure:(completionFalse)failureCompletion {
        [[SDWebImageDownloader sharedDownloader] downloadImageWithURL:[NSURL URLWithString:urlString] options:SDWebImageDownloaderUseNSURLCache progress:nil completed:^(UIImage * _Nullable image, NSData * _Nullable data, NSError * _Nullable error, BOOL finished) {
            if (data.length <= 100 || error != nil) { failureCompletion(error); return; }

            [self.stitchQueue addOperationWithBlock:^{
                // NSLog(@"thread:%@", [NSThread currentThread]); // stop worrying about `NSThread`
                [[DLStitchingWarper shareSingleton] StitchingImage:data savePath:savePath];
                if ([[NSFileManager defaultManager] fileExistsAtPath:savePath]) {
                    successCompletion(savePath);
                } else {
                    NSError *stitchError = [[NSError alloc] initWithDomain:@"x" code:404 userInfo:nil];
                    failureCompletion(stitchError);
                }
            }];
        }];
    }

If you prefer to use a custom GCD serial queue (with only one stitching operation possible at a time) or a custom GCD concurrent queue (with no limit on how many stitching tasks run at any given time), feel free. You know how time-consuming and/or resource-intensive these operations are, so only you can make that call. But operation queues offer the benefits of concurrency together with simple control over the degree of concurrency.
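For reference, those two alternatives would look something like this (labels hypothetical):

// Serial: only one stitching operation at a time
dispatch_queue_t serialStitchQueue = dispatch_queue_create("com.domain.app.stitch.serial", DISPATCH_QUEUE_SERIAL);

// Concurrent: no limit on how many stitching operations run at once
dispatch_queue_t concurrentStitchQueue = dispatch_queue_create("com.domain.app.stitch.concurrent", DISPATCH_QUEUE_CONCURRENT);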

Creating exactly N threads with Grand Central Dispatch (GCD)

You should look into NSThread. It's the way to go if you need fine-grained control over exactly how many threads you have running.
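For example, a minimal sketch of spinning up exactly four threads with NSThread (the workerMain: method is hypothetical):

// Create and start exactly four threads, each running the same worker method.
NSMutableArray<NSThread *> *threads = [NSMutableArray array];
for (int i = 0; i < 4; i++) {
    NSThread *thread = [[NSThread alloc] initWithTarget:self selector:@selector(workerMain:) object:@(i)];
    thread.name = [NSString stringWithFormat:@"worker-%d", i];
    [thread start];
    [threads addObject:thread];
}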

Workaround on the threads limit in Grand Central Dispatch?

Well, if you're bound and determined, you can free yourself of the shackles of GCD, and go forth and slam right up against the OS per-process thread limit using pthreads, but the bottom line is this: if you're hitting the queue-width limit in GCD, you might want to consider reevaluating your concurrency approach.

At the extremes, there are two ways you can hit the limit:

  1. You can have 64 threads blocked on some OS primitive via a blocking syscall. (I/O bound)
  2. You can legitimately have 64 runnable tasks all ready to rock at the same time. (CPU bound)

If you're in situation #1, then the recommended approach is to use non-blocking I/O. In fact, GCD has a whole bunch of calls, introduced in 10.7/Lion IIRC, that facilitate asynchronous scheduling of I/O and improve thread re-use. If you use the GCD I/O mechanism, then those threads won't be tied up waiting on I/O, GCD will just queue up your blocks (or functions) when data becomes available on your file descriptor (or mach port). See the documentation for dispatch_io_create and friends.

In case it helps, here's a little example (presented without warranty) of a TCP echo server implemented using the GCD I/O mechanism:

in_port_t port = 10000;
void DieWithError(char *errorMessage);

// Returns a block you can call later to shut down the server -- caller owns block.
dispatch_block_t CreateCleanupBlockForLaunchedServer()
{
    // Create the socket
    int servSock = -1;
    if ((servSock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) {
        DieWithError("socket() failed");
    }

    // Bind the socket - if the port we want is in use, increment until we find one that isn't
    struct sockaddr_in echoServAddr;
    memset(&echoServAddr, 0, sizeof(echoServAddr));
    echoServAddr.sin_family = AF_INET;
    echoServAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    do {
        printf("server attempting to bind to port %d\n", (int)port);
        echoServAddr.sin_port = htons(port);
    } while (bind(servSock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr)) < 0 && ++port);

    // Make the socket non-blocking
    if (fcntl(servSock, F_SETFL, O_NONBLOCK) < 0) {
        shutdown(servSock, SHUT_RDWR);
        close(servSock);
        DieWithError("fcntl() failed");
    }

    // Set up the dispatch source that will alert us to new incoming connections
    dispatch_queue_t q = dispatch_queue_create("server_queue", DISPATCH_QUEUE_CONCURRENT);
    dispatch_source_t acceptSource = dispatch_source_create(DISPATCH_SOURCE_TYPE_READ, servSock, 0, q);
    dispatch_source_set_event_handler(acceptSource, ^{
        const unsigned long numPendingConnections = dispatch_source_get_data(acceptSource);
        for (unsigned long i = 0; i < numPendingConnections; i++) {
            int clntSock = -1;
            struct sockaddr_in echoClntAddr;
            unsigned int clntLen = sizeof(echoClntAddr);

            // Wait for a client to connect
            if ((clntSock = accept(servSock, (struct sockaddr *) &echoClntAddr, &clntLen)) >= 0)
            {
                printf("server sock: %d accepted\n", clntSock);

                dispatch_io_t channel = dispatch_io_create(DISPATCH_IO_STREAM, clntSock, q, ^(int error) {
                    if (error) {
                        fprintf(stderr, "Error: %s", strerror(error));
                    }
                    printf("server sock: %d closing\n", clntSock);
                    close(clntSock);
                });

                // Configure the channel...
                dispatch_io_set_low_water(channel, 1);
                dispatch_io_set_high_water(channel, SIZE_MAX);

                // Setup read handler
                dispatch_io_read(channel, 0, SIZE_MAX, q, ^(bool done, dispatch_data_t data, int error) {
                    BOOL close = NO;
                    if (error) {
                        fprintf(stderr, "Error: %s", strerror(error));
                        close = YES;
                    }

                    const size_t rxd = data ? dispatch_data_get_size(data) : 0;
                    if (rxd) {
                        // echo...
                        printf("server sock: %d received: %ld bytes\n", clntSock, (long)rxd);
                        // write it back out; echo!
                        dispatch_io_write(channel, 0, data, q, ^(bool done, dispatch_data_t data, int error) {});
                    }
                    else {
                        close = YES;
                    }

                    if (close) {
                        dispatch_io_close(channel, DISPATCH_IO_STOP);
                        dispatch_release(channel);
                    }
                });
            }
            else {
                printf("accept() failed;\n");
            }
        }
    });

    // Resume the source so we're ready to accept once we listen()
    dispatch_resume(acceptSource);

    // Listen() on the socket
    if (listen(servSock, SOMAXCONN) < 0) {
        shutdown(servSock, SHUT_RDWR);
        close(servSock);
        DieWithError("listen() failed");
    }

    // Make cleanup block for the server queue
    dispatch_block_t cleanupBlock = ^{
        dispatch_async(q, ^{
            shutdown(servSock, SHUT_RDWR);
            close(servSock);
            dispatch_release(acceptSource);
            dispatch_release(q);
        });
    };

    return Block_copy(cleanupBlock);
}

Anyway... back to the topic at hand:

If you're in situation #2, you should ask yourself, "Am I really gaining anything through this approach?" Let's say you have the most studly MacPro out there -- 12 cores, 24 hyperthreaded/virtual cores. With 64 threads, you've got an approx. 3:1 thread to virtual core ratio. Context switches and cache misses aren't free. Remember, we presumed that you weren't I/O bound for this scenario, so all you're doing by having more tasks than cores is wasting CPU time with context switches and cache thrash.

In reality, if your application is hanging because you've hit the queue-width limit, then the most likely scenario is that you've starved your queue. You've likely created a dependency that reduces to a deadlock. The case I've seen most often is when multiple, interlocked threads try to dispatch_sync on the same queue when there are no threads left. That always fails.

Here's why: Queue width is an implementation detail. The 64 thread width limit of GCD is undocumented because a well-designed concurrency architecture shouldn't depend on the queue width. You should always design your concurrency architecture such that a 2 thread wide queue would eventually finish the job to the same result (if slower) as a 1000 thread wide queue. If you don't, there will always be a chance that your queue will get starved. Dividing your workload into parallelizable units should be opening yourself to the possibility of optimization, not a requirement for basic functioning. One way to enforce this discipline during development is to try working with a serial queue in places where you use concurrent queues, but expect non-interlocked behavior. Performing checks like this will help you catch some (but not all) of these bugs earlier.
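One way to wire up that check during development (a sketch; the DEBUG_FORCE_SERIAL flag is hypothetical) is to make the queue attribute conditional:

// In debug builds, force a nominally concurrent queue to be serial; correct
// code should still produce the same results, just more slowly.
#if DEBUG_FORCE_SERIAL
dispatch_queue_attr_t workAttr = DISPATCH_QUEUE_SERIAL;
#else
dispatch_queue_attr_t workAttr = DISPATCH_QUEUE_CONCURRENT;
#endif
dispatch_queue_t workQueue = dispatch_queue_create("com.example.work", workAttr);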

Also, to the precise point of your original question: IIUC, the 64 thread limit is 64 threads per top-level concurrent queue, so if you really feel the need, you can use all three top level concurrent queues (Default, High and Low priority) to achieve more than 64 threads total. Please don't do this though. Fix your design such that it doesn't starve itself instead. You'll be happier. And anyway, as I hinted above, if you're starving out a 64 thread wide queue, you'll probably eventually just fill all three top level queues and/or run into the per-process thread limit and starve yourself that way too.

Limiting number of threads

GCD has no option to limit the amount of concurrent blocks running.

This will potentially create one thread that just sits waiting for each operation you enqueue. GCD dynamically adjusts the number of threads it uses: if you enqueue another block and GCD has no more threads available, it will spin up another thread when it notices there are free CPU cores. Since the worker thread is sleeping inside your block, the CPU is considered free. This can lead to many threads using up a lot of memory; each thread gets 512 KB of stack.

Your best option would be to use NSOperationQueue for this as you can control directly how many operations will be run in parallel using the maxConcurrentOperationCount property. This will be easier (less code for you to write, test and debug) and much more efficient.
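A minimal sketch of that (the limit of 4 is just an example):

// An operation queue that never runs more than four operations at once,
// regardless of how many you enqueue.
NSOperationQueue *queue = [[NSOperationQueue alloc] init];
queue.maxConcurrentOperationCount = 4;

[queue addOperationWithBlock:^{
    // work item; the queue throttles concurrency, so GCD's worker-thread
    // pool is never flooded
}];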

GCD and Threads

It is not correct to state that five threads have been created in the general case.

There is no one-to-one mapping between threads and blocks. GCD is an implementation of thread pooling.

A certain number of threads are created according to the optimal setup for that device — the cost of creating and maintaining threads under that release of the OS, the number of processor cores available, the number of threads it already has but which are presently blocked and any other factors Apple cares to factor in may all be relevant.

GCD will then spread your blocks over those threads. Or it may create new threads. But it won't necessarily.

Beyond that, queues are just ways of establishing the sequencing between blocks. A serial dispatch queue does not necessarily own its own thread, and concurrent dispatch queues do not necessarily own their own threads either. Nor is there any reason to believe that any particular set of queues shares any threads.

The exact means of picking threads for blocks has changed between versions of the OS. E.g. iOS 4 was highly profligate in thread creation, in a way that iOS 5+ definitely hasn't been.

GCD will just try to do whatever is best in the circumstances. Don't waste your time trying to second guess it.

Implement a thread pool using GCD

GCD already does thread pooling (dispatch queues are drawing upon a pool of “worker threads”), so it’s redundant/inefficient to add another layer of pooling on top of that.

You say:

The thing is - no matter how much threads I'm creating, it doesn't affect the performance at all.

That could be any of a number of things. One common problem is that the unit of work is too small. As Performing Loops Concurrently says:

You should make sure that your task code does a reasonable amount of work through each iteration. As with any block or function you dispatch to a queue, there is overhead to scheduling that code for execution. If each iteration of your loop performs only a small amount of work, the overhead of scheduling the code may outweigh the performance benefits you might achieve from dispatching it to a queue.

But there are a variety of other problems ranging from inefficient synchronization code, cache sloshing, etc. It is impossible to say without a reproducible example of the problem. While QoS also has an impact, it is often negligible in comparison to these algorithmic issues.

You say:

Since queues are concurrent I want to limit tasks count which can be executed concurrently so thread would not be overwhelmed.

While you can achieve this with either non-zero dispatch semaphores or NSOperationQueue with some maxConcurrentOperationCount, dispatch_apply (known as concurrentPerform for Swift users) is the “go to” solution for computationally intensive, parallelized routines that balance workloads across CPU cores. It automatically looks at how many cores you’ve got and distributes the loop across them, without risking an explosion in threads. And, as outlined in Improving on Loop Code, you can experiment with strides that balance the amount of work done on each thread against the inherent overhead of thread coordination. (Striding can also minimize cache contention.)

I might suggest researching dispatch_apply and giving it a try. If you’re still unclear at that point, just post a new question that shows both the non-parallel routine and the parallelized rendition, and we can help further.
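For illustration, here is a minimal sketch of dispatch_apply with striding (the stride of 256 and the processPixel function are hypothetical):

// Split `count` iterations across the available cores, one stride at a time,
// instead of dispatching each iteration individually.
size_t count = 1024 * 1024;
size_t stride = 256;
size_t chunks = (count + stride - 1) / stride;

dispatch_apply(chunks, dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^(size_t chunk) {
    size_t start = chunk * stride;
    size_t end = MIN(start + stride, count);
    for (size_t i = start; i < end; i++) {
        processPixel(i);   // hypothetical per-element work
    }
});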


As I’ve said above, I don’t think you want this routine at all. For computationally intensive routines, I would favor dispatch_apply. For simple queues for which I would want to control the degree of concurrency (especially if some of those tasks are, themselves, asynchronous), I’d use NSOperationQueue with a maxConcurrentOperationCount. But I thought I’d share a few observations on your code snippet:

  • What you’ve implemented is a pool of queues, not a pool of threads;

  • What you’re calling threadsCount is not a count of threads, but rather a count of queues. So, if you create a pool with a count of 10 and tasksCount of 20, that means that you’re potentially using 200 threads.

  • Likewise what you’re calling _currentThreadId is not the current thread. It’s the current queue.

  • The interaction with _currentThreadId is not thread-safe.

Bottom line, GCD has its own pool of threads, so you shouldn’t reproduce that logic. All you need to do is implement the “not more than threadCount” logic (which can be achieved with a non-zero dispatch semaphore). Thus, I’d suggest simplifying this to something like:

@interface ThreadPool()
@property (nonatomic, strong) dispatch_queue_t pool;
@property (nonatomic, strong) dispatch_queue_t scheduler;
@property (nonatomic, strong) dispatch_semaphore_t semaphore;
@end

@implementation ThreadPool

- (instancetype)initWithThreadCount:(int)threadCount {
    self = [super init];
    if (self) {
        NSString *identifier = [[NSUUID UUID] UUIDString];
        NSString *bundleIdentifier = [[NSBundle mainBundle] bundleIdentifier];

        NSString *schedulingLabel = [NSString stringWithFormat:@"%@.scheduler.%@", bundleIdentifier, identifier];
        _scheduler = dispatch_queue_create(schedulingLabel.UTF8String, DISPATCH_QUEUE_SERIAL);

        NSString *poolLabel = [NSString stringWithFormat:@"%@.pool.%@", bundleIdentifier, identifier];

        dispatch_queue_attr_t attr = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_CONCURRENT, QOS_CLASS_BACKGROUND, 0);
        _pool = dispatch_queue_create(poolLabel.UTF8String, attr);

        _semaphore = dispatch_semaphore_create(threadCount);
    }

    return self;
}

- (void)async:(ThreadPoolBlock)block {
    dispatch_async(self.scheduler, ^{
        dispatch_semaphore_wait(self.semaphore, DISPATCH_TIME_FOREVER);
        dispatch_async(self.pool, ^{
            block();
            dispatch_semaphore_signal(self.semaphore);
        });
    });
}

@end

Needless to say, this implementation, like yours, assumes that the block passed to the async: method is itself synchronous (e.g. it’s not starting yet another asynchronous process like a network request or whatever). I suspect you know that, but I only mention it for the sake of completeness.
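Usage would then look something like this (the logging is just for illustration):

ThreadPool *pool = [[ThreadPool alloc] initWithThreadCount:4];

for (int i = 0; i < 20; i++) {
    [pool async:^{
        // no more than four of these blocks run at any one time
        NSLog(@"running task %d", i);
    }];
}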


