The Right Way to Limit Maximum Number of Threads Running At Once

The right way to limit maximum number of threads running at once?

It sounds like you want to implement the producer/consumer pattern with eight workers. Python has a Queue class for this purpose, and it is thread-safe.

Each worker should call get() on the queue to retrieve a task. This call will block if no tasks are available, causing the worker to go idle until one becomes available. Then the worker should execute the task and finally call task_done() on the queue.

You would put tasks in the queue by calling put() on the queue.

From the main thread, you can call join() on the queue to wait until all pending tasks have been completed.

This approach has the benefit that you are not creating and destroying threads, which is expensive. The worker threads will run continuously, but will be asleep when no tasks are in the queue, using zero CPU time.

(The linked documentation page has an example of this very pattern.)
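A minimal sketch of the pattern described above, assuming Python 3 names (`queue.Queue`); the worker function and task values are purely illustrative:

```python
import queue
import threading

NUM_WORKERS = 8
task_queue = queue.Queue()
results = []  # list.append is thread-safe under the GIL

def worker():
    while True:
        task = task_queue.get()   # blocks while the queue is empty
        results.append(task * task)
        task_queue.task_done()    # signal that this task is finished

# daemon workers: they die automatically when the main thread exits
for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

for i in range(20):
    task_queue.put(i)  # producer side: enqueue work

task_queue.join()  # block until every queued task is marked done
```

Marking the workers as daemon threads means the process can exit after `join()` returns, even though the workers are still blocked in `get()`.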

How to limit the number of Threads

The working solution is posted below.
The basic idea is that we create only as many Thread instances as there are available CPUs. Then we add the "tasks" (or "things" here) to the Queue.
As soon as a task is added to the queue, it is immediately picked up by one of the Thread instances we created in the previous step.

Important: for this mechanism to work, the MyThread.run() method must loop with while True. Otherwise the MyThread instance would terminate as soon as it completed its very first task. Note that the loop never exits on its own: get() simply blocks when the queue is empty. So call queue.join() from the main thread to wait until all tasks have been processed, and mark the workers as daemon threads so the process can exit afterwards.

import queue
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, theQueue=None):
        threading.Thread.__init__(self)
        self.theQueue = theQueue

    def run(self):
        while True:
            thing = self.theQueue.get()  # blocks until a task is available
            self.process(thing)
            self.theQueue.task_done()

    def process(self, thing):
        time.sleep(1)
        print('processing %s' % thing)

task_queue = queue.Queue()
THINGS = ['Thing%02d' % i for i in range(101)]
AVAILABLE_CPUS = 3

for _ in range(AVAILABLE_CPUS):
    thread = MyThread(theQueue=task_queue)
    thread.daemon = True  # let the process exit once the main thread is done
    thread.start()        # started, but just waiting until tasks appear in the queue

for thing in THINGS:
    task_queue.put(thing)  # as soon as a task is added, one of the waiting threads picks it up

task_queue.join()  # wait until every task has been processed

Is there a maximum limit on running concurrent threads (Python)?

The operating system always imposes some limits on the number of threads, and each thread uses some resources (notably some space, perhaps a megabyte, for the thread's call stack). So it is not reasonable to have lots of threads. Details are operating-system and computer specific. On Linux, see getrlimit(2) for RLIMIT_STACK (the default stack size) and RLIMIT_NPROC (the number of processes, actually tasks, including threads, you are permitted to have), and also pthread_attr_setstacksize(3) & pthread_create(3).

Threads are often heavy on resources (so read about green threads). You don't want to have many (e.g. thousands, or even a hundred) of them on a laptop or desktop (some supercomputers or costly servers have hundreds of cores with NUMA, then you could try having more threads).

Read also about the C10K problem.

Common implementations of Python use a single Global Interpreter Lock, so having lots of threads is not effective. I would recommend using a thread pool of a reasonable size (perhaps configurable, and probably a few dozen at most).

Consider using PycURL and probably its MULTI interface (see the documentation of the relevant C API in libcurl). Think in terms of an event loop (and perhaps continuation-passing style).

python how to set a thread limit?

Use Python's concurrent.futures.ThreadPoolExecutor with the max_workers argument set to 10.

Something like this:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=10)
with open("data.txt") as f:
    for line in f:
        company = line.rstrip("\n\r")
        pool.submit(Checker, company)  # Checker is your worker function

pool.shutdown(wait=True)

The pool will automatically allocate threads as needed, limiting the maximum number of concurrent threads to 10. The first argument to pool.submit() is the function; the remaining arguments are simply passed through to it.

pool.shutdown(wait=True) waits for all threads to complete execution.
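submit() also returns a Future, so if the worker returns a value you can collect results. A hedged sketch (check here is a hypothetical stand-in for the Checker function above):

```python
from concurrent.futures import ThreadPoolExecutor

def check(company):  # hypothetical stand-in for the Checker worker
    return company.upper()

with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(check, name) for name in ["acme", "globex"]]
    results = [f.result() for f in futures]  # blocks until each future completes
```

Using the executor as a context manager also removes the need for an explicit shutdown() call: the with block waits for all submitted tasks before exiting.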

Python Limit number of threads allowed

concurrent.futures has a ThreadPoolExecutor class, which allows you to submit many tasks and specify the maximum number of worker threads:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=20) as executor:
    for letter in array_of_letters:
        executor.submit(do_something, letter)

Check more examples in the package docs.
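For the common case of applying one function to every item, executor.map is even shorter. A sketch, with an illustrative do_something and input list (not taken from the question):

```python
from concurrent.futures import ThreadPoolExecutor

def do_something(letter):  # illustrative worker
    return letter * 2

array_of_letters = ["a", "b", "c"]

with ThreadPoolExecutor(max_workers=20) as executor:
    # map preserves input order even though calls run concurrently
    doubled = list(executor.map(do_something, array_of_letters))
```

Unlike submit(), map() returns the results directly, in the same order as the inputs.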

Python - Limiting the Number of Threads while passing arguments

This would probably be a simpler task with concurrent.futures but I like getting my hands dirty, so here we go. A few suggestions:

  • I find classes as thread targets often complicate things, so if there's no compelling reason, keep it simple
  • It's easier to use a with block to acquire and release a semaphore, and a regular semaphore usually suffices in that case
  • 17 arguments can get messy; I would build a tuple of the arguments outside the call to threading.Thread() so it's easier to read, then unpack the tuple in the thread

This should work as a simple example; os.system() just echoes something and sleeps, so you can see the thread count is limited by the semaphore.

import os
import threading
from random import randint

threadLimiter = threading.Semaphore(10)

def run_config(*args):
    run, arg1, arg2 = args  # unpack the 17 args by name

    with threadLimiter:
        seconds = randint(2, 7)
        os.system(f"echo run {run}, args {arg1} {arg2} ; sleep {seconds}")

if __name__ == '__main__':
    threads = []
    run = "20"  # I guess this is a string because of below?

    for i in range(1, int(run) + 1):
        thr_args = (str(i), "arg1",
                    "arg2")  # put the 17 args here
        thr = threading.Thread(target=run_config, args=thr_args)
        thr.start()
        threads.append(thr)

    for thr in threads:
        thr.join()
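As the answer notes, concurrent.futures would make this simpler: the pool itself caps concurrency at max_workers, so no semaphore (and no manual join loop) is needed. A sketch under the same assumptions, with three illustrative arguments standing in for the 17:

```python
from concurrent.futures import ThreadPoolExecutor

def run_config(run, arg1, arg2):  # the 17 real arguments would go here
    return f"run {run}, args {arg1} {arg2}"

# max_workers=10 caps concurrency, replacing the explicit Semaphore
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(run_config, str(i), "arg1", "arg2")
               for i in range(1, 21)]
    output = [f.result() for f in futures]
```

The with block waits for all submitted tasks to finish, which replaces the explicit thr.join() loop.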

