The right way to limit maximum number of threads running at once?
It sounds like you want to implement the producer/consumer pattern with eight workers. Python has a Queue class (queue.Queue in Python 3) for this purpose, and it is thread-safe.
Each worker should call get() on the queue to retrieve a task. This call will block if no tasks are available, causing the worker to go idle until one becomes available. Then the worker should execute the task and finally call task_done() on the queue.
You would put tasks in the queue by calling put() on the queue.
From the main thread, you can call join() on the queue to wait until all pending tasks have been completed.
This approach has the benefit that you are not creating and destroying threads, which is expensive. The worker threads will run continuously, but will be asleep when no tasks are in the queue, using zero CPU time.
(The linked documentation page has an example of this very pattern.)
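The steps above can be sketched roughly as follows. This is a minimal illustration, not the example from the documentation: run_tasks, handler and NUM_WORKERS are hypothetical names, and the handler is assumed not to raise.

```python
import queue
import threading

NUM_WORKERS = 8  # the eight workers mentioned above


def run_tasks(tasks, handler, num_workers=NUM_WORKERS):
    """Run handler(task) for every task on a fixed set of worker threads."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()      # blocks while the queue is empty
            try:
                value = handler(task)
                with results_lock:
                    results.append(value)
            finally:
                task_queue.task_done()   # mark the task finished even on error

    for _ in range(num_workers):
        # daemon=True: workers blocked in get() do not keep the program alive
        threading.Thread(target=worker, daemon=True).start()

    for task in tasks:
        task_queue.put(task)

    task_queue.join()                    # wait until every task is marked done
    return results
```

Results arrive in completion order, not submission order, so sort them or return the task alongside each value if the order matters.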
How to limit the number of Threads
The working solution is posted below.
The basic idea is to create only as many Thread instances as there are available CPUs, and then add the "tasks" (or "things" here) to the Queue.
As soon as a task is added to the queue, it is picked up by one of the Thread instances created in the previous step.
Important: for this mechanism to work, the body of MyThread.run() must be inside a while loop. Otherwise the MyThread instance terminates as soon as it completes its very first task. Note that the while loop does not exit on its own: once the queue is empty, get() simply blocks until another task arrives, so the worker threads stay alive unless they are daemon threads or are explicitly told to stop.
import queue
import threading
import time


class MyThread(threading.Thread):
    def __init__(self, theQueue=None):
        # daemon=True: workers blocked in get() will not keep the program alive
        threading.Thread.__init__(self, daemon=True)
        self.theQueue = theQueue

    def run(self):
        while True:
            thing = self.theQueue.get()  # blocks until a task is available
            self.process(thing)
            self.theQueue.task_done()

    def process(self, thing):
        time.sleep(1)
        print('processing %s' % thing)


task_queue = queue.Queue()
THINGS = ['Thing%02d' % i for i in range(101)]
AVAILABLE_CPUS = 3

for OneOf in range(AVAILABLE_CPUS):
    thread = MyThread(theQueue=task_queue)
    thread.start()  # thread started, but the queue is still empty, so it just waits

for thing in THINGS:
    task_queue.put(thing)  # as soon as a task is added, one of the idle threads picks it up

task_queue.join()  # wait until every task has been processed
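Because get() blocks when the queue is empty, a `while True` worker loop never ends by itself. One common way to let the workers finish cleanly is to send each of them a sentinel value. The sketch below is only an illustration of that idea; _STOP, worker and process_all are hypothetical names, and the "processing" is reduced to appending to a list.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel meaning "no more work"


def worker(task_queue, processed):
    while True:
        thing = task_queue.get()
        if thing is _STOP:
            task_queue.task_done()
            break                      # leave the loop so the thread can finish
        processed.append(thing)        # list.append is thread-safe in CPython
        task_queue.task_done()


def process_all(things, num_threads=3):
    task_queue = queue.Queue()
    processed = []
    threads = [threading.Thread(target=worker, args=(task_queue, processed))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for thing in things:
        task_queue.put(thing)
    for _ in threads:
        task_queue.put(_STOP)          # one sentinel per worker
    for t in threads:
        t.join()                       # returns once each worker has seen its sentinel
    return processed
```

Since the queue is FIFO, the sentinels are only reached after all real tasks have been taken, so no work is lost.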
Is there a maximum limit on running concurrent threads (Python)?
The operating system always imposes some limit on the number of threads, and each thread uses some resources (notably some space, perhaps a megabyte, for the thread's call stack). So it is not reasonable to have lots of threads. The details are specific to the operating system and the computer. On Linux, see getrlimit(2) for RLIMIT_STACK (the default stack size) and RLIMIT_NPROC (the number of processes, actually tasks, including threads, that you are permitted to have), and also pthread_attr_setstacksize(3) and pthread_create(3).
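On a Unix-like system these limits can also be read from Python. The following is a minimal sketch, assuming the standard-library resource module is available (it is not on Windows); describe_thread_limits is a hypothetical helper name.

```python
import resource   # Unix-only standard-library module
import threading


def describe_thread_limits():
    """Return the soft/hard limits relevant to thread creation."""
    stack_soft, stack_hard = resource.getrlimit(resource.RLIMIT_STACK)
    nproc_soft, nproc_hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "stack_soft": stack_soft,    # bytes, or resource.RLIM_INFINITY
        "stack_hard": stack_hard,
        "nproc_soft": nproc_soft,    # processes/tasks allowed for this user
        "nproc_hard": nproc_hard,
        # 0 means "use the platform default" for newly created Python threads
        "python_thread_stack_size": threading.stack_size(),
    }
```

A value equal to resource.RLIM_INFINITY means the limit is not set.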
Threads are often heavy on resources (so read about green threads). You don't want to have many (e.g. thousands, or even a hundred) of them on a laptop or desktop. Some supercomputers or costly servers have hundreds of cores with NUMA; on those you could try having more threads.
Read also about the C10K problem.
Common implementations of Python use a single Global Interpreter Lock, so only one thread executes Python bytecode at a time and having lots of threads is not effective for CPU-bound work. I would recommend using a thread pool of a reasonable size (perhaps configurable, and probably a few dozen at most).
Consider using PycURL and probably its MULTI interface (see the documentation of the relevant C API in libcurl). Think in terms of an event loop (and perhaps continuation-passing style).
python how to set a thread limit?
Use Python's ThreadPoolExecutor with the max_workers argument set to 10.
Something like this:
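from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=10)
with open("data.txt") as f:
    for line in f:
        lines = line.rstrip("\n\r")
        # Checker is the asker's own function; `lines` is not passed here,
        # so add it as a further argument if Checker needs the current line
        pool.submit(Checker, "company")
pool.shutdown(wait=True)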
The pool will automatically allocate threads as needed, up to a maximum of 10. The first argument to pool.submit() is the function to call; its arguments follow as comma-separated values.
pool.shutdown(wait=True) waits for all submitted tasks to finish.
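To see that the pool really caps concurrency, one can count how many tasks are running at the same moment. This is only a sketch under stated assumptions: checker is a hypothetical stand-in for the asker's Checker function, and check_lines is a made-up wrapper.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def check_lines(lines, max_workers=10):
    """Submit one task per line and report the peak number running at once."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def checker(label, line):          # hypothetical stand-in for Checker
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)               # simulate some I/O-bound work
        with lock:
            active -= 1
        return f"{label}:{line}"

    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(checker, "company", line) for line in lines]
    pool.shutdown(wait=True)           # blocks until every submitted task is done
    return [f.result() for f in futures], peak
```

The returned peak never exceeds max_workers, however many lines are submitted; the extra tasks simply wait in the executor's internal queue.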
Python Limit number of threads allowed
concurrent.futures has a ThreadPoolExecutor class, which lets you submit many tasks and specify the maximum number of worker threads:
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=20) as executor:
    for letter in array_of_letters:
        executor.submit(do_something, letter)
Check more examples in the package docs.
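If the return values matter, the futures returned by submit() can be collected as they finish. A minimal sketch, assuming a trivial do_something that upper-cases its input; process_letters is a hypothetical wrapper name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def do_something(letter):
    """Hypothetical task standing in for the real work."""
    return letter.upper()


def process_letters(array_of_letters, max_workers=20):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its input so results can be matched up
        future_to_letter = {executor.submit(do_something, letter): letter
                            for letter in array_of_letters}
        for future in as_completed(future_to_letter):
            letter = future_to_letter[future]
            results[letter] = future.result()  # re-raises any exception from the task
    return results
```

Leaving the with block waits for all pending tasks, so no explicit shutdown() call is needed here.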
Python - Limiting the Number of Threads while passing arguments
This would probably be a simpler task with concurrent.futures, but I like getting my hands dirty, so here we go. A few suggestions:
- I find classes as thread targets often complicate things, so if there's no compelling reason, keep it simple
- It's easier to use a with block to acquire and release a semaphore, and a regular semaphore usually suffices in that case
- 17 arguments can get messy; I would build a tuple of the arguments outside the call to threading.Thread() so it's easier to read, then unpack the tuple in the thread
This should work as a simple example; os.system() just echoes something and sleeps, so you can see the thread count is limited by the semaphore.
import os
import threading
from random import randint

threadLimiter = threading.Semaphore(10)

def run_config(*args):
    run, arg1, arg2 = args  # unpack the 17 args by name
    with threadLimiter:
        seconds = randint(2, 7)
        os.system(f"echo run {run}, args {arg1} {arg2} ; sleep {seconds}")

if __name__ == '__main__':
    threads = []
    run = "20"  # I guess this is a string because of below?
    for i in range(1, int(run) + 1):
        thr_args = (str(i), "arg1",
                    "arg2")  # put the 17 args here
        thr = threading.Thread(target=run_config, args=thr_args)
        thr.start()
        threads.append(thr)
    for thr in threads:
        thr.join()
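For comparison, the concurrent.futures route mentioned at the start of this answer might look roughly like this. It is a sketch, not a drop-in replacement: run_config here sleeps briefly instead of calling os.system(), and run_all is a hypothetical wrapper name.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from random import uniform


def run_config(run, arg1, arg2):
    """Stand-in for the real work; sleeps briefly instead of calling os.system()."""
    time.sleep(uniform(0.01, 0.05))
    return f"run {run}, args {arg1} {arg2}"


def run_all(run_count=20, max_workers=10):
    # Build the argument tuples up front, as suggested above
    arg_tuples = [(str(i), "arg1", "arg2") for i in range(1, run_count + 1)]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # submit() forwards positional arguments, so each tuple is unpacked with *
        futures = [executor.submit(run_config, *args) for args in arg_tuples]
        return [f.result() for f in futures]   # results in submission order
```

Here the executor only ever creates max_workers threads, whereas the semaphore version starts one thread per run and lets at most ten of them proceed at a time.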