Multiprocessing VS Multithreading VS Asyncio in Python 3

multiprocessing vs multithreading vs asyncio

They are intended for (slightly) different purposes and/or requirements. CPython (a typical, mainline Python implementation) still has the global interpreter lock, so a multi-threaded application (a standard way to implement parallel processing nowadays) is suboptimal. That's why multiprocessing may be preferred over threading. But not every problem can be effectively split into [almost independent] pieces, so there may be a need for heavy interprocess communication. That's why multiprocessing may not be preferred over threading in general.

asyncio (this technique is available not only in Python; other languages and/or frameworks also have it, e.g. Boost.ASIO) is a method to effectively handle a lot of I/O operations from many simultaneous sources without the need for parallel code execution. So it's just a solution (a good one indeed!) for a particular task, not for parallel processing in general.
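
As a rough illustration of that idea, here is a minimal sketch (fetch() and the one-second sleep are made-up stand-ins for real I/O such as socket reads) showing a single thread waiting on many sources at once:

    import asyncio

    async def fetch(source_id):
        # asyncio.sleep() stands in for a real I/O wait (socket read, HTTP call, ...)
        await asyncio.sleep(1)
        return f"data from source {source_id}"

    async def main():
        # 100 simulated sources are awaited concurrently in a single thread;
        # total runtime is about 1 second because the waits overlap.
        results = await asyncio.gather(*(fetch(i) for i in range(100)))
        print(len(results), "results")

    asyncio.run(main())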

Difference between multiprocessing, asyncio, threading and concurrent.futures in Python

There are several different libraries at play:

  • threading: interface to OS-level threads. Note that CPU-bound work is mostly serialized by the GIL, so don't expect threading to speed up calculations. Use it when you need to invoke blocking APIs in parallel, and when you require precise control over thread creation. Avoid creating too many threads (e.g. thousands), as they are not free. If possible, don't create threads yourself, use concurrent.futures instead.

  • multiprocessing: interface to spawning multiple Python processes with an API intentionally similar to threading. Multiple processes work in parallel, so you can actually speed up calculations using this method. The disadvantage is that you can't share in-memory data structures without using multiprocessing-specific tools.

  • concurrent.futures: A modern interface to threading and multiprocessing, which provides convenient thread/process pools it calls executors. The pool's main entry point is the submit method, which returns a handle that you can test for completion or wait on for its result. Getting the result gives you the return value of the submitted function and correctly propagates raised exceptions (if any), which would be tedious to do with threading. concurrent.futures should be the tool of choice when considering thread- or process-based parallelism; a minimal sketch follows this list.

  • asyncio: While the previous options are "async" in the sense that they provide non-blocking APIs (this is what methods like apply_async refer to), they still rely on thread/process pools to do their magic, and cannot really do more things in parallel than they have workers in the pool. Asyncio is different: it uses a single thread of execution and async system calls across the board. It has no blocking calls at all, the only blocking part being the asyncio.run() entry point. Asyncio code is typically written using coroutines, which use await to suspend until something interesting happens. (Suspending is different from blocking in that it allows the event loop thread to continue to other things while you're waiting.) It has many advantages compared to thread-based solutions, such as being able to spawn thousands of cheap "tasks" without bogging down the system, and being able to cancel tasks or easily wait for multiple things at once. Asyncio should be the tool of choice for servers and for clients connecting to multiple servers.
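
To make the concurrent.futures item concrete, here is a minimal sketch (download() and the example URLs are made up for illustration) showing submit(), result() and exception propagation:

    from concurrent.futures import ThreadPoolExecutor

    def download(url):
        # made-up stand-in for a blocking call; raises for one of the URLs
        if "bad" in url:
            raise ValueError(f"cannot fetch {url}")
        return f"contents of {url}"

    with ThreadPoolExecutor(max_workers=4) as pool:
        good = pool.submit(download, "https://example.com/a")    # returns a Future
        bad = pool.submit(download, "https://example.com/bad")

        print(good.result())       # blocks until done, gives the return value
        try:
            bad.result()           # re-raises the worker's exception here
        except ValueError as exc:
            print("worker failed:", exc)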

When choosing between asyncio and multithreading/multiprocessing, consider the adage that "threading is for working in parallel, and async is for waiting in parallel".

Also note that asyncio can await functions executed in thread or process pools provided by concurrent.futures, so it can serve as glue between all those different models. This is part of the reason why asyncio is often used to build new library infrastructure.
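
A minimal sketch of that glue role (cpu_bound() is a made-up placeholder for a heavy calculation): the event loop awaits work that actually runs in a process pool, and stays free to do other things in the meantime.

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    def cpu_bound(n):
        # made-up stand-in for a calculation that would otherwise hog the GIL
        return sum(i * i for i in range(n))

    async def main():
        loop = asyncio.get_running_loop()
        with ProcessPoolExecutor() as pool:
            # the coroutine suspends here; the worker process does the heavy lifting
            result = await loop.run_in_executor(pool, cpu_bound, 10_000_000)
        print(result)

    if __name__ == "__main__":
        asyncio.run(main())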

Multiprocessing vs Threading Python

The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. The global interpreter lock helps with this at the interpreter level, although you still need your own locks to protect shared data.

Spawning processes is a bit slower than spawning threads.
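
A minimal sketch of that memory difference (worker() and the module-level list are made up for illustration): the thread's change is visible to the parent, the process's is not, because the child process works on its own copy. Sharing across processes needs multiprocessing tools such as Queue or Manager.

    import threading
    import multiprocessing

    results = []

    def worker():
        results.append("hello")

    if __name__ == "__main__":
        t = threading.Thread(target=worker)
        t.start(); t.join()
        print("after thread:", results)    # ['hello'] -- same memory space

        p = multiprocessing.Process(target=worker)
        p.start(); p.join()
        print("after process:", results)   # still ['hello'] -- the child had its own copy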

Python threading.Semaphore vs asyncio.Semaphore

The whole goal of a semaphore is to provide exclusive access to something. Only one "piece of code" can own the semaphore at any one time.

What I mean by "piece of code" in the previous statement depends on whether I'm using multi-threading, multi-processing, or asyncio. And the means by which exclusive access is guaranteed also depends on which of those I'm using.

Asyncio is the most restricted kind of multi-threading. Everything is running within a single Python thread. The Python interpreter is only executing one thing at a time. Each "piece of code" runs until it voluntarily waits for something to happen. Then another "piece of code" is allowed to run. Eventually the original piece of code runs again when the thing it was waiting on happens.

With multithreading, multiple pieces of code are running within the Python interpreter. Only one piece of code runs at any time, but they are not politely waiting for each other. Python switches from "piece of code" to "piece of code" as it wants.

With multiprocessing, multiple Pythons are running simultaneously. There is no sharing between the pieces of code, other than what is provided by the operating system. Setting up a semaphore usually requires some support from the operating system to create a shared variable that all threads/processes can access.

So. Asyncio primitives are designed so that everything runs within a single Python thread, with the coroutines cooperating. They are not designed to work if multiple threads or processes try to use them simultaneously.
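
A minimal sketch contrasting the two primitives (the job functions are made up): threading.Semaphore blocks an OS thread, while asyncio.Semaphore suspends a coroutine and only coordinates code running on the same event loop.

    import asyncio
    import threading

    thread_sem = threading.Semaphore(2)        # at most 2 threads inside at once

    def threaded_job(i):
        with thread_sem:                       # blocks the OS thread if the count is 0
            print(f"thread {i} holds the semaphore")

    async def async_job(i, sem):
        async with sem:                        # suspends the coroutine if the count is 0
            print(f"task {i} holds the semaphore")
            await asyncio.sleep(0.1)

    async def main():
        sem = asyncio.Semaphore(2)             # only coordinates tasks on this event loop
        await asyncio.gather(*(async_job(i, sem) for i in range(5)))

    threads = [threading.Thread(target=threaded_job, args=(i,)) for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    asyncio.run(main())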

I hope this helps.

Python Asyncio and Multithreading

Use loop.call_soon_threadsafe

In general, asyncio isn't thread-safe:

Almost all asyncio objects are not thread safe, which is typically not a problem unless there is code that works with them from outside of a Task or a callback. If there's a need for such code to call a low-level asyncio API, the loop.call_soon_threadsafe() method should be used.

https://docs.python.org/3/library/asyncio-dev.html#concurrency-and-multithreading

    # SCHEDULE COMPUTATION
    loop.call_soon_threadsafe(self.nodes[node_id].schedule_computation, x)

Node.computation runs on the main thread.

Not sure if you are aware, but even though you can use call_soon_threadsafe to initiate a coroutine from another thread, the coroutine always runs in the thread the loop was created in. If you want to run coroutines on another thread, then your background thread will need its own event loop as well.
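
A minimal sketch of that pattern (schedule_computation() and background() are made-up names loosely mirroring the question): a worker thread schedules a callback with call_soon_threadsafe, and the callback still executes in the thread that runs the loop.

    import asyncio
    import threading

    def schedule_computation(x):
        # runs in the event loop's thread, no matter which thread scheduled it
        print(f"computing {x} in {threading.current_thread().name}")

    def background(loop):
        # worker thread: never call loop methods directly, only the *_threadsafe ones
        for x in range(3):
            loop.call_soon_threadsafe(schedule_computation, x)

    async def main():
        loop = asyncio.get_running_loop()
        threading.Thread(target=background, args=(loop,), name="worker").start()
        await asyncio.sleep(0.5)               # give the scheduled callbacks time to run

    asyncio.run(main())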


