Multiprocessing - Pipe VS Queue

Multiprocessing - Pipe vs Queue

A Pipe() can only have two endpoints.
A Queue() can have multiple producers and consumers.

When to use them

If you need more than two points to communicate, use a Queue().

If you need absolute performance, a Pipe() is much faster because Queue() is built on top of Pipe().

Performance Benchmarking

Let's assume you want to spawn two processes and send messages between them as quickly as possible. These are the timing results of a drag race between similar tests using Pipe() and Queue()...

FYI, I threw in results for SimpleQueue() and JoinableQueue() as a bonus.

JoinableQueue() accounts for tasks when queue.task_done() is called (it doesn't even know about the specific task, it just counts unfinished tasks in the queue), so that queue.join() knows the work is finished.

The code for each at bottom of this answer...

# This is on a Thinkpad T430, VMWare running Debian 11 VM, and Python 3.9.2

$ python multi_pipe.py
Sending 10000 numbers to Pipe() took 0.14316844940185547 seconds
Sending 100000 numbers to Pipe() took 1.3749017715454102 seconds
Sending 1000000 numbers to Pipe() took 14.252539157867432 seconds
$  python multi_queue.py
Sending 10000 numbers to Queue() took 0.17014789581298828 seconds
Sending 100000 numbers to Queue() took 1.7723784446716309 seconds
Sending 1000000 numbers to Queue() took 17.758610725402832 seconds
$ python multi_simplequeue.py
Sending 10000 numbers to SimpleQueue() took 0.14937686920166016 seconds
Sending 100000 numbers to SimpleQueue() took 1.5389132499694824 seconds
Sending 1000000 numbers to SimpleQueue() took 16.871352910995483 seconds
$ python multi_joinablequeue.py
Sending 10000 numbers to JoinableQueue() took 0.15144729614257812 seconds
Sending 100000 numbers to JoinableQueue() took 1.567549228668213 seconds
Sending 1000000 numbers to JoinableQueue() took 16.237736225128174 seconds

# This is on a Thinkpad T430, VMWare running Debian 11 VM, and Python 3.7.0

(py37_test) [mpenning@mudslide ~]$ python multi_pipe.py
Sending 10000 numbers to Pipe() took 0.13469791412353516 seconds
Sending 100000 numbers to Pipe() took 1.5587594509124756 seconds
Sending 1000000 numbers to Pipe() took 14.467186689376831 seconds
(py37_test) [mpenning@mudslide ~]$ python multi_queue.py
Sending 10000 numbers to Queue() took 0.1897726058959961 seconds
Sending 100000 numbers to Queue() took 1.7622203826904297 seconds
Sending 1000000 numbers to Queue() took 16.89015531539917 seconds
(py37_test) [mpenning@mudslide ~]$ python multi_joinablequeue.py
Sending 10000 numbers to JoinableQueue() took 0.2238149642944336 seconds
Sending 100000 numbers to JoinableQueue() took 1.4744081497192383 seconds
Sending 1000000 numbers to JoinableQueue() took 15.264554023742676 seconds

# This is on a ThinkpadT61 running Ubuntu 11.10, and Python 2.7.2

mpenning@mpenning-T61:~$ python multi_pipe.py 
Sending 10000 numbers to Pipe() took 0.0369849205017 seconds
Sending 100000 numbers to Pipe() took 0.328398942947 seconds
Sending 1000000 numbers to Pipe() took 3.17266988754 seconds
mpenning@mpenning-T61:~$ python multi_queue.py 
Sending 10000 numbers to Queue() took 0.105256080627 seconds
Sending 100000 numbers to Queue() took 0.980564117432 seconds
Sending 1000000 numbers to Queue() took 10.1611330509 seconds
mpnening@mpenning-T61:~$ python multi_joinablequeue.py 
Sending 10000 numbers to JoinableQueue() took 0.172781944275 seconds
Sending 100000 numbers to JoinableQueue() took 1.5714070797 seconds
Sending 1000000 numbers to JoinableQueue() took 15.8527247906 seconds
mpenning@mpenning-T61:~$

In summary:

Under python 2.7, Pipe() is about 300% faster than a Queue(). Don't even think about the JoinableQueue() unless you really must have the benefits.
Under python 3.x, Pipe() still has a (roughly 20%) edge over the Queue()s, but the performance gaps between Pipe() and Queue() are not as dramatic as they were in python 2.7. The various Queue() implementations are within roughly 15% of each other. Also my tests use integer data. Some people commented that they found performance differences in the data-types used with multiprocessing.

Bottom line for python 3.x: YMMV... consider running your own tests with your own data-types (i.e. integer / string / objects) to form conclusions about your own platforms of interest and use-cases.

I should also mention that my python3.x performance tests are inconsistent and vary somewhat. I ran multiple tests over several minutes to get the best results for each case. I suspect these differences have something to do with running my python3 tests under VMWare / virtualization; however, the virtualization diagnosis is speculation.

*** RESPONSE TO A COMMENT ABOUT TEST TECHNIQUES ***

In the comments, @JJC said:

a more fair comparison would be running N workers, each communicating with main thread via point-to-point pipe compared to performance of running N workers all pulling from a single point-to-multipoint queue.

Originally, this answer only considered the performance of one worker and one producer; that's the baseline use-case for Pipe(). Your comment requires adding different tests for multiple worker processes. While this is a valid observation for common Queue() use-cases, it could easily explode the test matrix along a completely new axis (i.e. adding tests with various numbers of worker processes).

BONUS MATERIAL 2

Multiprocessing introduces subtle changes in information flow that make debugging hard unless you know some shortcuts. For instance, you might have a script that works fine when indexing through a dictionary in under many conditions, but infrequently fails with certain inputs.

Normally we get clues to the failure when the entire python process crashes; however, you don't get unsolicited crash tracebacks printed to the console if the multiprocessing function crashes. Tracking down unknown multiprocessing crashes is hard without a clue to what crashed the process.

The simplest way I have found to track down multiprocessing crash informaiton is to wrap the entire multiprocessing function in a try / except and use traceback.print_exc():

import traceback
def run(self, args):
    try:
        # Insert stuff to be multiprocessed here
        return args[0]['that']
    except:
        print "FATAL: reader({0}) exited while multiprocessing".format(args) 
        traceback.print_exc()

Now, when you find a crash you see something like:

FATAL: reader([{'crash': 'this'}]) exited while multiprocessing
Traceback (most recent call last):
  File "foo.py", line 19, in __init__
    self.run(args)
  File "foo.py", line 46, in run
    KeyError: 'that'

Source Code:

"""
multi_pipe.py
"""
from multiprocessing import Process, Pipe
import time

def reader_proc(pipe):
    ## Read from the pipe; this will be spawned as a separate Process
    p_output, p_input = pipe
    p_input.close()    # We are only reading
    while True:
        msg = p_output.recv()    # Read from the output pipe and do nothing
        if msg=='DONE':
            break

def writer(count, p_input):
    for ii in range(0, count):
        p_input.send(ii)             # Write 'count' numbers into the input pipe
    p_input.send('DONE')

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        # Pipes are unidirectional with two endpoints:  p_input ------> p_output
        p_output, p_input = Pipe()  # writer() writes to p_input from _this_ process
        reader_p = Process(target=reader_proc, args=((p_output, p_input),))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process

        p_output.close()       # We no longer need this part of the Pipe()
        _start = time.time()
        writer(count, p_input) # Send a lot of stuff to reader_proc()
        p_input.close()
        reader_p.join()
        print("Sending {0} numbers to Pipe() took {1} seconds".format(count,
            (time.time() - _start)))

"""
multi_queue.py
"""

from multiprocessing import Process, Queue
import time
import sys

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        if (msg == 'DONE'):
            break

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':
    pqueue = Queue() # writer() writes to pqueue from _this_ process
    for count in [10**4, 10**5, 10**6]:             
        ### reader_proc() reads from pqueue as a separate process
        reader_p = Process(target=reader_proc, args=((pqueue),))
        reader_p.daemon = True
        reader_p.start()        # Launch reader_proc() as a separate python process

        _start = time.time()
        writer(count, pqueue)    # Send a lot of stuff to reader()
        reader_p.join()         # Wait for the reader to finish
        print("Sending {0} numbers to Queue() took {1} seconds".format(count, 
            (time.time() - _start)))

"""
multi_simplequeue.py
"""

from multiprocessing import Process, SimpleQueue
import time
import sys

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        if (msg == 'DONE'):
            break

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':
    pqueue = SimpleQueue() # writer() writes to pqueue from _this_ process
    for count in [10**4, 10**5, 10**6]:
        ### reader_proc() reads from pqueue as a separate process
        reader_p = Process(target=reader_proc, args=((pqueue),))
        reader_p.daemon = True
        reader_p.start()        # Launch reader_proc() as a separate python process

        _start = time.time()
        writer(count, pqueue)    # Send a lot of stuff to reader()
        reader_p.join()         # Wait for the reader to finish
        print("Sending {0} numbers to SimpleQueue() took {1} seconds".format(count,
            (time.time() - _start)))

"""
multi_joinablequeue.py
"""
from multiprocessing import Process, JoinableQueue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        queue.task_done()

def writer(count, queue):
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        jqueue = JoinableQueue() # writer() writes to jqueue from _this_ process
        # reader_proc() reads from jqueue as a different process...
        reader_p = Process(target=reader_proc, args=((jqueue),))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process
        _start = time.time()
        writer(count, jqueue) # Send a lot of stuff to reader_proc() (in different process)
        jqueue.join()         # Wait for the reader to finish
        print("Sending {0} numbers to JoinableQueue() took {1} seconds".format(count, 
            (time.time() - _start)))

multiprocessing.Pipe() vs .Queue()

They're very different things, with very different behavior.

A Queue instance has put, get, empty, full, and various other methods. It has an optional maximum size (number of items, really). Anyone can put or get into any queue. It is process-safe as it handles all the locking.

The Pipe function—note that this is a function, not a class instance—returns two objects of type Connection (these are the class instances). These two instances are connected to each other. The connection between them can be single-duplex, i.e., you can only send on one and only receive on the other, or it can be full-duplex, i.e., whatever you actually send on one is received on the other. The two objects have send, recv, send_bytes, recv_bytes, fileno, and close methods among others. The send and receive methods use the pickling code to translate between objects and bytes since the actual data transfer is via byte-stream. The connection objects are not locked and therefore not process-safe.

Data transfer between processes generally uses these Connection objects: this and shared-memory are the underlying mechanism for all process-to-process communication in the multiprocessing code. Queue instances are much higher level objects, which ultimately need to use a Connection to send or receive the byte-stream that represents the object being transferred across the queue. So in that sense, they do the same thing—but that's a bit like saying that a USB cable does the same thing as the things that connect them. Usually, you don't want to deal with individual voltages on a wire: it's much nicer to just send or receve a whole object. (This analogy is a little weak, because Connection instances have send and recv as well as send_bytes and recv_bytes, but it's probably still helpful.)

multiprocessing.Pipe is even slower than multiprocessing.Queue?

You can do an experiment and put the following into your Pipe code above..

def worker(conn):
    for task_nbr in range(NUM):
        data = np.random.rand(400, 400, 3)
    sys.exit(1)

def main():
    parent_conn, child_conn = Pipe(duplex=False)
    p = Process(target=worker, args=(child_conn,))
    p.start()
    p.join()

This gives you the time that it takes to create the data for your test. On my system this takes about 2.9 seconds.

Under the hood the queue object implements a buffer and a threaded send. The thread is still in the same process but by using it, the data creation doesn't have to wait for the system IO to complete. It effectively parallelizes the operations. Try your Pipe code modified with some simple threading implemented (disclaimer, code here is for test only and is not production ready)..

import sys
import time
import threading
from multiprocessing import Process, Pipe, Lock
import numpy as np
import copy

NUM = 1000

def worker(conn):
    _conn = conn
    _buf = []
    _wlock = Lock()
    _sentinel = object() # signal that we're done
    def thread_worker():
        while 1:
            if _buf:
                _wlock.acquire()
                obj = _buf.pop(0)
                if obj is _sentinel: return
                _conn.send(data)
                _wlock.release()
    t = threading.Thread(target=thread_worker)
    t.start()
    for task_nbr in range(NUM):
        data = np.random.rand(400, 400, 3)
        data[0][0][0] = task_nbr    # just for integrity check
        _wlock.acquire()
        _buf.append(data)
        _wlock.release()
    _wlock.acquire()
    _buf.append(_sentinel)
    _wlock.release()
    t.join()
    sys.exit(1)

def main():
    parent_conn, child_conn = Pipe(duplex=False)
    Process(target=worker, args=(child_conn,)).start()
    for num in range(NUM):
        message = parent_conn.recv()
        assert num == message[0][0][0], 'Data was corrupted'        

if __name__ == "__main__":
    start_time = time.time()
    main()
    end_time = time.time()
    duration = end_time - start_time
    msg_per_sec = NUM / duration

    print "Duration: %s" % duration
    print "Messages Per Second: %s" % msg_per_sec

On my machine this takes 3.4 seconds to run which is almost exactly the same as your Queue code above.

From https://docs.python.org/2/library/threading.html

In Cython, due to due to the Global Interpreter Lock, only one thread can execute Python code at once... however, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

The queue and pipe differences are definitely an odd implementation detail until you dig into it a bit.

Python Multiprocessing Controlled Bidirectional Queue

For those facing the same problem, I have found the solution, it is multiprocessing.Pipe() it is faster than queues but it only works if you have 2 processes.

Here is a simple example to help

import multiprocessing as mp
from time import time

def process1_function(conn, events):
    for event in events:
        # send jobs to the process_2
        conn.send((event, time()))
        print(f"Event Sent: {event}")
        # check if there are any messages in the pipe from process_2
        if conn.poll():
            # read the message from process_2
            print(conn.recv())
    # continue checking the messages in the pipe from process_2
    while conn.poll():
        print(conn.recv())

def process2_function(conn):
    while True:
        # check if there are any messages in the pipe from process_1
        if conn.poll():
            # read messages in the pipe from process_1
            event, sent = conn.recv()
            # send messages to process_1
            conn.send(f"{event} complete, {time() - sent}")
            if event == "eod":
                break
    conn.send("all events finished")

def run():
    events = ["get up", "brush your teeth", "shower", "work", "eod"]
    conn1, conn2 = mp.Pipe()
    process_1 = mp.Process(target=process1_function, args=(conn1, events))
    process_2 = mp.Process(target=process2_function, args=(conn2,))
    process_1.start()
    process_2.start()
    process_1.join()
    process_2.join()

if __name__ == "__main__":
    run()

Python 3.4 multiprocessing Queue faster than Pipe, unexpected

I can't say for sure, but I think the issue you're dealing with is synchronous versus asynchronous I/O. My guess is that the Pipe is somehow ending up synchronous and the Queue is ending up asynchronous. Why exactly one is defaulting one way and the other is the other might be better answered by this question and answer:

Synchronous/Asynchronous behaviour of python Pipes

Python multiprocessing.Queue vs multiprocessing.manager().Queue()

Though my understanding is limited about this subject, from what I did I can tell there is one main difference between multiprocessing.Queue() and multiprocessing.Manager().Queue():

multiprocessing.Queue() is an object whereas multiprocessing.Manager().Queue() is an address (proxy) pointing to shared queue managed by the multiprocessing.Manager() object.
therefore you can't pass normal multiprocessing.Queue() objects to Pool methods, because it can't be pickled.
Moreover the python doc tells us to pay particular attention when using multiprocessing.Queue() because it can have undesired effects

Note When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe. This has some consequences which are a little surprising, but should not cause any practical difficulties – if they really bother you then you can instead use a queue created with a manager.
After putting an object on an empty queue there may be an infinitesimal delay before the queue’s empty() method returns False and get_nowait() can return without raising Queue.Empty.
If multiple processes are enqueuing objects, it is possible for the objects to be received at the other end out-of-order. However, objects enqueued by the same process will always be in the expected order with respect to each other.

Warning As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue.

There is a workaround to use multiprocessing.Queue() with Pool by setting the queue as a global variable and setting it for all processes at initialization :

queue = multiprocessing.Queue()
def initialize_shared(q):
    global queue
    queue=q

pool= Pool(nb_process,initializer=initialize_shared, initargs(queue,))

will create pool processes with correctly shared queues but we can argue that the multiprocessing.Queue() objects were not created for this use.

On the other hand the manager.Queue() can be shared between pool subprocesses by passing it as normal argument of a function.

In my opinion, using multiprocessing.Manager().Queue() is fine in every case and less troublesome. There might be some drawbacks using a manager but I'm not aware of it.