Is Shared Readonly Data Copied to Different Processes for Multiprocessing

Is shared readonly data copied to different processes for multiprocessing?

You can use the shared memory stuff from multiprocessing together with Numpy fairly easily:

import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

#-- edited 2015-05-01: the assert check below checks the wrong thing
# with recent versions of Numpy/multiprocessing. That no copy is made
# is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

# Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i, :] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))

    print(shared_array)

which prints

[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
 [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]
 [ 4.  4.  4.  4.  4.  4.  4.  4.  4.  4.]
 [ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]
 [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
 [ 7.  7.  7.  7.  7.  7.  7.  7.  7.  7.]
 [ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
 [ 9.  9.  9.  9.  9.  9.  9.  9.  9.  9.]]

However, Linux has copy-on-write semantics on fork(), so even without using multiprocessing.Array, the data will not be copied unless it is written to.
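
For illustration, a minimal sketch of relying on copy-on-write alone (assuming a POSIX system where the pool is created with the fork start method; the array and function names here are made up for the example):

import multiprocessing
import numpy as np

# Created before the pool forks, so every worker inherits the same memory pages.
big_readonly = np.arange(1_000_000, dtype=np.float64)

def strided_sum(i):
    # Read-only access: the inherited pages are never written to,
    # so the kernel never has to duplicate them.
    return big_readonly[i::100].sum()

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        print(sum(pool.map(strided_sum, range(100))))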

What is the right way to share a read only configuration to multiple processes?

The correct way to share static information within a multiprocessing.Pool is to use the initializer function together with its initargs to set it up in each worker.

The initializer and initargs are in fact passed to the Pool workers as Process constructor parameters, thus following the recommendation of the multiprocessing programming guidelines:

Explicitly pass resources to child processes

On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.

import multiprocessing

variable = None


def initializer(*initargs):
    """The initializer function is executed once on each worker process
    as it starts.

    """
    global variable
    variable = initargs


def function(*args):
    """The function is executed on each parameter of `map`."""
    print(variable)


with multiprocessing.Pool(initializer=initializer, initargs=[1, 2, 3]) as pool:
    pool.map(function, (1, 2, 3))

Providing shared read-only resources to parallel processes

You can define and compute output_1 (_output_1 in the snippet below) as a global variable before creating your process pool; that way each process will have access to the data. This won't result in any memory duplication because, as long as you don't modify that data, it stays shared copy-on-write.

from multiprocessing import Pool

_output_1 = serial_computation()


def parallel_computation(input_2):
    # here you can read _output_1
    # do not modify it, as that would create a new copy in the child process
    ...


def main():
    input_2 = ...
    with Pool() as pool:
        output_2 = pool.map(parallel_computation, input_2)

multiprocessing global variable memory copying

In Linux, forked processes have a copy-on-write view of the parent's address space. Forking is light-weight, and the same program runs in both the parent and the child, except that the child takes a different execution path. As a small example:

import os

var = "unchanged"
pid = os.fork()
if pid:
    print('parent:', os.getpid(), var)
    os.waitpid(pid, 0)
else:
    print('child:', os.getpid(), var)
    var = "changed"

# show parent and child views
print(os.getpid(), var)

Results in

parent: 22642 unchanged
child: 22643 unchanged
22643 changed
22642 unchanged

Applying this to multiprocessing, in this example I load data into a global variable. Since Python pickles whatever is sent to the process pool, I make sure it only pickles something small, like an index, and have the worker look up the global data itself.

import multiprocessing as mp
import os

my_big_data = "well, bigger than this"


def worker(index):
    """Get one char from the big data."""
    return my_big_data[index]


if __name__ == "__main__":
    pool = mp.Pool(os.cpu_count())
    for c in pool.imap_unordered(worker, range(len(my_big_data)), chunksize=1):
        print(c)

Windows does not have a fork-and-exec model for running programs. It has to start a new instance of the python interpreter and clone all relevant data to the child. This is a heavy lift!
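
Under spawn, a common way to avoid shipping the data with every task is to build or load it once per worker in the pool initializer; a hedged sketch, where load_config is a hypothetical loader standing in for whatever produces the real data:

import multiprocessing

_config = None  # filled in per worker by the initializer

def load_config():
    # Hypothetical loader; stands in for reading a file, database, etc.
    return {"threshold": 0.5}

def init_worker():
    global _config
    _config = load_config()  # runs once per worker process, not once per task

def keep(value):
    return value > _config["threshold"]

if __name__ == "__main__":
    # "spawn" mimics Windows behaviour on any platform.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(keep, [0.1, 0.7, 0.9]))   # [False, True, True]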

Shared-memory objects in multiprocessing

If you use an operating system that uses copy-on-write fork() semantics (like any common unix), then as long as you never alter your data structure it will be available to all child processes without taking up additional memory. You will not have to do anything special (except make absolutely sure you don't alter the object).

The most efficient thing you can do for your problem would be to pack your array into an efficient array structure (using numpy or array), place that in shared memory, wrap it with multiprocessing.Array, and pass that to your functions. This answer shows how to do that.
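
A variant of that pattern, sketched here rather than taken from the linked answer, uses multiprocessing.RawArray (no lock, so only suitable for read-only or externally synchronized access) and re-wraps the buffer as a numpy array inside each worker via the pool initializer:

import multiprocessing
import numpy as np

_shared = None  # per-worker numpy view of the shared buffer

def init_worker(raw, shape):
    global _shared
    # Re-wrap the shared buffer inside the worker; no data is copied.
    _shared = np.frombuffer(raw, dtype=np.float64).reshape(shape)

def row_mean(i):
    return _shared[i].mean()

if __name__ == '__main__':
    shape = (10, 10)
    raw = multiprocessing.RawArray('d', shape[0] * shape[1])
    arr = np.frombuffer(raw, dtype=np.float64).reshape(shape)
    arr[:] = np.arange(100).reshape(shape)   # fill before the workers start

    with multiprocessing.Pool(processes=4,
                              initializer=init_worker,
                              initargs=(raw, shape)) as pool:
        print(pool.map(row_mean, range(shape[0])))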

If you want a writeable shared object, then you will need to wrap it with some kind of synchronization or locking. multiprocessing provides two methods of doing this: one using shared memory (suitable for simple values, arrays, or ctypes) or a Manager proxy, where one process holds the memory and a manager arbitrates access to it from other processes (even over a network).

The Manager approach can be used with arbitrary Python objects, but will be slower than the equivalent using shared memory because the objects need to be serialized/deserialized and sent between processes.
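
As a minimal sketch of the Manager approach (the names here are illustrative), every operation on the proxy is a round-trip to the manager process, which is where the extra cost comes from:

import multiprocessing

def record(args):
    results, n = args
    # Each operation on the proxy is a message to the manager process.
    results[n] = n * n

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        results = manager.dict()   # the real dict lives in the manager process
        with multiprocessing.Pool(processes=2) as pool:
            pool.map(record, [(results, n) for n in range(5)])
        print(sorted(results.items()))   # [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]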

There is a wealth of parallel processing libraries and approaches available in Python. multiprocessing is an excellent and well-rounded library, but if you have special needs, one of the other approaches may be a better fit.

I am trying to understand how to share read-only objects with multiprocessing

When you pass bigset as an argument, it is pickled by the parent process and unpickled by the child processes.[1][2]

Pickling and unpickling a large set requires a lot of time. This explains why you see only a few processes doing their job: the parent process has to pickle a lot of big objects, and the children have to wait for it. The parent process is the bottleneck.

Pickling parameters also implies that they have to be sent to the worker processes. Sending data from one process to another requires system calls, which is why you are not seeing 100% CPU usage in user space: part of the CPU time is spent in kernel space.[3]

Pickling objects and sending them to subprocesses also implies that: 1. you need memory for the pickle buffer; 2. each subprocess gets a copy of bigset. This is why you are seeing an increase in memory usage.

Instead, when bigset is a global variable, it is not sent anywhere (unless you are using a start method other than fork). It is just inherited as-is by subprocesses, using the usual copy-on-write rules of fork().
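
A small sketch of the contrast (bigset here is a stand-in; with the fork start method the global version inherits the set, while the argument version pickles it for every task):

import multiprocessing

bigset = set(range(1_000_000))    # stand-in for the real data

def worker_global(x):
    return x in bigset            # bigset is inherited via fork, never pickled

def worker_arg(args):
    x, s = args
    return x in s                 # the whole set is pickled for every task

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        fast = pool.map(worker_global, range(100))
        slow = pool.map(worker_arg, [(x, bigset) for x in range(100)])
        assert fast == slow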


Footnotes:

  1. In case you don't know what "pickling" means: pickle is one of the standard Python protocols to transform arbitrary Python objects to and from a byte sequence.

  2. imap() & co. use queues behind the scenes, and queues work by pickling/unpickling objects.

  3. I tried running your code (with all(pool.imap(_worker, xrange(100))) in order to make the process faster) and I got: 2 minutes user time, 13 seconds system time. That's almost 10% system time.

Multiprocessing: why is a numpy array shared with the child processes, while a list is copied?

They're all copy-on-write. What you're missing is that when you do, e.g.,

for x in data:
    pass

the reference count on every object contained in data is temporarily incremented by 1, one at a time, as x is bound to each object in turn. For int objects, the refcount in CPython is part of the basic object layout, so bumping it writes to the memory page holding the object, and that page gets copied (you did mutate the object: its refcount changed).

To make something more analogous to the numpy.ones case, try, e.g.,

data = [1] * 10**8

Then there's only a single unique object referenced many (10**8) times by the list, so there's very little to copy (the same object's refcount gets incremented and decremented many times).
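
A quick way to see the difference (illustrative only): iterating over the [1] * N list bumps one object's refcount over and over, dirtying a single page, while a list of distinct ints touches every object.

data = [1] * 10**6             # one int object referenced a million times
distinct = list(range(10**6))  # a million separate int objects

print(len({id(x) for x in data}))      # 1
print(len({id(x) for x in distinct}))  # 1000000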


