Can't pickle <type 'instancemethod'> when using multiprocessing Pool.map()
The problem is that multiprocessing must pickle things to sling them among processes, and bound methods are not picklable. The workaround (whether you consider it "easy" or not ;-) is to add the infrastructure to your program to allow such methods to be pickled, registering a reducer with the copy_reg standard-library module.
For example, Steven Bethard's contribution to this thread (towards the end of the thread) shows one perfectly workable approach to allow method pickling/unpickling via copy_reg.
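For reference, a minimal sketch of that idea in Python 3 spelling (where the module is renamed copyreg; the Greeter class is just an illustrative stand-in) — a reducer that rebuilds a bound method via getattr on its instance. Note that Python 3 can already pickle bound methods natively, so this only shows the registration mechanism:

```python
import copyreg
import pickle
import types

# Reduce a bound method to (getattr, (instance, method_name)), so
# unpickling calls getattr(instance, method_name) to rebuild it.
def _pickle_method(m):
    return getattr, (m.__self__, m.__func__.__name__)

# Register the reducer for all method objects.
copyreg.pickle(types.MethodType, _pickle_method)

class Greeter:
    def hello(self):
        return "hi"

# Round-trip a bound method through pickle.
restored = pickle.loads(pickle.dumps(Greeter().hello))
```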
Python multiprocessing PicklingError: Can't pickle <type 'function'>
Here is a list of what can be pickled. In particular, functions are only picklable if they are defined at the top-level of a module.
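A quick way to see that rule in action (Python 3 shown; module names aside, the same holds in Python 2): a top-level function pickles by reference to its name, while a lambda has no importable name and fails:

```python
import pickle

def square(x):                 # top-level function: picklable by reference
    return x * x

blob = pickle.dumps(square)    # succeeds: stored essentially as "module.name"

# A lambda (or nested function) has no importable name, so pickling fails.
try:
    pickle.dumps(lambda x: x * x)
    lambda_failed = False
except (pickle.PicklingError, AttributeError):
    lambda_failed = True
```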
This piece of code:
import multiprocessing as mp

class Foo():
    @staticmethod
    def work(self):
        pass

if __name__ == '__main__':
    pool = mp.Pool()
    foo = Foo()
    pool.apply_async(foo.work)
    pool.close()
    pool.join()
yields an error almost identical to the one you posted:
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 315, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
The problem is that the pool methods all use an mp.SimpleQueue to pass tasks to the worker processes. Everything that goes through the mp.SimpleQueue must be picklable, and foo.work is not picklable since it is not defined at the top level of the module. It can be fixed by defining a function at the top level which calls foo.work():
def work(foo):
    foo.work()

pool.apply_async(work, args=(foo,))
Notice that foo is picklable, since Foo is defined at the top level and foo.__dict__ is picklable.
Can't pickle <type 'instancemethod'> using python's multiprocessing Pool.apply_async()
This works, using copy_reg, as suggested by Alex Martelli in the first link you provided:
import copy_reg
import types
import multiprocessing

def _pickle_method(m):
    if m.im_self is None:
        return getattr, (m.im_class, m.im_func.func_name)
    else:
        return getattr, (m.im_self, m.im_func.func_name)

copy_reg.pickle(types.MethodType, _pickle_method)

class Controler(object):
    def __init__(self):
        nProcess = 10
        pages = 10
        self.__result = []
        self.manageWork(nProcess, pages)

    def BarcodeSearcher(self, x):
        return x*x

    def resultCollector(self, result):
        self.__result.append(result)

    def manageWork(self, nProcess, pages):
        pool = multiprocessing.Pool(processes=nProcess)
        for pag in range(pages):
            pool.apply_async(self.BarcodeSearcher, args=(pag,),
                             callback=self.resultCollector)
        pool.close()
        pool.join()
        print(self.__result)

if __name__ == '__main__':
    Controler()
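Worth noting: in Python 3 the copy_reg registration is no longer needed, because bound methods pickle out of the box (the module was also renamed to copyreg). A quick check, using a stripped-down Controler for illustration:

```python
import pickle

class Controler:
    def BarcodeSearcher(self, x):
        return x * x

# In Python 3 a bound method round-trips through pickle natively,
# with no copyreg registration at all.
method = pickle.loads(pickle.dumps(Controler().BarcodeSearcher))
```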
python multiprocessing Can't pickle <type 'function'>
Python's multiprocessing module cannot deal with functions/methods which cannot be pickled, which means you cannot use class or instance methods without a lot of hassle. I would recommend using multiprocess, which uses dill for serialization instead of pickle and can deal with class or instance methods. As far as I know, the interface is exactly the same as the one used in multiprocessing, so you can use it as a drop-in replacement.
See also https://stackoverflow.com/a/21345423/1170207
Pickling error while using pool.map in multiprocessing
Changed my function to accept a list of dates:

import datetime as dt
from multiprocessing import Pool

def func1(datelist):
    date1 = datelist[0]
    date2 = datelist[1]

if __name__ == '__main__':
    pool = Pool(processes=4)
    dates = [[dt.datetime(2016, 6, 17), dt.datetime(2016, 6, 23)],
             [dt.datetime(2016, 6, 24), dt.datetime(2016, 6, 30)],
             [dt.datetime(2016, 7, 1), dt.datetime(2016, 7, 7)],
             [dt.datetime(2016, 7, 8), dt.datetime(2016, 7, 14)]]
    result = pool.map(func1, dates)
Multiprocessing: How to use Pool.map on a function defined in a class?
I also was annoyed by restrictions on what sort of functions pool.map could accept. I wrote the following to circumvent this. It appears to work, even for recursive use of parmap.
from multiprocessing import Process, Pipe
from itertools import izip

def spawn(f):
    def fun(pipe, x):
        pipe.send(f(x))
        pipe.close()
    return fun

def parmap(f, X):
    pipe = [Pipe() for x in X]
    proc = [Process(target=spawn(f), args=(c, x)) for x, (p, c) in izip(X, pipe)]
    [p.start() for p in proc]
    [p.join() for p in proc]
    return [p.recv() for (p, c) in pipe]

if __name__ == '__main__':
    print parmap(lambda x: x**x, range(1, 5))