Can Python Pickle Lambda Functions

Cannot pickle lambda function in python 3

It seems that in Python 2, dill replaces pickle when you import it. In Python 3, you have to use dill directly instead.

This works in python 3.5:

>>> import dill 
>>> dill.dumps(lambda x: x**2)
b'\x80\x03cdill.dill\n_create_function\nq\x00(cdill.dill\n_load_type\nq\x01X\x08\x00\x00\x00CodeTypeq\x02\x85q\x03Rq\x04(K\x01K\x00K\x01K\x02KCC\x08|\x00\x00d\x01\x00\x13Sq\x05NK\x02\x86q\x06)X\x01\x00\x00\x00xq\x07\x85q\x08X\x07\x00\x00\x00<stdin>q\tX\x08\x00\x00\x00<lambda>q\nK\x01C\x00q\x0b))tq\x0cRq\rc__builtin__\n__main__\nh\nNN}q\x0etq\x0fRq\x10.'

Alternatively, you can import dill as pickle:

>>> import dill as pickle 
>>> pickle.dumps(lambda x: x**2)

Python, cPickle, pickling lambda functions

The built-in pickle module is unable to serialize several kinds of Python objects (including lambda functions, nested functions, and functions defined at the command line).
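
For instance, a nested function fails in much the same way as a lambda (a small illustration added here for completeness):

import pickle

def outer():
    def inner(x):      # nested function: not reachable as a module-level name
        return x + 1
    return inner

f = outer()
pickle.dumps(f)   # fails: pickle can't find 'outer.<locals>.inner' at module level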

The PiCloud package includes a more robust pickler that can pickle lambda functions.

from pickle import dumps
f = lambda x: x * 5
dumps(f) # error
from cloud.serialization.cloudpickle import dumps
dumps(f) # works

PiCloud-serialized objects can be de-serialized using the normal pickle/cPickle load and loads functions.
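
For example, a minimal round trip (sketched here with the standalone cloudpickle package, the successor to the PiCloud serializer, rather than the cloud.serialization import above):

import pickle
import cloudpickle   # standalone package, assumed installed

f = lambda x: x * 5
blob = cloudpickle.dumps(f)   # serialize the lambda

g = pickle.loads(blob)        # de-serialize with the ordinary pickle module
print(g(3))                   # 15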

Dill also provides similar functionality

>>> import dill           
>>> f = lambda x: x * 5
>>> dill.dumps(f)
'\x80\x02cdill.dill\n_create_function\nq\x00(cdill.dill\n_unmarshal\nq\x01Uec\x01\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x08\x00\x00\x00|\x00\x00d\x01\x00\x14S(\x02\x00\x00\x00Ni\x05\x00\x00\x00(\x00\x00\x00\x00(\x01\x00\x00\x00t\x01\x00\x00\x00x(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>t\x08\x00\x00\x00<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x02\x85q\x03Rq\x04c__builtin__\n__main__\nU\x08<lambda>q\x05NN}q\x06tq\x07Rq\x08.'

How does one pickle arbitrary pytorch models that use lambda functions?

This is not a good idea. If your code later moves to a different GitHub repo (or simply changes), it will be hard to restore models that took a lot of time to train. The cycles spent recovering them, or retraining, are not worth it. I recommend instead doing it the PyTorch way and saving only the weights, as the PyTorch documentation recommends.
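
For reference, a minimal sketch of that approach (the toy model below is just a placeholder for your own architecture):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 2))   # toy model, stands in for your real one

# Save only the weights (the state dict), not the pickled model object
torch.save(model.state_dict(), "weights.pt")

# Later: rebuild the same architecture in code, then load the weights
restored = nn.Sequential(nn.Linear(4, 2))
restored.load_state_dict(torch.load("weights.pt"))
restored.eval()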

Python: _pickle.PicklingError: Can't pickle function lambda

I'm not exactly sure why (though a thorough read through the multiprocessing docs would probably have an answer), but there's a pickling step involved in Python's multiprocessing by which certain things are passed to child processes. While I would have expected the lambdas to be inherited rather than passed via pickling, I guess that's not what's happening.
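
As a quick illustration (a minimal snippet added here, not from the original question), passing a lambda directly to a pool raises the error in the title:

from multiprocessing import Pool

if __name__ == "__main__":
    with Pool(2) as pool:
        # Fails: the lambda itself must be pickled to reach the workers,
        # raising "Can't pickle <function <lambda> ...>"
        pool.map(lambda x: x * 2, range(4))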

Following the discussion in the comments, consider something like this approach:

import time
from multiprocessing import Pool
import itertools
import numpy as np
from multiprocessing import shared_memory

def add_mats(a, b):
    #time.sleep(0.00001)
    return (a + b)

# Helper for mp version
def add_mats_shared(shm_name, array_shape, array_dtype, i1, i2):
    shm = shared_memory.SharedMemory(name=shm_name)
    stacked = np.ndarray(array_shape, dtype=array_dtype, buffer=shm.buf)
    a = stacked[i1]
    b = stacked[i2]
    result = add_mats(a, b)
    shm.close()
    return result

if __name__ == "__main__":
    class Timer:
        def __init__(self):
            self.start = None
            self.stop = None
            self.delta = None

        def __enter__(self):
            self.start = time.time()
            return self

        def __exit__(self, *exc_args):
            self.stop = time.time()
            self.delta = self.stop - self.start

    arrays = [np.random.rand(5, 5) for _ in range(50)]
    index_combns = list(itertools.combinations(range(len(arrays)), 2))

    # Helper for non-mp version
    def add_mats_pair(ij_pair):
        i, j = ij_pair
        a = arrays[i]
        b = arrays[j]
        return add_mats(a, b)

    with Timer() as t:
        # Do the pairwise operation without multiprocessing
        sums_no_mp = list(map(add_mats_pair, index_combns))

    print(f"Process took {t.delta} seconds with no MP")

    with Timer() as t:
        # Stack arrays and copy result into shared memory
        stacked = np.stack(arrays)
        shm = shared_memory.SharedMemory(create=True, size=stacked.nbytes)
        shm_arr = np.ndarray(stacked.shape, dtype=stacked.dtype, buffer=shm.buf)
        shm_arr[:] = stacked[:]

        with Pool(processes=32) as pool:
            processes = [pool.apply_async(add_mats_shared, (
                shm.name,
                stacked.shape,
                stacked.dtype,
                i,
                j,
            )) for (i, j) in index_combns]
            sums_mp = [p.get() for p in processes]

        shm.close()
        shm.unlink()

    print(f"Process took {t.delta} seconds with MP")

    for i in range(len(sums_no_mp)):
        assert (sums_no_mp[i] == sums_mp[i]).all()

    print("Results match.")

It uses multiprocessing.shared_memory to share a single numpy (N+1)-dimensional array (instead of a list of N-dimensional arrays) between the host process and child processes.

Other things that are different but don't matter:

  • Pool is used as a context manager to prevent having to explicitly close and join it.
  • Timer is a simple context manager to time blocks of code.
  • Some of the numbers have been adjusted randomly
  • pool.map replaced with calls to pool.apply_async

pool.map would be fine too, but you'd want to build the argument list before the .map call and unpack it in the worker function, e.g.:

with Pool(processes=32) as pool:
    args = [(
        shm.name,
        stacked.shape,
        stacked.dtype,
        i,
        j,
    ) for (i, j) in index_combns]
    sums_mp = pool.map(add_mats_shared, args)

# and

# Helper for mp version
def add_mats_shared(args):
    shm_name, array_shape, array_dtype, i1, i2 = args
    shm = shared_memory.SharedMemory(name=shm_name)
    ....

PyTorch can't pickle lambda

So the problem isn't the lambda function per se; it's that pickle doesn't work with functions that aren't plain module-level functions (pickle treats functions simply as references to some module-level name). Unfortunately, if you need to capture the start and end arguments, you won't be able to use a closure either; you'd normally just want something like:

def function_maker(start, end):
    def function(x):
        return x[:, start:end]
    return function

But this will get you right back to where you started, as far as the pickling problem is concerned.

So, try something like:

class Slicer:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __call__(self, x):
        return x[:, self.start:self.end]

Then you can use:

LambdaLayer(Slicer(start, end))

I'm not familiar with PyTorch, though I'm surprised that it doesn't offer the ability to use a different serialization backend. The pathos/dill project can pickle arbitrary functions, for example, and it's often easier to just use that. But I believe the above should solve the problem.

python pickle object with lambdas

The standard pickle module cannot serialize lambdas, but there is a third party package called dill which supports them.
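
For instance, a minimal sketch (the Config class here is just an illustration, assuming dill is installed):

import dill

class Config:
    def __init__(self):
        self.transform = lambda x: x * 2   # lambda stored on the object

cfg = Config()
blob = dill.dumps(cfg)         # the standard pickle module would fail here
restored = dill.loads(blob)
print(restored.transform(21))  # 42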


