Can Python Pickle Lambda Functions

Cannot pickle lambda function in python 3

It seems that in Python 2, dill replaces pickle when you import it. In Python 3, you have to use dill directly instead.

This works in python 3.5:

>>> import dill 
>>> dill.dumps(lambda x: x**2)
b'\x80\x03cdill.dill\n_create_function\nq\x00(cdill.dill\n_load_type\nq\x01X\x08\x00\x00\x00CodeTypeq\x02\x85q\x03Rq\x04(K\x01K\x00K\x01K\x02KCC\x08|\x00\x00d\x01\x00\x13Sq\x05NK\x02\x86q\x06)X\x01\x00\x00\x00xq\x07\x85q\x08X\x07\x00\x00\x00<stdin>q\tX\x08\x00\x00\x00<lambda>q\nK\x01C\x00q\x0b))tq\x0cRq\rc__builtin__\n__main__\nh\nNN}q\x0etq\x0fRq\x10.'

Alternatively, you can import dill as pickle:

>>> import dill as pickle 
>>> pickle.dumps(lambda x: x**2)

Python, cPickle, pickling lambda functions

The built-in pickle module is unable to serialize several kinds of Python objects (including lambda functions, nested functions, and functions defined at the command line).
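
For instance, a nested function fails in much the same way as a lambda (a small illustration added here for completeness):

import pickle

def outer():
    def inner(x):      # nested function: not reachable as a module-level name
        return x + 1
    return inner

f = outer()
pickle.dumps(f)   # fails: pickle can't find 'outer.<locals>.inner' at module level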

The PiCloud package includes a more robust pickler that can pickle lambda functions.

from pickle import dumps
f = lambda x: x * 5
dumps(f) # error
from cloud.serialization.cloudpickle import dumps
dumps(f) # works

PiCloud-serialized objects can be de-serialized using the normal pickle/cPickle load and loads functions.
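
For example, a minimal round trip (sketched here with the standalone cloudpickle package, the successor to the PiCloud serializer, rather than the cloud.serialization import above):

import pickle
import cloudpickle   # standalone package, assumed installed

f = lambda x: x * 5
blob = cloudpickle.dumps(f)   # serialize the lambda

g = pickle.loads(blob)        # de-serialize with the ordinary pickle module
print(g(3))                   # 15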

Dill also provides similar functionality

>>> import dill           
>>> f = lambda x: x * 5
>>> dill.dumps(f)
'\x80\x02cdill.dill\n_create_function\nq\x00(cdill.dill\n_unmarshal\nq\x01Uec\x01\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x08\x00\x00\x00|\x00\x00d\x01\x00\x14S(\x02\x00\x00\x00Ni\x05\x00\x00\x00(\x00\x00\x00\x00(\x01\x00\x00\x00t\x01\x00\x00\x00x(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>t\x08\x00\x00\x00<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x02\x85q\x03Rq\x04c__builtin__\n__main__\nU\x08<lambda>q\x05NN}q\x06tq\x07Rq\x08.'

How does one pickle arbitrary pytorch models that use lambda functions?

This is not a good idea. If your code later moves to a different GitHub repo (or simply changes), it will be hard to restore models that took a lot of time to train. The cycles spent recovering them, or retraining, are not worth it. I recommend instead doing it the PyTorch way and saving only the weights, as the PyTorch documentation recommends.
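
For reference, a minimal sketch of that approach (the toy model below is just a placeholder for your own architecture):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 2))   # toy model, stands in for your real one

# Save only the weights (the state dict), not the pickled model object
torch.save(model.state_dict(), "weights.pt")

# Later: rebuild the same architecture in code, then load the weights
restored = nn.Sequential(nn.Linear(4, 2))
restored.load_state_dict(torch.load("weights.pt"))
restored.eval()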

Python: _pickle.PicklingError: Can't pickle function lambda

I'm not exactly sure why (though a thorough read through the multiprocessing docs would probably have an answer), but there's a pickling step involved in Python's multiprocessing by which certain things are passed to child processes. While I would have expected the lambdas to be inherited rather than passed via pickling, I guess that's not what's happening.
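
As a quick illustration (a minimal snippet added here, not from the original question), passing a lambda directly to a pool raises the error in the title:

from multiprocessing import Pool

if __name__ == "__main__":
    with Pool(2) as pool:
        # Fails: the lambda itself must be pickled to reach the workers,
        # raising "Can't pickle <function <lambda> ...>"
        pool.map(lambda x: x * 2, range(4))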

Following the discussion in the comments, consider something like this approach:

import time
from multiprocessing import Pool
import itertools
import numpy as np
from multiprocessing import shared_memory

def add_mats(a, b):
    #time.sleep(0.00001)
    return (a + b)

# Helper for mp version
def add_mats_shared(shm_name, array_shape, array_dtype, i1, i2):
    shm = shared_memory.SharedMemory(name=shm_name)
    stacked = np.ndarray(array_shape, dtype=array_dtype, buffer=shm.buf)
    a = stacked[i1]
    b = stacked[i2]
    result = add_mats(a, b)
    shm.close()
    return result

if __name__ == "__main__":
    class Timer:
        def __init__(self):
            self.start = None
            self.stop = None
            self.delta = None

        def __enter__(self):
            self.start = time.time()
            return self

        def __exit__(self, *exc_args):
            self.stop = time.time()
            self.delta = self.stop - self.start

    arrays = [np.random.rand(5, 5) for _ in range(50)]
    index_combns = list(itertools.combinations(range(len(arrays)), 2))

    # Helper for non-mp version
    def add_mats_pair(ij_pair):
        i, j = ij_pair
        a = arrays[i]
        b = arrays[j]
        return add_mats(a, b)

    with Timer() as t:
        # Do the pairwise operation without multiprocessing
        sums_no_mp = list(map(add_mats_pair, index_combns))

    print(f"Process took {t.delta} seconds with no MP")

    with Timer() as t:
        # Stack arrays and copy result into shared memory
        stacked = np.stack(arrays)
        shm = shared_memory.SharedMemory(create=True, size=stacked.nbytes)
        shm_arr = np.ndarray(stacked.shape, dtype=stacked.dtype, buffer=shm.buf)
        shm_arr[:] = stacked[:]

        with Pool(processes=32) as pool:
            processes = [pool.apply_async(add_mats_shared, (
                shm.name,
                stacked.shape,
                stacked.dtype,
                i,
                j,
            )) for (i, j) in index_combns]
            sums_mp = [p.get() for p in processes]

        shm.close()
        shm.unlink()

    print(f"Process took {t.delta} seconds with MP")

    for i in range(len(sums_no_mp)):
        assert (sums_no_mp[i] == sums_mp[i]).all()

    print("Results match.")

It uses multiprocessing.shared_memory to share a single numpy (N+1)-dimensional array (instead of a list of N-dimensional arrays) between the host process and child processes.

Other things that are different but don't matter:

  • Pool is used as a context manager to prevent having to explicitly close and join it.
  • Timer is a simple context manager to time blocks of code.
  • Some of the numbers have been adjusted randomly
  • pool.map replaced with calls to pool.apply_async

pool.map would be fine too, but you'd want to build the argument list before the .map call and unpack it in the worker function, e.g.:

with Pool(processes=32) as pool:
    args = [(
        shm.name,
        stacked.shape,
        stacked.dtype,
        i,
        j,
    ) for (i, j) in index_combns]
    sums_mp = pool.map(add_mats_shared, args)

# and

# Helper for mp version
def add_mats_shared(args):
    shm_name, array_shape, array_dtype, i1, i2 = args
    shm = shared_memory.SharedMemory(name=shm_name)
    ....

PyTorch can't pickle lambda

So the problem isn't the lambda function per se; it's that pickle doesn't work with functions that aren't plain module-level functions (pickle treats functions simply as references to some module-level name). Unfortunately, if you need to capture the start and end arguments, you won't be able to use a closure either; you'd normally just want something like:

def function_maker(start, end):
    def function(x):
        return x[:, start:end]
    return function

But this will get you right back to where you started, as far as the pickling problem is concerned.

So, try something like:

class Slicer:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __call__(self, x):
        return x[:, self.start:self.end]

Then you can use:

LambdaLayer(Slicer(start, end))

I'm not familiar with PyTorch, though I'm surprised that it doesn't offer the ability to use a different serialization backend. The pathos/dill project can pickle arbitrary functions, for example, and it's often easier to just use that. But I believe the above should solve the problem.

python pickle object with lambdas

The standard pickle module cannot serialize lambdas, but there is a third party package called dill which supports them.
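
For instance, a minimal sketch (the Config class here is just an illustration, assuming dill is installed):

import dill

class Config:
    def __init__(self):
        self.transform = lambda x: x * 2   # lambda stored on the object

cfg = Config()
blob = dill.dumps(cfg)         # the standard pickle module would fail here
restored = dill.loads(blob)
print(restored.transform(21))  # 42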


