Can't Pickle Defaultdict

Can't pickle defaultdict

In addition to Martijn's explanation:

A module-level function is one defined at the top level of a module: it is not an instance method of a class, it is not nested inside another function, and it is a "real" named function, not a lambda.

So, to pickle your defaultdict, create it with a module-level function instead of a lambda:

from collections import defaultdict

def dd():
    return defaultdict(int)

dict1 = defaultdict(dd)  # dd is a module-level function

Then you can pickle it:

import pickle

tmp = pickle.dumps(dict1)  # no exception
new = pickle.loads(tmp)
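
To see the difference in practice, here is a minimal sketch that compares a lambda factory with a picklable one. The helper try_pickle is a made-up name for this illustration, and functools.partial is swapped in as an alternative to writing a named function; it is not part of the answer above, just another factory that pickle can store by reference.

```python
import pickle
from collections import defaultdict
from functools import partial


def try_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickle rejects it."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle cannot store a reference to it.
with_lambda = defaultdict(lambda: defaultdict(int))

# partial(defaultdict, int) is built only from importable objects, so it pickles.
with_partial = defaultdict(partial(defaultdict, int))
with_partial["a"]["b"] += 1

restored = pickle.loads(pickle.dumps(with_partial))
print(try_pickle(with_lambda), try_pickle(with_partial), restored["a"]["b"])
```

The restored object keeps its factory, so missing keys still produce a nested defaultdict(int) after the round trip.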

How to pickle a defaultdict which uses a lambda function?

A simple workaround would be to implement your tree data-structure differently, without defaultdict:

class DTree(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

try:
    import cPickle as pickle  # Python 2 only
except ImportError:
    import pickle

# Create a DTree object
hapPkl = DTree()

# Write the pickle file
with open("hapP.pkl", "wb") as f:
    pickle.dump(hapPkl, f)
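
As a quick check that this behaves like the nested defaultdict it replaces, the following sketch fills a tree and round-trips it through pickle in memory. The keys and values are invented for the example.

```python
import pickle


class DTree(dict):
    """Same class as above: a dict that creates nested DTree nodes on demand."""

    def __missing__(self, key):
        value = self[key] = type(self)()
        return value


tree = DTree()
tree["fruit"]["apple"]["count"] = 3  # intermediate levels are created automatically

# Round trip through bytes instead of a file.
restored = pickle.loads(pickle.dumps(tree))
restored["fruit"]["pear"]["count"] = 1  # still auto-creates after unpickling
print(restored)
```

Because DTree is an ordinary class with no factory attribute, pickle only needs to be able to import the class itself.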

pickling defaultdict with lambda

You can do it with dill. You've made a typo… to dump to a file you should have used dill.dump instead of dill.dumps. If you want to dump to a string, use dill.dumps.

>>> import dill
>>> from collections import defaultdict
>>> pos = defaultdict(lambda: 0)
>>> neg = defaultdict(lambda: 0)
>>> countdata = (pos,neg)
>>> _countdata = dill.loads(dill.dumps(countdata))
>>> _countdata
(defaultdict(<function <lambda> at 0x10917f7d0>, {}), defaultdict(<function <lambda> at 0x10917f8c0>, {}))
>>>
>>> # now dump countdata to a file 
>>> with open('data.pkl', 'wb') as f:
...     dill.dump(countdata, f)
...
>>>

Python: how does Pickle work with defaultdict

First of all, if you look at the pickle docs, specifically:

pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored

In other words, when unpickling, pickle imports the module that defines the class of the object being loaded.

We can show this with a small example. Consider the following folder structure:

parent/
|-- a.py
|-- sub/

sub is an empty sub-folder.

a.py holds an example class:

# a.py
class ExampleClass:
    def __init__(self):
        self.var = 'This is a string'

Now start the Python console in the parent directory:

alex@toaster:parent$ python3
>>> import pickle
>>> from a import ExampleClass
>>> x = ExampleClass()
>>> x.var
'This is a string'
>>> with open('eg.p', 'wb') as f:
...     pickle.dump(x, f)

Exit the shell. Move to the sub directory and try to load the pickled ExampleClass object.

alex@toaster:sub$ python3
>>> import pickle
>>> with open('../eg.p', 'rb') as f:
...     x = pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ModuleNotFoundError: No module named 'a'

We get a ModuleNotFoundError because pickle cannot import the class definition from the module a (it is in a different directory, so it is not on the module search path). In your case, Python can load the collections.defaultdict class because the collections module is on the module search path. However, the import that pickle performs does not bind any names in your own script: to use those modules directly, e.g. to create another defaultdict in script2.py, you still need to import them yourself.

To find out more about modules, see the Modules chapter of the Python tutorial, specifically section 6.1.2, The Module Search Path.
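
The claim that pickle stores a reference to the class, not the class itself, can be checked with a small sketch. It uses only the standard library; the variable names are made up for the example.

```python
import pickle
from collections import defaultdict

counts = defaultdict(int)
counts["spam"] += 2

payload = pickle.dumps(counts)

# The byte stream names the module and the class; it does not embed their code.
has_module = b"collections" in payload
has_class = b"defaultdict" in payload

# Loading works here because collections can be imported from anywhere.
restored = pickle.loads(payload)
print(has_module, has_class, restored)
```

This is why the same load fails for ExampleClass in the sub directory: the stream only says "module a, class ExampleClass", and a cannot be imported from there.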

Can't pickle recursive nested defaultdict

The defaultdict class implements an object.__reduce__() method in which the second element of the returned tuple (the arguments for the constructor) always contains the factory object:

>>> d = NestedDict()
>>> d.__reduce__()
(<class '__main__.NestedDict'>, (<class '__main__.NestedDict'>,), None, None, <dict_itemiterator object at 0x110df59a8>)

That argument is then passed to the NestedDict() call to re-build the object. The exception is thrown because the NestedDict class doesn’t accept an argument.
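
The same mechanism can be seen on a plain defaultdict. The sketch below unpacks the tuple and rebuilds the object by hand, roughly the way pickle does on load; it is an illustration of the protocol, not pickle's actual implementation.

```python
from collections import defaultdict

d = defaultdict(int)
d["a"] += 1

# The five elements: callable, constructor args, state, list items, dict items.
cls, args, state, list_items, dict_items = d.__reduce__()

# pickle rebuilds the object as cls(*args) and then re-inserts the items.
rebuilt = cls(*args)
for key, value in dict_items:
    rebuilt[key] = value

print(cls, args, rebuilt)
```

Here args holds the factory (int), which is exactly what gets passed to a subclass constructor that may not expect it.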

You can override the __reduce__ method in your subclass:

from collections import defaultdict

class NestedDict(defaultdict):
    def __init__(self):
        super().__init__(self.__class__)
    def __reduce__(self):
        return (type(self), (), None, None, iter(self.items()))

The above produces the exact same elements defaultdict.__reduce__() returns, except that the second element is now an empty tuple.
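
A short round trip shows the override at work. This is a minimal sketch that repeats the class so it runs on its own; the keys are made up for the example.

```python
import pickle
from collections import defaultdict


class NestedDict(defaultdict):
    def __init__(self):
        super().__init__(self.__class__)

    def __reduce__(self):
        # Empty argument tuple: NestedDict() is called with no arguments on load.
        return (type(self), (), None, None, iter(self.items()))


d = NestedDict()
d["foo"]["bar"]["baz"] = 1  # nested levels are created on access

restored = pickle.loads(pickle.dumps(d))
print(restored["foo"]["bar"]["baz"])
```

Nested values are NestedDict instances too, so each level goes through the same __reduce__ when it is pickled.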

You could also just accept and ignore a single argument:

class NestedDict(defaultdict):
    def __init__(self, _=None):  # accept a factory and ignore it
        super().__init__(self.__class__)

The _ name is commonly used to mean "I am ignoring this value".

An alternative implementation could just subclass dict and provide a custom __missing__ method; this method is called for keys not in the dictionary:

class NestedDict(dict):
    def __missing__(self, key):
        nested = self[key] = type(self)()
        return nested
    def __repr__(self):
        return f'{type(self).__name__}({super().__repr__()})'

This works exactly like your version, but doesn't need additional pickle support methods:

>>> d = NestedDict()
>>> d['foo']
NestedDict({})
>>> d['foo']['bar']
NestedDict({})
>>> d
NestedDict({'foo': NestedDict({'bar': NestedDict({})})})
>>> pickle.loads(pickle.dumps(d))
NestedDict({'foo': NestedDict({'bar': NestedDict({})})})

Pickling a dictionary that uses defaultdict

Unfortunately, that answer is correct for the question it was written for, but subtly wrong for yours. A top-level function instead of a lambda is the right idea and does make pickle a lot happier, but the function should return the default value you want, which in your case is not another defaultdict object.

Simply return the same value your lambda returns:

def dd():
    return 1

Every time you try to access a key that doesn't yet exist in the defaultdict instance, dd is called. In the other post, dd returns another defaultdict instance, one that uses int as its default factory, which matches the lambda shown in that question.

AttributeError: Can't pickle local object '<locals>.<lambda>'

pickle records references to functions (module and function name), not the functions themselves. When unpickling, it imports the module and looks the function up by name. A lambda creates an anonymous function object with no importable name, so the loader cannot find it. The solution is to switch to a named, module-level function.

import collections
import pickle

def create_int_defaultdict():
    return collections.defaultdict(int)

class A:
    def funA(self):
        # create a dictionary and fill it with values
        dictionary = collections.defaultdict(create_int_defaultdict)
        ...
        # then pickle it to save it; f must be a file opened in binary write mode
        pickle.dump(dictionary, f)
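
The name lookup can be made visible by comparing the qualified names of the two kinds of factory. In this sketch the Builder class and its method names are hypothetical stand-ins for A and funA, and io.BytesIO stands in for the file f.

```python
import collections
import io
import pickle


def create_int_defaultdict():
    return collections.defaultdict(int)


class Builder:  # hypothetical class, standing in for A above
    def with_lambda(self):
        return collections.defaultdict(lambda: collections.defaultdict(int))

    def with_named(self):
        return collections.defaultdict(create_int_defaultdict)


builder = Builder()

# The lambda's qualified name contains "<locals>", which pickle cannot look up.
lambda_name = builder.with_lambda().default_factory.__qualname__
named_name = builder.with_named().default_factory.__qualname__
print(lambda_name, named_name)

# io.BytesIO stands in for a file opened in binary mode.
buffer = io.BytesIO()
data = builder.with_named()
data["x"]["y"] += 1
pickle.dump(data, buffer)
buffer.seek(0)
restored = pickle.load(buffer)
```

The named factory can be found again as an attribute of its module, while the lambda exists only inside the method call that created it.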
