Can't pickle defaultdict
In addition to Martijn's explanation:
A module-level function is a function defined at the top level of a module: it is not an instance method of a class, it is not nested within another function, and it is a "real" function with a name, not a lambda function.
So, to pickle your defaultdict, create it with a module-level function instead of a lambda function:
from collections import defaultdict
import pickle

def dd():
    return defaultdict(int)

dict1 = defaultdict(dd)  # dd is a module-level function
Then you can pickle it:
tmp = pickle.dumps(dict1) # no exception
new = pickle.loads(tmp)
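To make the contrast concrete, here is a minimal sketch that tries both kinds of factory. The names nested_int_dict, with_lambda, and with_function are illustrative only, and the exact exception raised for the lambda differs between Python versions, so the sketch catches the usual candidates rather than assuming one.

```python
import pickle
from collections import defaultdict


def nested_int_dict():
    # Module-level factory: pickle can find it again by name.
    return defaultdict(int)


with_lambda = defaultdict(lambda: defaultdict(int))
with_function = defaultdict(nested_int_dict)
with_function["a"]["b"] += 1

try:
    pickle.dumps(with_lambda)
    lambda_failed = False
except (pickle.PicklingError, AttributeError, TypeError):
    # The lambda cannot be looked up by name, so pickling is refused.
    lambda_failed = True

# The named factory survives a round trip and keeps working afterwards.
restored = pickle.loads(pickle.dumps(with_function))
```

After the round trip, restored still creates inner defaultdict(int) objects for keys it has not seen.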
How to pickle a defaultdict which uses a lambda function?
A simple workaround would be to implement your tree data structure differently, without defaultdict:
class DTree(dict):
    def __missing__(self, key):
        # Create, store, and return a nested DTree for any missing key.
        value = self[key] = type(self)()
        return value

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# Create the DTree object:
hapPkl = DTree()

# Create the pickle file:
with open("hapP.pkl", "wb") as f:
    pickle.dump(hapPkl, f)
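As a rough usage sketch, the same class can hold nested keys and survive a round trip through a file. The temporary file and the key names below are made up for illustration and are not part of the original question.

```python
import os
import pickle
import tempfile


class DTree(dict):
    def __missing__(self, key):
        # Auto-create a nested DTree the first time a key is read.
        value = self[key] = type(self)()
        return value


tree = DTree()
tree["chr1"]["sample_a"]["count"] = 3  # intermediate levels appear on demand

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tree.pkl")
    with open(path, "wb") as f:
        pickle.dump(tree, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)
```

Because DTree is an ordinary named class with no stored factory function, pickle needs no extra support methods here.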
pickling defaultdict with lambda
You can do it with dill. You've made a typo… you should have used dill.dump instead of dill.dumps if you want to dump to a file. If you want to dump to a string, use dumps.
>>> import dill
>>> from collections import defaultdict
>>> pos = defaultdict(lambda: 0)
>>> neg = defaultdict(lambda: 0)
>>> countdata = (pos,neg)
>>> _countdata = dill.loads(dill.dumps(countdata))
>>> _countdata
(defaultdict(<function <lambda> at 0x10917f7d0>, {}), defaultdict(<function <lambda> at 0x10917f8c0>, {}))
>>>
>>> # now dump countdata to a file
>>> with open('data.pkl', 'wb') as f:
...     dill.dump(countdata, f)
...
>>>
Python: how does Pickle work with defaultdict
First of all, if you look at the pickle docs, specifically:
pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored
So what this is telling us is that pickle will import the module that defines the object you are unpickling.
We can show this with a small example. Consider the following folder structure:
parent/
|-- a.py
|-- sub/
sub is an empty sub-folder; a.py holds an example class:
# a.py
class ExampleClass:
    def __init__(self):
        self.var = 'This is a string'
Now start the Python console in the parent directory:
alex@toaster:parent$ python3
>>> import pickle
>>> from a import ExampleClass
>>> x = ExampleClass()
>>> x.var
'This is a string'
>>> with open('eg.p', 'wb') as f:
...     pickle.dump(x, f)
Exit the shell. Move to the sub directory and try to load the pickled ExampleClass object:
alex@toaster:sub$ python3
>>> import pickle
>>> with open('../eg.p', 'rb') as f:
...     x = pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ModuleNotFoundError: No module named 'a'
We get a ModuleNotFoundError because pickle cannot load the class definition from the module a (it's in a different directory). In your case, Python can load the collections.defaultdict class because the collections module is on the module search path (sys.path). However, to continue to use the module(s) imported by pickle you will still need to import them yourself, e.g. if you want to create another defaultdict in script2.py.
To find out more about modules, see the Modules chapter of the Python tutorial, specifically 6.1.2 The Module Search Path.
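The same experiment can be sketched in a single script, without changing directories. The module name a_demo_module is hypothetical, and removing the temporary directory from sys.path is only a stand-in for "starting Python somewhere the module cannot be found".

```python
import importlib
import os
import pickle
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a throwaway module (hypothetical name) holding the example class.
    with open(os.path.join(tmp_dir, "a_demo_module.py"), "w") as f:
        f.write("class ExampleClass:\n"
                "    def __init__(self):\n"
                "        self.var = 'This is a string'\n")

    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    module = importlib.import_module("a_demo_module")
    payload = pickle.dumps(module.ExampleClass())

    # The payload stores the module and class names, not the class body.
    names_stored = b"a_demo_module" in payload and b"ExampleClass" in payload

    # While the module is importable, unpickling works.
    restored_var = pickle.loads(payload).var

    # Simulate running from another directory: hide the module again.
    sys.path.remove(tmp_dir)
    del sys.modules["a_demo_module"]
    importlib.invalidate_caches()
    try:
        pickle.loads(payload)
        load_error = None
    except ModuleNotFoundError as exc:
        load_error = exc
```

The second pickle.loads call fails for the same reason as the console session above: the pickled data only names the module, so the module has to be importable at load time.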
Can't pickle recursive nested defaultdict
The defaultdict class implements an object.__reduce__() method where the second element of the returned tuple (the arguments for the constructor) is always going to be the factory object:
>>> d = NestedDict()
>>> d.__reduce__()
(<class '__main__.NestedDict'>, (<class '__main__.NestedDict'>,), None, None, <dict_itemiterator object at 0x110df59a8>)
That argument is then passed to the NestedDict() call to rebuild the object. The exception is thrown because the NestedDict class doesn't accept an argument.
You can override the __reduce__ method in your subclass:
class NestedDict(defaultdict):
    def __init__(self):
        super().__init__(self.__class__)

    def __reduce__(self):
        return (type(self), (), None, None, iter(self.items()))
The above produces the exact same elements defaultdict.__reduce__() returns, except that the second element is now an empty tuple.
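A small before-and-after sketch of that override follows. NaiveNestedDict is a guess at what the failing class looked like (an __init__ that takes no factory argument), and both class names are illustrative.

```python
import pickle
from collections import defaultdict


class NaiveNestedDict(defaultdict):
    # Assumed "before" version: __init__ takes no factory argument.
    def __init__(self):
        super().__init__(self.__class__)


class FixedNestedDict(defaultdict):
    def __init__(self):
        super().__init__(self.__class__)

    def __reduce__(self):
        # Empty args tuple: rebuild with FixedNestedDict(), then re-add items.
        return (type(self), (), None, None, iter(self.items()))


naive = NaiveNestedDict()
naive["foo"]["bar"] = 1
try:
    pickle.loads(pickle.dumps(naive))
    naive_error = None
except TypeError as exc:
    # Unpickling calls NaiveNestedDict(factory), which __init__ rejects.
    naive_error = exc

fixed = FixedNestedDict()
fixed["foo"]["bar"] = 1
restored = pickle.loads(pickle.dumps(fixed))
```

Note that the naive version pickles without complaint; the error only appears when the data is loaded back.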
You could also just accept and ignore a single argument:
class NestedDict(defaultdict):
    def __init__(self, _=None):  # accept a factory and ignore it
        super().__init__(self.__class__)
The _ name is commonly used to mean "I am ignoring this value".
An alternative implementation could just subclass dict and provide a custom __missing__ method; this method is called for keys not in the dictionary:
class NestedDict(dict):
    def __missing__(self, key):
        nested = self[key] = type(self)()
        return nested

    def __repr__(self):
        return f'{type(self).__name__}({super().__repr__()})'
This works exactly like your version, but doesn't need additional pickle support methods:
>>> d = NestedDict()
>>> d['foo']
NestedDict({})
>>> d['foo']['bar']
NestedDict({})
>>> d
NestedDict({'foo': NestedDict({'bar': NestedDict({})})})
>>> pickle.loads(pickle.dumps(d))
NestedDict({'foo': NestedDict({'bar': NestedDict({})})})
Pickling a dictionary that uses defaultdict
Unfortunately, the answer there is correct for that question, but subtly wrong for yours. Although a top-level function instead of a lambda is great and indeed would make pickle a lot happier, the function should return the default value to be used, which for your case is not another defaultdict object.
Simply return the same value your lambda returns:
def dd():
    return 1
Every time you try to access a key in the defaultdict instance that doesn't yet exist, dd is called. The dd in the other post returns another defaultdict instance, that one set to use int as a default, which matches the lambda shown in the other question.
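A short sketch of how such a factory behaves, assuming (as above) that the default value is 1. The variable names counts, first_lookup, and restored are illustrative.

```python
import pickle
from collections import defaultdict


def dd():
    # Returns the plain default value, not another defaultdict.
    return 1


counts = defaultdict(dd)
counts["seen"] += 1           # missing key starts at dd() == 1, then +1
first_lookup = counts["new"]  # missing key, so dd() supplies the value

# dd is a named module-level function, so the whole dict can be pickled.
restored = pickle.loads(pickle.dumps(counts))
```

The restored dictionary keeps dd as its factory, so lookups of new keys continue to yield 1.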
AttributeError: Can't pickle local object '<locals>.<lambda>'
pickle records references to functions (module and function name), not the functions themselves. When unpickling, it will load the module and get the function by name. lambda creates anonymous function objects that don't have names and can't be found by the loader. The solution is to switch to a named function.
import collections
import pickle

def create_int_defaultdict():
    return collections.defaultdict(int)

class A:
    def funA(self):
        # create a dictionary and fill it with values
        dictionary = collections.defaultdict(create_int_defaultdict)
        ...
        # then pickle to save it (f is a file opened in binary write mode)
        pickle.dump(dictionary, f)
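The by-reference behaviour can be observed directly. In this hedged sketch, an in-memory io.BytesIO buffer stands in for the file object f, and the keys and values are invented for the example.

```python
import collections
import io
import pickle


def create_int_defaultdict():
    return collections.defaultdict(int)


# Pickling the function itself stores just its name...
function_payload = pickle.dumps(create_int_defaultdict)
name_is_stored = b"create_int_defaultdict" in function_payload
# ...and loading resolves that name back to the very same object.
same_object = pickle.loads(function_payload) is create_int_defaultdict

dictionary = collections.defaultdict(create_int_defaultdict)
dictionary["x"]["y"] += 5

buffer = io.BytesIO()  # stands in for the file object f
pickle.dump(dictionary, buffer)
buffer.seek(0)
restored = pickle.load(buffer)
```

Since only the name travels with the data, the loading program must define (or import) a function with that same name in the same module.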