cPickle.UnpicklingError: invalid load key, ' '.?
The argument you pass to cPickle.load() has to be a file object for an uncompressed pickle (.pkl) file.
mnist.pkl is stored inside mnist.pkl.gz, so the .gz file has to be opened with gzip first. Try this:
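import cPickle  # Python 2; on Python 3 use the pickle module instead
import gzip
# gzip.open decompresses on the fly, so load() sees the raw pickle bytes
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()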
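If it is not known in advance whether a file is compressed, one option is to look at its first two bytes before choosing how to open it. The following is a minimal sketch for Python 3 (using pickle rather than cPickle); the helper names open_maybe_gzipped and load_pickle are made up for this illustration and are not part of any library.

```python
import gzip
import pickle

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def open_maybe_gzipped(path):
    """Open `path` for binary reading, decompressing it if it is gzip data."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")


def load_pickle(path, **load_kwargs):
    """Load a single pickled object from a plain or gzip-compressed file."""
    with open_maybe_gzipped(path) as f:
        return pickle.load(f, **load_kwargs)
```

A pickle written by Python 2 may additionally need an encoding argument when loaded on Python 3, for example load_pickle('mnist.pkl.gz', encoding='latin1'); whether that applies depends on how the file was created.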
cPickle: UnpicklingError: invalid load key, 'A'
When storing multiple objects (by repeated dump calls, not inside a container), pickle stores the objects sequentially in the pickle file, so if an object is broken it can be removed without corrupting the others.
In principle, the pickle format is pseudo-documented in pickle.py. For most cases, the opcodes at the beginning of the module are sufficient to piece together what is happening. Basically, a pickle file is a set of instructions on how to build objects.
How readable a pickle file is depends on its pickle protocol: 0 is doable, everything above is difficult. Whether you can fix or must delete depends entirely on this. What's consistent is that each individual pickle ends with a dot (.). For example, b'Va\np0\n.' and b'\x80\x04\x95\x05\x00\x00\x00\x00\x00\x00\x00\x8c\x01a\x94.' both encode the string 'a', in protocol 0 and protocol 4 respectively.
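To see this for yourself, the standard library's pickletools module can list the opcodes of a pickle. The snippet below is a small sketch for Python 3; opcode_names is a helper invented for this example.

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes that make up one pickle."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


proto0 = pickle.dumps("a", protocol=0)
proto4 = pickle.dumps("a", protocol=4)

for blob in (proto0, proto4):
    # Both byte strings end with b'.', which is the STOP opcode.
    print(blob, opcode_names(blob))
```

pickletools.dis(blob) prints a more detailed, annotated listing of the same opcodes.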
The simplest form of recovery is to count the number of objects you can load:
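import pickle

with open('/my/pickle.pkl', 'rb') as pkl_source:
    idx = 1
    while True:
        # raises at the first object that cannot be loaded
        # (or EOFError once the end of the file is reached)
        pickle.load(pkl_source)
        print(idx)  # number of objects loaded so far
        idx += 1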
Then open the pickle file, skip that many objects, and remove everything up to and including the next dot (.).
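To find out where the last good object ends, the counting loop can also record the file position after every successful load. This is only a sketch for Python 3: count_loadable is a hypothetical helper, and it deliberately stops at any exception, since corrupted data can raise errors other than UnpicklingError.

```python
import pickle


def count_loadable(stream):
    """Return (count, offset) for a binary stream of concatenated pickles.

    count is the number of objects that loaded cleanly; offset is the byte
    position just after the last of them, i.e. where the trouble starts.
    """
    count = 0
    offset = stream.tell()
    while True:
        try:
            pickle.load(stream)
        except Exception:
            # EOFError at the end of the data, or any error from a broken object
            return count, offset
        count += 1
        offset = stream.tell()
```

Called as count_loadable(open('/my/pickle.pkl', 'rb')), the returned offset tells you where in the file to start looking for the broken object.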
What causes the error _pickle.UnpicklingError: invalid load key, ' '. ?
Pickling is recursive, not sequential. Thus, to pickle a list, pickle will start to pickle the containing list, then pickle the first element, diving into the first element and pickling dependencies and sub-elements until the first element is serialized. It then moves on to the next element of the list, and so on, until it finally finishes serializing the enclosing list. In short, it's hard to treat a recursive pickle as sequential, except for some special cases. It's better to use a smarter pattern on your dump if you want to load in a special way.
The most common pattern is to pickle everything with a single dump to a file, but then you have to load everything at once with a single load. However, if you open a file handle and do multiple dump calls (e.g. one for each element of the list, or a tuple of selected elements), then your load will mirror that: you open the file handle and do multiple load calls until you have all the list elements and can reconstruct the list. It's still not easy to selectively load only certain list elements, however. To do that, you'd probably have to store your list elements as a dict (with the index of the element or chunk as the key) using a package like klepto, which can break up a pickled dict into several files transparently, and enables easy loading of specific elements.
Saving and loading multiple objects in pickle file?