Saving and Loading Multiple Objects in Pickle File

Saving and loading multiple objects in pickle file?

Using a list, tuple, or dict is by far the most common way to do this:

import pickle

PIK = "pickle.dat"

data = ["A", "b", "C", "d"]
with open(PIK, "wb") as f:
    pickle.dump(data, f)
with open(PIK, "rb") as f:
    print(pickle.load(f))

That prints:

['A', 'b', 'C', 'd']

However, a pickle file can contain any number of pickles. Here's code that produces the same output, but note that it's harder to write and to understand:

with open(PIK, "wb") as f:
    pickle.dump(len(data), f)
    for value in data:
        pickle.dump(value, f)

data2 = []
with open(PIK, "rb") as f:
    for _ in range(pickle.load(f)):
        data2.append(pickle.load(f))
print(data2)

If you do this, you're responsible for knowing how many pickles are in the file you write out. The code above does that by pickling the length of the list first.
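One way to avoid tracking the count by hand is to wrap the length-prefix idea in a pair of helper functions. A minimal sketch (the names `dump_items` and `load_items` are my own, not from the original code):

import pickle

def dump_items(items, f):
    # Pickle the item count first, then each item in turn.
    pickle.dump(len(items), f)
    for item in items:
        pickle.dump(item, f)

def load_items(f):
    # Read the count back, then load exactly that many pickles.
    return [pickle.load(f) for _ in range(pickle.load(f))]

with open("pickle.dat", "wb") as f:
    dump_items(["A", "b", "C", "d"], f)
with open("pickle.dat", "rb") as f:
    print(load_items(f))  # ['A', 'b', 'C', 'd']
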

How can you Pickle multiple objects in one file?

pickle will read them in the same order you dumped them in.

import pickle

test1, test2 = ["One", "Two", "Three"], ["1", "2", "3"]
with open("C:/temp/test.pickle", "wb") as f:
    pickle.dump(test1, f)
    pickle.dump(test2, f)
with open("C:/temp/test.pickle", "rb") as f:
    testout1 = pickle.load(f)
    testout2 = pickle.load(f)

print(testout1, testout2)

Prints out ['One', 'Two', 'Three'] ['1', '2', '3']. To pickle an arbitrary number of objects, or just to make them easier to work with, you can put them in a tuple; then you only have to pickle one object.

import pickle

test1, test2 = ["One", "Two", "Three"], ["1", "2", "3"]
saveObject = (test1, test2)
with open("C:/temp/test.pickle", "wb") as f:
    pickle.dump(saveObject, f)
with open("C:/temp/test.pickle", "rb") as f:
    testout = pickle.load(f)

print(testout[0], testout[1])

Prints out ['One', 'Two', 'Three'] ['1', '2', '3']

Saving python lists data as 1 pickle and then loading them back

To save multiple lists (e.g. three in your case) in a single pickle file, put all of your lists in a dictionary and then save that single dictionary. After loading the dictionary back, pull out whichever list you need.

import pickle

def SaveLists(data):
    open_file = open("myPickleFile", "wb")
    pickle.dump(data, open_file)
    open_file.close()

def LoadLists(file):
    open_file = open(file, "rb")
    loaded_list = pickle.load(open_file)
    open_file.close()
    return loaded_list

# example to call the functions
cars = ['Toyota', 'Honda']
fruits = ['Apple', 'Cherry']

# create a dictionary and add these lists
data = {}
data['cars'] = cars
data['fruits'] = fruits  # add up to any number of lists

# save the data in pickle form
SaveLists(data)

# load the data when desired
lists = LoadLists('myPickleFile')
print(lists['fruits'])  # get your desired list

Multiple objects in pickle file

You are opening your pickle file with mode wb, which truncates it (sets the file back to empty before writing anything new). The mode you want is ab, which opens the file for appending in binary mode.
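A minimal sketch of the append approach (the file name and records are illustrative; note that pickle needs the binary append mode "ab", not plain "a"):

import os
import pickle

path = "log.pickle"
if os.path.exists(path):
    os.remove(path)  # start fresh for the demo

# Each iteration appends one more pickle instead of truncating the file.
for record in ({"run": 1}, {"run": 2}):
    with open(path, "ab") as f:  # "ab": append, don't truncate
        pickle.dump(record, f)

# Read every pickle back until the end of the file.
records = []
with open(path, "rb") as f:
    while True:
        try:
            records.append(pickle.load(f))
        except EOFError:
            break
print(records)  # [{'run': 1}, {'run': 2}]
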

Load all pickled objects

I am not sure if this is the correct approach.

import pickle

ListNames = [["Name1", "City1", "Email1"], ["Name2", "City2", "Number2"]]
ListNumbers = [1, 2, 3, 4, 5, 6, 7, 8]

with open("TestPickle.pickle", "wb") as fileSaver:
    pickle.dump(ListNames, fileSaver)
    pickle.dump(ListNumbers, fileSaver)

obj = []
with open("TestPickle.pickle", "rb") as fileOpener:
    while True:
        try:
            obj.append(pickle.load(fileOpener))
        except EOFError:
            break
print(obj)

Output:

[[['Name1', 'City1', 'Email1'], ['Name2', 'City2', 'Number2']], [1, 2, 3, 4, 5, 6, 7, 8]]

How to load all objects of pickle in python?

Your answer indicates that you are using multiple invocations of dump to dump multiple objects into the same stream. If that is the case, it is expected that you know how many times to call load, e.g. by obtaining the information from the first loaded object, or by the number of objects being constant.

If that is not the case, use a single dump to dump all the objects by packing them up in a tuple:

pickle.dump((a, b, c), f)

You then will be able to load them in one go:

a, b, c = pickle.load(f)

If you cannot change the dumping code to use a tuple, you can simply load the data from the stream until encountering an EOFError:

objs = []
while True:
    try:
        o = pickle.load(f)
    except EOFError:
        break
    objs.append(o)
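The read-until-EOFError loop is a natural fit for a generator, so you can iterate over a stream of pickles lazily. A sketch (the name `iter_pickles` is my own):

import io
import pickle

def iter_pickles(f):
    # Yield each pickled object from the stream until it is exhausted.
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return

# demo on an in-memory stream
buf = io.BytesIO()
for obj in ("one", [2, 3], {"four": 4}):
    pickle.dump(obj, buf)
buf.seek(0)
print(list(iter_pickles(buf)))  # ['one', [2, 3], {'four': 4}]
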

Speed up reading multiple pickle files

I think that the solution would be some library written in C that
takes a list of files to read and then runs multiple threads (without
GIL). Is there something like this around?

In short: no. pickle is apparently good enough for enough people that there is no major alternative implementation fully compatible with the pickle protocol. As of Python 3, cPickle was merged into pickle, and neither releases the GIL anyway, which is why threading won't help you (search for Py_BEGIN_ALLOW_THREADS in _pickle.c and you will find nothing).

If your data can be restructured into a simpler format like csv, or a binary format like numpy's npy, there will be less CPU overhead when reading it. Pickle is built for flexibility first rather than speed or compactness. One possible exception to the more-complex-means-slower rule is the HDF5 format via h5py, which can be fairly complex, and which I have used to max out the bandwidth of a SATA SSD.
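For simple tabular data, the csv route from the standard library looks like this (a sketch with made-up rows; whether it actually beats pickle depends on your data, and note that csv hands everything back as strings):

import csv

rows = [["alice", 30], ["bob", 25]]  # toy data standing in for your records

with open("data.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open("data.csv", newline="") as f:
    loaded = [row for row in csv.reader(f)]
print(loaded)  # [['alice', '30'], ['bob', '25']] -- note: values come back as strings
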

Finally, you mention you have many, many pickle files, and that itself probably causes no small amount of overhead: each time you open a new file, there's some cost from the operating system. Conveniently, you can combine pickle files by simply appending them together, then call Unpickler.load() until you reach the end of the file. Here's a quick example of combining two pickle files using shutil:

import pickle, shutil, os

# some dummy data
d1 = {'a': 1, 'b': 2, 1: 'a', 2: 'b'}
d2 = {'c': 3, 'd': 4, 3: 'c', 4: 'd'}

# create two pickles
with open('test1.pickle', 'wb') as f:
    pickle.Pickler(f).dump(d1)
with open('test2.pickle', 'wb') as f:
    pickle.Pickler(f).dump(d2)

# combine list of pickle files
with open('test3.pickle', 'wb') as dst:
    for pickle_file in ['test1.pickle', 'test2.pickle']:
        with open(pickle_file, 'rb') as src:
            shutil.copyfileobj(src, dst)

# unpack the data
with open('test3.pickle', 'rb') as f:
    p = pickle.Unpickler(f)
    while True:
        try:
            print(p.load())
        except EOFError:
            break

# cleanup
os.remove('test1.pickle')
os.remove('test2.pickle')
os.remove('test3.pickle')

