Saving and loading multiple objects in pickle file?
Using a list, tuple, or dict is by far the most common way to do this:
import pickle
PIK = "pickle.dat"
data = ["A", "b", "C", "d"]
# Write the whole list as a single pickle.
with open(PIK, "wb") as f:
    pickle.dump(data, f)
# Read it back with a single load.
with open(PIK, "rb") as f:
    print(pickle.load(f))
That prints:
['A', 'b', 'C', 'd']
However, a pickle file can contain any number of pickles. The code below produces the same output, but note that it is harder to write and to understand:
with open(PIK, "wb") as f:
    # Store the item count first so the reader knows how many loads to do.
    pickle.dump(len(data), f)
    for value in data:
        pickle.dump(value, f)
data2 = []
with open(PIK, "rb") as f:
    for _ in range(pickle.load(f)):
        data2.append(pickle.load(f))
print(data2)
If you do this, you're responsible for knowing how many pickles are in the file you write out. The code above does that by pickling the number of list objects first.
How can you Pickle multiple objects in one file?
pickle will read them back in the same order you dumped them.
import pickle
test1, test2 = ["One", "Two", "Three"], ["1", "2", "3"]
with open("C:/temp/test.pickle", "wb") as f:
    pickle.dump(test1, f)
    pickle.dump(test2, f)
with open("C:/temp/test.pickle", "rb") as f:
    # Each load returns the next object, in the order it was dumped.
    testout1 = pickle.load(f)
    testout2 = pickle.load(f)
print(testout1, testout2)
This prints ['One', 'Two', 'Three'] ['1', '2', '3']. To pickle an arbitrary number of objects, or just to make them easier to work with, you can put them in a tuple; then you only have to pickle one object.
import pickle
test1, test2 = ["One", "Two", "Three"], ["1", "2", "3"]
# Pack both lists into one tuple so a single dump/load pair is enough.
saveObject = (test1, test2)
with open("C:/temp/test.pickle", "wb") as f:
    pickle.dump(saveObject, f)
with open("C:/temp/test.pickle", "rb") as f:
    testout = pickle.load(f)
print(testout[0], testout[1])
Prints out ['One', 'Two', 'Three'] ['1', '2', '3']
Saving python lists data as 1 pickle and then loading them back
To save multiple lists (three in your case) in a single pickle file, you can put all of the lists in a dictionary and save that one dictionary. After loading the dictionary back, look up the list you want by its key.
import pickle
def SaveLists(data):
    # Note the comma between the file name and the mode.
    with open('myPickleFile', "wb") as open_file:
        pickle.dump(data, open_file)
def LoadLists(file):
    with open(file, "rb") as open_file:
        return pickle.load(open_file)
# Example calls to the functions
cars = ['Toyota', 'Honda']
fruits = ['Apple', 'Cherry']
# Create a dictionary and add these lists
data = {}
data['cars'] = cars
data['fruits'] = fruits  # add any number of lists
# Save the data in pickle form
SaveLists(data)
# Load the data when desired
lists = LoadLists('myPickleFile')
print(lists['fruits'])  # get your desired list
Multiple objects in pickle file
You are opening your pickle file with mode wb, which truncates it (sets the file back to empty before writing anything new). The mode you want is a, which opens the file for appending; since pickle data is binary, write it as ab.
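As a minimal sketch of the difference, assuming Python 3 and a throwaway file in a temporary directory (the file name and the sample objects are made up for illustration), two separate writing sessions can add pickles to the same file like this:

```python
import os
import pickle
import tempfile

# Hypothetical file name in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "appended.pickle")

# First session: "wb" creates (or truncates) the file.
with open(path, "wb") as f:
    pickle.dump(["first", "batch"], f)

# Later session: "ab" keeps the existing pickle and adds a new one after it.
with open(path, "ab") as f:
    pickle.dump({"second": "batch"}, f)

# Objects come back in the order they were written.
with open(path, "rb") as f:
    first = pickle.load(f)
    second = pickle.load(f)

print(first, second)
```

Had the second session used "wb" instead, the first pickle would have been discarded and the second load would fail with EOFError.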
Load all pickled objects
I am not sure if this is the correct approach.
import pickle
ListNames = [["Name1", "City1", "Email1"], ["Name2", "City2", "Number2"]]
ListNumbers = [1, 2, 3, 4, 5, 6, 7, 8]
with open("TestPickle.pickle", "wb") as fileSaver:
    pickle.dump(ListNames, fileSaver)
    pickle.dump(ListNumbers, fileSaver)
obj = []
with open("TestPickle.pickle", "rb") as fileOpener:
    # Keep loading until pickle signals the end of the file.
    while True:
        try:
            obj.append(pickle.load(fileOpener))
        except EOFError:
            break
print(obj)
Output:
[[['Name1', 'City1', 'Email1'], ['Name2', 'City2', 'Number2']], [1, 2, 3, 4, 5, 6, 7, 8]]
How to load all objects of pickle in python?
Your answer indicates that you are using multiple invocations of dump to dump multiple objects into the same stream. If that is the case, you are expected to know how many times to call load, e.g. by obtaining that information from the first loaded object, or because the number of objects is constant.
If that is not the case, use a single dump to dump all the objects by packing them up in a tuple:
pickle.dump((a, b, c), f)
You then will be able to load them in one go:
a, b, c = pickle.load(f)
If you cannot change the dumping code to use a tuple, you can simply load the data from the stream until you encounter an EOFError:
objs = []
while True:
    try:
        o = pickle.load(f)
    except EOFError:
        break
    objs.append(o)
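The same loop can be wrapped in a generator so that callers simply iterate over the objects. This is only a sketch: the helper name iter_pickles is made up here, and an in-memory io.BytesIO buffer stands in for a real file opened in "rb" mode.

```python
import io
import pickle

def iter_pickles(f):
    """Yield every pickled object from an open binary file, in order."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return

# In-memory stand-in for a file written with several dump() calls.
buf = io.BytesIO()
for obj in (["a", "b"], {"k": 1}, 42):
    pickle.dump(obj, buf)
buf.seek(0)

objs = list(iter_pickles(buf))
print(objs)
```

Because the generator loads one object at a time, a caller can also process a large file incrementally instead of building the whole list first.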
Speed up reading multiple pickle files
I think that the solution would be some library written in C that
takes a list of files to read and then runs multiple threads (without
GIL). Is there something like this around?
In short: no. pickle is apparently good enough for enough people that there are no major alternative implementations fully compatible with the pickle protocol. As of Python 3, cPickle was merged into pickle, and neither releases the GIL anyway, which is why threading won't help you (search for Py_BEGIN_ALLOW_THREADS in _pickle.c and you will find nothing).
If your data can be restructured into a simpler format like CSV, or a binary format like numpy's npy, there will be less CPU overhead when reading it. Pickle is built for flexibility first, rather than for speed or compactness. One possible exception to the rule that more complex formats are slower is HDF5 via h5py, which can be fairly complex, yet I have used it to max out the bandwidth of a SATA SSD.
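As a rough illustration of such restructuring, assuming the data is a flat table of numbers (an assumption; the question does not say what the pickles contain), the standard-library csv module can round-trip it. The sample rows and the in-memory buffer are made up for this sketch, and it shows only the round trip, not a benchmark.

```python
import csv
import io

# Hypothetical sample data: (id, measurement) pairs.
rows = [(1, 2.5), (2, 3.5), (3, 4.5)]

# An in-memory text buffer stands in for a real file opened with newline="".
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)

# CSV stores everything as text, so the reader must convert types itself.
buf.seek(0)
loaded = [(int(a), float(b)) for a, b in csv.reader(buf)]

print(loaded)
```

The explicit type conversion on read is the price of the simpler format: the file carries no type information, which is also part of why it is cheaper to parse.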
Finally, you mention that you have a great many pickle files, which is probably causing no small amount of overhead by itself. Each time you open a new file, the operating system adds some overhead. Conveniently, you can combine pickle files by simply appending them together, then call Unpickler.load() until you reach the end of the file. Here's a quick example of combining two pickle files using shutil:
import pickle, shutil, os
# some dummy data
d1 = {'a': 1, 'b': 2, 1: 'a', 2: 'b'}
d2 = {'c': 3, 'd': 4, 3: 'c', 4: 'd'}
# create two pickles
with open('test1.pickle', 'wb') as f:
    pickle.Pickler(f).dump(d1)
with open('test2.pickle', 'wb') as f:
    pickle.Pickler(f).dump(d2)
# combine list of pickle files
with open('test3.pickle', 'wb') as dst:
    for pickle_file in ['test1.pickle', 'test2.pickle']:
        with open(pickle_file, 'rb') as src:
            shutil.copyfileobj(src, dst)
# unpack the data
with open('test3.pickle', 'rb') as f:
    p = pickle.Unpickler(f)
    while True:
        try:
            print(p.load())
        except EOFError:
            break
# cleanup
os.remove('test1.pickle')
os.remove('test2.pickle')
os.remove('test3.pickle')