Read .Mat Files in Python

Read .mat files in Python

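scipy.io.loadmat reads MAT files saved in the formats before v7.3 and returns a dict keyed by variable name:

import scipy.io
mat = scipy.io.loadmat('file.mat')  # dict: variable name -> NumPy array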

Read a MATLAB .mat file using h5py

h5py can read the HDF5-based files that MATLAB writes (format v7.3), but figuring out what is in them takes some exploring: looking at keys, groups, and datasets (and possibly attrs). Nothing in scipy will help here, because scipy.io.loadmat only handles the older MAT formats.
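If it isn't obvious which reader a given file needs, one option is to peek at the text header at the start of the file. This is only a sketch: it assumes that v7.3 files begin with the text "MATLAB 7.3 MAT-file" and the older v5-style files with "MATLAB 5.0 MAT-file". The helper name mat_container and the demo file are made up for illustration.

```python
import os
import tempfile

def mat_container(path):
    """Guess the container format of a .mat file from its text header."""
    with open(path, "rb") as fh:
        header = fh.read(19)
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        return "hdf5"      # v7.3: open with h5py
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        return "classic"   # older v5-style file: open with scipy.io.loadmat
    return "unknown"       # e.g. v4 files, which have no text header

# Demo on a fake header written to a temporary file (not a real MAT file)
with tempfile.NamedTemporaryFile(suffix=".mat", delete=False) as tmp:
    tmp.write(b"MATLAB 7.3 MAT-file, Platform: demo")
demo_path = tmp.name
kind = mat_container(demo_path)
os.remove(demo_path)
print(kind)
```

Anything reported as "unknown" is probably best tried with scipy.io.loadmat first, since that is the cheaper check.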

With the downloaded file:

In [61]: f = h5py.File('Downloads/Basketball_ECO_HC.mat','r')
In [62]: f
Out[62]: <HDF5 file "Basketball_ECO_HC.mat" (mode r)>
In [63]: f.keys()
Out[63]: <KeysViewHDF5 ['#refs#', 'results']>
In [65]: f['results']
Out[65]: <HDF5 dataset "results": shape (1, 1), type "|O">
In [66]: arr = f['results'][:]
In [67]: arr
Out[67]: array([[<HDF5 object reference>]], dtype=object)
In [68]: arr.item()
Out[68]: <HDF5 object reference>

I'd have to check the h5py docs to see if I can check that object reference further. I'm not familiar with it.
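For what it's worth, h5py lets you follow an object reference by indexing the file with it: f[ref] returns the group or dataset the reference points to. The sketch below builds a tiny in-memory stand-in with the same layout as the file above (the fps value is made up) and then dereferences the single entry of results.

```python
import h5py
import numpy as np

# In-memory stand-in that mimics the layout of the real file (nothing is written to disk)
f = h5py.File("demo.mat", "w", driver="core", backing_store=False)
b = f.create_group("#refs#/b")
b.create_dataset("fps", data=np.array([[22.5]]))  # made-up value
results = f.create_dataset("results", (1, 1), dtype=h5py.ref_dtype)
results[0, 0] = b.ref        # store an object reference to /#refs#/b

ref = f["results"][0, 0]     # an HDF5 object reference
target = f[ref]              # indexing the file with a reference dereferences it
target_name = target.name
fps = float(target["fps"][0, 0])
print(target_name, list(target.keys()), fps)
f.close()
```

On the real file the same two lines, ref = f['results'][0, 0] and f[ref], should land on whichever entry under '#refs#' the results variable points to.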

But exploring the other key:

In [69]: list(f.keys())[0]
Out[69]: '#refs#'
In [70]: f[list(f.keys())[0]]
Out[70]: <HDF5 group "/#refs#" (2 members)>
In [71]: f[list(f.keys())[0]].keys()
Out[71]: <KeysViewHDF5 ['a', 'b']>
In [72]: f[list(f.keys())[0]]['a']
Out[72]: <HDF5 dataset "a": shape (2,), type "<u8">
In [73]: _[:]
Out[73]: array([0, 0], dtype=uint64)
In [74]: f[list(f.keys())[0]]['b']
Out[74]: <HDF5 group "/#refs#/b" (7 members)>
In [75]: f[list(f.keys())[0]]['b'].keys()
Out[75]: <KeysViewHDF5 ['annoBegin', 'fps', 'fps_no_ftr', 'len', 'res', 'startFrame', 'type']>
In [76]: f[list(f.keys())[0]]['b']['fps']
Out[76]: <HDF5 dataset "fps": shape (1, 1), type "<f8">
In [77]: f[list(f.keys())[0]]['b']['fps'][:]
Out[77]: array([[22.36617883]])

In the OS shell, I can look at the file with h5dump. From that, it looks like the res dataset has the most data. The datasets also have attributes. h5dump may be a better way of getting an overview, which can then guide the h5py loads.
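A similar overview can be had from inside Python with visititems, which calls a function for every group and dataset in the file. This is a minimal sketch on an in-memory stand-in; the MATLAB_class attribute and its value are assumptions about what the real file carries, and describe is just an illustrative name.

```python
import h5py
import numpy as np

# In-memory stand-in with a layout similar to the real file
f = h5py.File("demo.mat", "w", driver="core", backing_store=False)
grp = f.create_group("#refs#/b")
grp.create_dataset("fps", data=np.array([[22.5]]))
res = grp.create_dataset("res", data=np.zeros((4, 725)))
res.attrs["MATLAB_class"] = b"double"   # assumed attribute, for illustration

summary = []

def describe(name, obj):
    # Record path, shape, dtype and attribute names for every dataset
    if isinstance(obj, h5py.Dataset):
        summary.append((name, obj.shape, str(obj.dtype), list(obj.attrs)))
    # returning None tells visititems to keep walking

f.visititems(describe)
for entry in summary:
    print(*entry)
f.close()
```

Sorting summary by the number of elements in each shape is a quick way to spot the dataset that holds most of the data.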

In [80]: f[list(f.keys())[0]]['b']['res'][:]
Out[80]:
array([[198., 196., 195., ..., 330., 328., 326.],
       [214., 214., 216., ..., 197., 196., 192.],
       [ 34.,  34.,  34., ...,  34.,  34.,  34.],
       [ 81.,  81.,  81., ...,  81.,  80.,  80.]])
In [81]: f[list(f.keys())[0]]['b']['res'][:].shape
Out[81]: (4, 725)
In [82]: f[list(f.keys())[0]]['b']['res'][:].dtype
Out[82]: dtype('<f8')

Python: Issue reading in str from MATLAB .mat file using h5py and NumPy

If the variable type of etime and stime in file.mat doesn't matter to you and you can store them as char instead of string, you can read them in Python with bytes(f.get(your_variable)[:]).decode('utf-8'). (Older answers use .value instead of [:], but that attribute was removed in h5py 3.0.) In your case:

import h5py
import numpy as np

f = h5py.File('file.mat', 'r')

data = {
    "average": np.array(f.get('average')),
    "median": np.array(f.get('median')),
    "stdev": np.array(f.get('stdev')),
    "P10": np.array(f.get('p10')),
    "P90": np.array(f.get('p90')),
    # char variables: raw buffer -> bytes -> str
    "St": bytes(f.get('stime')[:]).decode('utf-8'),
    "Et": bytes(f.get('etime')[:]).decode('utf-8')
}

I'm sure there is also a way of reading string type, but this might be the simplest solution.
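One thing to watch for: as far as I know, MATLAB writes char data to v7.3 files as 16-bit code units (uint16). If that holds for your file, bytes(...).decode('utf-8') keeps a zero byte after every ASCII character. A sketch of an alternative that decodes the code units as UTF-16 follows; the helper name matlab_char_to_str and the sample value are made up, and the array stands in for f['stime'][:].

```python
import numpy as np

def matlab_char_to_str(data):
    """Decode MATLAB char data (assumed to be UTF-16 code units) into a str."""
    units = np.asarray(data).astype("<u2").ravel()
    return units.tobytes().decode("utf-16-le")

# Stand-in for f['stime'][:]: a column of uint16 code units, one per character
stime_raw = np.array([[ord(c)] for c in "12:30"], dtype=np.uint16)
stime = matlab_char_to_str(stime_raw)
print(stime)
```

This handles a single row of text; a 2-D char matrix would need its rows split out before decoding.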


