Read .mat files in Python
Import scipy.io first, then call loadmat:
import scipy.io
mat = scipy.io.loadmat('file.mat')
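The object returned by loadmat is a plain dict. Besides the saved variables it also holds metadata entries whose keys start with a double underscore (__header__, __version__, __globals__). A minimal sketch of filtering those out follows; user_variables is a hypothetical helper name, not part of scipy, and the example dict is only a stand-in for a real loadmat result.

```python
def user_variables(mat):
    """Return only the entries of a loadmat() result that are MATLAB variables."""
    # loadmat stores metadata under keys such as '__header__', '__version__'
    # and '__globals__'; MATLAB variable names cannot start with an underscore.
    return {name: value for name, value in mat.items()
            if not name.startswith("__")}


# Stand-in for the dict that scipy.io.loadmat('file.mat') would return.
example = {
    "__header__": b"MATLAB 5.0 MAT-file",
    "__version__": "1.0",
    "__globals__": [],
    "x": [1, 2, 3],
}
print(sorted(user_variables(example)))
```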
Read a MATLAB .mat file using h5py
While h5py can read HDF5 files from MATLAB, figuring out what is in them takes some exploring: looking at keys, groups, and datasets (and possibly attrs). There's nothing in scipy that will help you (scipy.io.loadmat is for the older MATLAB .mat formats, before v7.3).
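To decide up front which reader applies, one option is to look at the text at the very start of the file. The sketch below assumes the usual header strings ("MATLAB 7.3 MAT-file" for the HDF5-based format, "MATLAB 5.0 MAT-file" for the v5/v6/v7 family); mat_format is a hypothetical helper name and the returned labels are only illustrative.

```python
def mat_format(path):
    """Guess the MAT-file flavour from the text at the start of the file."""
    with open(path, "rb") as fh:
        header = fh.read(19)  # length of b"MATLAB 7.3 MAT-file"
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        return "v7.3 (HDF5): open with h5py"
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        return "v5/v6/v7: open with scipy.io.loadmat"
    # v4 files carry no text header, so they end up here too.
    return "unknown"
```

For example, mat_format('file.mat') could be called before choosing between scipy.io.loadmat and h5py.File.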
With the downloaded file:
In [61]: f = h5py.File('Downloads/Basketball_ECO_HC.mat','r')
In [62]: f
Out[62]: <HDF5 file "Basketball_ECO_HC.mat" (mode r)>
In [63]: f.keys()
Out[63]: <KeysViewHDF5 ['#refs#', 'results']>
In [65]: f['results']
Out[65]: <HDF5 dataset "results": shape (1, 1), type "|O">
In [66]: arr = f['results'][:]
In [67]: arr
Out[67]: array([[<HDF5 object reference>]], dtype=object)
In [68]: arr.item()
Out[68]: <HDF5 object reference>
I'd have to check the h5py docs to see how to follow that object reference further. I'm not familiar with it.
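For what it's worth, h5py dereferences an object reference when the file (or any group) is indexed with it. The sketch below does not use the downloaded file; it builds a small stand-in with the same layout (a (1, 1) reference dataset named results pointing into #refs#), and the fps value in it is made up for the demonstration.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in file: 'results' holds a reference to '/#refs#/b'.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    target = f.create_group("#refs#/b")
    target["fps"] = np.array([[22.5]])          # made-up value
    results = f.create_dataset("results", (1, 1), dtype=h5py.ref_dtype)
    results[0, 0] = target.ref

with h5py.File(path, "r") as f:
    ref = f["results"][0, 0]    # an h5py.Reference
    obj = f[ref]                # indexing with a reference dereferences it
    name = obj.name             # path of the referenced group
    fps = obj["fps"][0, 0]
print(name, fps)
```

Applied to the session above, f[arr.item()] would presumably return the group or dataset that results points to.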
But exploring the other key:
In [69]: list(f.keys())[0]
Out[69]: '#refs#'
In [70]: f[list(f.keys())[0]]
Out[70]: <HDF5 group "/#refs#" (2 members)>
In [71]: f[list(f.keys())[0]].keys()
Out[71]: <KeysViewHDF5 ['a', 'b']>
In [72]: f[list(f.keys())[0]]['a']
Out[72]: <HDF5 dataset "a": shape (2,), type "<u8">
In [73]: _[:]
Out[73]: array([0, 0], dtype=uint64)
In [74]: f[list(f.keys())[0]]['b']
Out[74]: <HDF5 group "/#refs#/b" (7 members)>
In [75]: f[list(f.keys())[0]]['b'].keys()
Out[75]: <KeysViewHDF5 ['annoBegin', 'fps', 'fps_no_ftr', 'len', 'res', 'startFrame', 'type']>
In [76]: f[list(f.keys())[0]]['b']['fps']
Out[76]: <HDF5 dataset "fps": shape (1, 1), type "<f8">
In [77]: f[list(f.keys())[0]]['b']['fps'][:]
Out[77]: array([[22.36617883]])
In the OS shell, I can look at the file with h5dump. From that it looks like the res dataset has the most data. The datasets also have attributes. That may be a better way of getting an overview, which can then guide the h5py loads.
In [80]: f[list(f.keys())[0]]['b']['res'][:]
Out[80]:
array([[198., 196., 195., ..., 330., 328., 326.],
       [214., 214., 216., ..., 197., 196., 192.],
       [ 34.,  34.,  34., ...,  34.,  34.,  34.],
       [ 81.,  81.,  81., ...,  81.,  80.,  80.]])
In [81]: f[list(f.keys())[0]]['b']['res'][:].shape
Out[81]: (4, 725)
In [82]: f[list(f.keys())[0]]['b']['res'][:].dtype
Out[82]: dtype('<f8')
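An h5dump-style overview can also be approximated from within h5py by walking the file with visititems, which calls a function for every group and dataset below the starting point. A rough sketch, run here against a small stand-in file rather than the downloaded one; summarize is a hypothetical helper name, and the MATLAB_class attribute is written by hand only to have something to list.

```python
import os
import tempfile

import h5py
import numpy as np


def summarize(h5file):
    """Collect one line per group/dataset: path, shape, dtype, attribute names."""
    lines = []

    def visit(name, obj):
        attrs = sorted(obj.attrs)  # attribute names only
        if isinstance(obj, h5py.Dataset):
            lines.append(f"{name}: dataset shape={obj.shape} "
                         f"dtype={obj.dtype} attrs={attrs}")
        else:
            lines.append(f"{name}: group attrs={attrs}")
        # Returning None tells visititems to keep walking.

    h5file.visititems(visit)
    return lines


# Stand-in file with one nested dataset shaped like 'res' above.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    res = f.create_dataset("#refs#/b/res", data=np.zeros((4, 725)))
    res.attrs["MATLAB_class"] = "double"

with h5py.File(path, "r") as f:
    overview = summarize(f)
print("\n".join(overview))
```

The listing shows at a glance which dataset is largest, which is the information used above to pick res.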
Python: Issue reading in str from MATLAB .mat file using h5py and NumPy
If you don't mind changing the variable type of etime and stime stored in file.mat, and you can store them as type char instead of string, you could read them in Python with bytes(f.get(your_variable)[:]).decode('utf-8') (the older .value attribute was removed in h5py 3.0). In your case:
import h5py
import numpy as np

f = h5py.File('file.mat', 'r')
data = {
    "average": np.array(f.get('average')),
    "median": np.array(f.get('median')),
    "stdev": np.array(f.get('stdev')),
    "P10": np.array(f.get('p10')),
    "P90": np.array(f.get('p90')),
    "St": bytes(f.get('stime')[:]).decode('utf-8'),
    "Et": bytes(f.get('etime')[:]).decode('utf-8')
}
I'm sure there is also a way of reading string type, but this might be the simplest solution.
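One caveat: v7.3 files typically store a char array as a dataset of uint16 character codes, in which case decoding the raw bytes as UTF-8 can leave NUL characters between the letters. A hedged alternative, assuming that storage layout, is to treat each element as one UTF-16 code unit; matlab_char_to_str is a hypothetical helper name and the sample array is only a stand-in for f['stime'][:].

```python
import numpy as np


def matlab_char_to_str(codes):
    """Turn an array of character codes (as read from a char dataset) into str."""
    # Assumes each element is one UTF-16 code unit; flatten() copes with
    # both (N, 1) and (1, N) layouts of a single-row char array.
    units = np.asarray(codes, dtype="<u2").flatten()
    return units.tobytes().decode("utf-16-le")


# Stand-in for what f['stime'][:] might return for the text '12:30'.
stime = np.array([[ord(c)] for c in "12:30"], dtype=np.uint16)
print(matlab_char_to_str(stime))
```

With an open file this would be called as matlab_char_to_str(f['stime'][:]).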