
I have a few MATLAB variables with the following fields, which I saved to test.mat using the -v7.3 flag. I want to read them back in Python using the h5py module.

load('test.mat'); % gives me this

struct with fields:
       volume: [4240×1 double]
     centroid: [4240×3 double]
        faces: {4240×1 cell}
          nuc: {4240×1 cell}

I can read the double fields, but I am unable to access the cell fields. Is there any way to access the nuc and faces variables from Python?

>>> import h5py
>>> name = 'test.mat'
>>> f = h5py.File(name, 'r')
>>> f.keys()
<KeysViewHDF5 ['#refs#', 'volume', 'centroid', 'faces', 'nuc']>
>>> o1 = f['centroid']
>>> o1
<HDF5 dataset "centroid": shape (3, 4240), type "<f8">
>>> o1[:, 0]
array([ -387.82973928,   533.54789111, -7359.64917621])
>>> o3 = f['nuc']
>>> o3
<HDF5 dataset "nuc": shape (1, 4240), type "|O">
>>> type(o3)
<class 'h5py._hl.dataset.Dataset'>
>>> type(o3[0])
<class 'numpy.ndarray'>
>>> type(o3[0][0])
<class 'h5py.h5r.Reference'>
>>> o3[0][0]
<HDF5 object reference>
>>> o3[0]
array([<HDF5 object reference>, <HDF5 object reference>,
   <HDF5 object reference>, ..., <HDF5 object reference>,
   <HDF5 object reference>, <HDF5 object reference>], dtype=object)

I tried every option I could think of, but I cannot see the numerical values of the nuc variable. Any suggestions would be appreciated.

Thanks for the comments, everyone. The following command works now:

 >>> f[f['nuc'][0][0]][:]
   array([[ -733.94435313,  -733.66995189,  -734.09632262, ...,
     -733.66832197,  -733.81233202,  -733.54615564],
   [  247.76823184,   247.49908481,   248.17514583, ...,
      240.16088783,   240.56909865,   240.84810507],
   [-7485.86866961, -7485.92114207, -7485.93468626, ...,
    -7508.16909395, -7508.16306386, -7508.20712349]])
 >>> f[f['nuc'][0][0]][:].shape
    (3, 1512)
 >>> f[f['nuc'][0][1]][:].shape
    (3, 1491)
 >>> f[f['nuc'][0][2]][:].shape
    (3, 1556)
  • Not exactly the answer you were looking for, but I use the scipy.io module, specifically scipy.io.loadmat('my_mat.mat'). It returns a dictionary with the values. Commented Jan 9, 2020 at 8:22
  • I know about this module, but it only works if the size is less than 2 GB. In my case the cell variables hold large matrices, which pushes the size above 2 GB. Commented Jan 9, 2020 at 8:28
  • io.loadmat creates object dtype arrays from MATLAB cells. But h5py doesn't handle object dtype arrays, so the mapping from MATLAB references to HDF5 references to numpy ones becomes more complex. Commented Jan 9, 2020 at 17:59

1 Answer


A .mat file saved with the -v7.3 flag is an HDF5 file, and its data schema relies on "object references". An object reference is not the data itself but a pointer to the data, which is stored at a different location in the file. You use the object reference to get to the data (in your example, the nuc values). You can read the first element of nuc like this:
arr = f[ f['nuc'][0][0] ][:], or equivalently arr = f[ o3[0][0] ][:]
(You can also use comma-separated indices if you prefer: f[ f['nuc'][0,0] ][:] )

Deconstructing the expression above:
f['nuc'] --> is the dataset for the nuc field; with shape (1, 4240), it holds one object reference per cell
f['nuc'][0] --> is the first (and only) row of that dataset: an array of 4240 object references
f['nuc'][0][0] --> is the first object reference in that array
f[ f['nuc'][0][0] ][:] --> dereferences the object reference and reads the data, i.e. reads the array
Alternatively, you can split this into two steps (the method I prefer for readability):
obj_ref = f['nuc'][0][0] --> returns the first object reference
f[obj_ref][:] --> dereferences the object reference and reads the array data
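
The same dereferencing step can be applied to every cell in a loop. The following is a minimal sketch, not a definitive reader: read_cell_column is a hypothetical helper name, and the small file it builds (demo_refs.h5, with three made-up arrays under '#refs#') is only a stand-in that mimics the layout shown in the question, so that the example runs without the original test.mat.

```python
import os
import tempfile

import h5py
import numpy as np


def read_cell_column(f, name):
    """Hypothetical helper: return one NumPy array per cell of a cell field."""
    refs = f[name][()]  # object-dtype array of HDF5 object references
    # Indexing the file with a reference opens the target dataset;
    # [()] then reads that dataset fully into memory.
    return [f[ref][()] for ref in refs.ravel()]


# Stand-in file (an assumption, not the asker's data): three arrays of
# different widths, pointed to by a (1, 3) dataset of object references.
path = os.path.join(tempfile.mkdtemp(), "demo_refs.h5")
with h5py.File(path, "w") as f:
    grp = f.create_group("#refs#")
    ref_dtype = h5py.special_dtype(ref=h5py.Reference)
    ref_ds = f.create_dataset("nuc", shape=(1, 3), dtype=ref_dtype)
    for i, n in enumerate((4, 2, 5)):
        target = grp.create_dataset(str(i), data=np.full((3, n), float(i)))
        ref_ds[0, i] = target.ref

with h5py.File(path, "r") as f:
    nuc = read_cell_column(f, "nuc")

shapes = [a.shape for a in nuc]
print(shapes)
```

With the real file, the call would be read_cell_column(f, 'nuc') or read_cell_column(f, 'faces') on the already opened test.mat. Note that h5py reports dimensions in the reverse order from MATLAB (centroid is 4240×3 in MATLAB but has shape (3, 4240) in the question), so each array can be transposed with .T if the MATLAB orientation is needed.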

This SO Q&A gives a basic explanation on reading .mat files:
read-matlab-v7-3-file-into-python-list-of-numpy-arrays-via-h5py

I wrote a more complete explanation (for reading SVHN datasets). You can access it here:
what-is-the-difference-between-the-two-ways-of-accessing-the-hdf5-group-in-svhn


2 Comments

Thanks a lot. The following command works in my case: f[f['nuc'][0][0]][:]
Ankit, nice catch. As an object reference, f['nuc'][0][0] defines a path in the file to a group or dataset -- similar to ['group_name/dataset_name']. I updated my post with the correction.
