
I'm struggling with this problem: I have two large NumPy arrays (about 5 GB) and I want to save them in a .mat file that can be loaded from MATLAB. I tried scipy.io and wrote:

from scipy.io import savemat

data = {'A': a, 'B': b}
savemat('myfile.mat', data, appendmat=True, format='5',
        long_field_names=False, do_compression=False, oned_as='row')

but I get the error: OverflowError: Python int too large to convert to C long

EDIT: Python 3.8, MATLAB R2017b

Here is the traceback. The arrays are:

a.shape (600, 1048261), of type <class 'numpy.float64'>

b.shape (1048261,), of type <class 'numpy.float64'>

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-19-4d1d08a54148> in <module>
      1 data = {'A': a, 'B': b}
----> 2 savemat('myfile.mat', data, appendmat=True, format='5',
      3         long_field_names=False, do_compression=False, oned_as='row')

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio.py in savemat(file_name, mdict, appendmat, format, long_field_names, do_compression, oned_as)
    277         else:
    278             raise ValueError("Format should be '4' or '5'")
--> 279         MW.put_variables(mdict)
    280 
    281 

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in put_variables(self, mdict, write_header)
    847                 self.file_stream.write(out_str)
    848             else:  # not compressing
--> 849                 self._matrix_writer.write_top(var, asbytes(name), is_global)

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_top(self, arr, name, is_global)
    588         self._var_name = name
    589         # write the header and data
--> 590         self.write(arr)
    591 
    592     def write(self, arr):

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write(self, arr)
    627             self.write_char(narr, codec)
    628         else:
--> 629             self.write_numeric(narr)
    630         self.update_matrix_tag(mat_tag_pos)
    631 

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_numeric(self, arr)
    653             self.write_element(arr.imag)
    654         else:
--> 655             self.write_element(arr)
    656 
    657     def write_char(self, arr, codec='ascii'):

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_element(self, arr, mdtype)
    494             self.write_smalldata_element(arr, mdtype, byte_count)
    495         else:
--> 496             self.write_regular_element(arr, mdtype, byte_count)
    497 
    498     def write_smalldata_element(self, arr, mdtype, byte_count):

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_regular_element(self, arr, mdtype, byte_count)
    508         tag = np.zeros((), NDT_TAG_FULL)
    509         tag['mdtype'] = mdtype
--> 510         tag['byte_count'] = byte_count
    511         self.write_bytes(tag)
    512         self.write_bytes(arr)

OverflowError: Python int too large to convert to C long
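The traceback stops at tag['byte_count'] = byte_count, which points at the array's size rather than its contents. In the MAT-file Level 5 format (what format='5' writes), each data element starts with a tag whose byte count is a 32-bit field, so one element cannot describe more than 2**32 - 1 bytes; MATLAB's own save documentation likewise requires the -v7.3 format for variables of 2 GB or more. A quick plain-Python check of the sizes from the question (the helper names are illustrative, not part of SciPy):

```python
import math

# A MAT-file Level 5 data element tag stores its byte count in 32 bits,
# so a single element (e.g. one real array's data) cannot exceed this size.
MAT5_MAX_ELEMENT_BYTES = 2**32 - 1

FLOAT64_ITEMSIZE = 8  # bytes per numpy.float64 value


def float64_nbytes(shape):
    """Bytes occupied by the data of a float64 array with this shape."""
    return math.prod(shape) * FLOAT64_ITEMSIZE


def fits_in_mat5_element(shape):
    """True if a float64 array of this shape fits in one MAT v5 element."""
    return float64_nbytes(shape) <= MAT5_MAX_ELEMENT_BYTES


if __name__ == "__main__":
    # Shapes taken from the question.
    for name, shape in [("A", (600, 1048261)), ("B", (1048261,))]:
        print(name, float64_nbytes(shape), fits_in_mat5_element(shape))
```

With the real arrays, a.nbytes gives the same figure directly. For A it comes to 5,031,652,800 bytes, which is over the limit, while B (about 8 MB) fits; that is consistent with the comment further down that saving only part of the data works.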

I also tried hdf5storage:

hdf5storage.write(data, 'myfile.mat', matlab_compatible=True)

but it fails too.

EDIT: it gives this warning:

\miniconda3\envs\work\lib\site-packages\hdf5storage\__init__.py:1306: 
 H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) 
 in h5py 3.0. To suppress this warning, pass the mode you need to 
 h5py.File(), or set the global default h5.get_config().default_file_mode, or 
 set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 
 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  f = h5py.File(filename)

Anyway, it creates a 5 GB file, but when I load it in MATLAB I get a variable named after the file path, apparently without any data.

Finally, I tried h5py:

import h5py

with h5py.File('C:/Users/flavio/Desktop/STRA-pattern.mat', 'w') as hf:
    hf.create_dataset('A', data=a)
    hf.create_dataset('B', data=b)

but the output file is not recognized or readable in MATLAB.

Is splitting the arrays the only solution? I hope there is a better way to fix this.

  • You may find a suitable answer in a similar question. Link to similar question Commented Aug 18, 2020 at 22:10
  • Tell us about a and b, in particular their shape and dtype. If they have object dtype, tell us about the elements. You may also need to show the full traceback; an error like that comes from inside the savemat function. That "but it fails too" error description is just plain bad manners. If you want help, give us full information. Commented Aug 18, 2020 at 22:32
  • Did you try saving just one array, or even part of one? Without the traceback it's hard to say whether the problem is the size or the values. Commented Aug 18, 2020 at 23:36
  • It should be a size problem, because if I save only part of the data it works fine. Commented Aug 19, 2020 at 12:13
  • Maybe the easiest thing for simple arrays is to save them to npy-format and use github.com/kwikteam/npy-matlab/tree/master/npy-matlab for reading the files in matlab. Commented Aug 20, 2020 at 8:34
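The npy-matlab route from the last comment sidesteps the MAT format: a .npy header records only dtype, shape and memory order, with the raw data following, so there is no 32-bit byte-count field to overflow. A minimal sketch of the Python side, assuming one .npy file per variable named after its dict key (save_as_npy is a hypothetical helper):

```python
import os

import numpy as np


def save_as_npy(arrays, out_dir):
    """Save each array as <out_dir>/<name>.npy and return {name: path}.

    Hypothetical helper: one .npy file per variable, named after its dict key.
    """
    paths = {}
    for name, arr in arrays.items():
        path = os.path.join(out_dir, f"{name}.npy")
        # The .npy header stores dtype/shape/order; the data follows as raw bytes.
        np.save(path, arr)
        paths[name] = path
    return paths
```

For example, save_as_npy({'A': a, 'B': b}, 'C:/Users/flavio/Desktop') writes A.npy and B.npy, which npy-matlab's readNPY should then read in MATLAB, e.g. A = readNPY('A.npy');. The trade-off is one file per variable instead of a single .mat file.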

1 Answer


For anyone still looking for an answer, this works with hdf5storage:

hdf5storage.savemat( save_path, data_dict, format=7.3, matlab_compatible=True, compress=False )
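On the question of what makes the difference: in hdf5storage (0.1.x), the signature is write(data, path='/', filename='data.h5', ...), so the second positional argument is the path inside the HDF5 file, not the output file. The question's hdf5storage.write(data, 'myfile.mat', ...) therefore passes 'myfile.mat' as an in-file path, which may explain the variable named after the file path. savemat instead takes the file name first and the dict of variables second, and writes the HDF5-based v7.3 MAT format that MATLAB's load understands. A minimal sketch wrapping that call, assuming hdf5storage is installed (save_for_matlab is a hypothetical helper name):

```python
def save_for_matlab(file_name, variables):
    """Write `variables` (name -> array) to a MATLAB v7.3 MAT-file.

    Hypothetical helper. hdf5storage is imported inside the function so this
    module can still be imported where hdf5storage is not installed.
    """
    import hdf5storage  # third-party package: pip install hdf5storage

    # savemat takes the file name first and the name -> value dict second.
    # format='7.3' selects the HDF5-based MAT format, which MATLAB uses for
    # variables of 2 GB or more.
    hdf5storage.savemat(file_name, variables, format='7.3')
```

Usage would be something like save_for_matlab('C:/Users/flavio/Desktop/STRA-pattern.mat', {'A': a, 'B': b}), followed by load('STRA-pattern.mat') in MATLAB. The matlab_compatible and compress keywords from the answer are left out here; my understanding is that savemat already writes MATLAB-compatible files, but check the hdf5storage docs for your version.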


1 Comment

Which parameter(s) passed here make the difference? OP also tried hdf5storage and said that it failed.
