Python - efficient way to save an array with multiple labels

Question

I am currently saving some data from a process.

np.save('stochastic_data',(rho_av,rho_std))

where rho_av and rho_std are arrays.

However, this data depends on some parameters, say E, k, and M. For each of them, I get different data. But I am only saving the data for a given set of parameters, i.e. I fix (E, k, M), I get the data and I save it. However, from the data I have it is not possible to retrieve the set of parameters (E, k, M). Therefore, I would like to save this set with my array.

My first approach was to simply do

np.save('stochastic_data',(rho_av,rho_std, E, k, M))

but this doesn't work because my parameters are floats, not arrays.

My second approach was to convert the parameters to arrays, i.e. to create an array of identical elements for each parameter, E -> np.array([E, E, ..., E]). However, my arrays are quite big (np.shape(rho_av)=(100000,1000)), so saving each parameter with this shape is not going to be efficient.

Is there a more efficient way to do it?

Thanks.

hpaulj · Accepted Answer · 2021-07-20 07:03:36Z

You are saving a tuple of arrays.

Look at what happens in a simple case of 2 arrays with the same shape:

In [763]: np.save('test.npy',(np.arange(3), np.ones(3)))
In [764]: np.load('test.npy')
Out[764]: 
array([[0., 1., 2.],
       [1., 1., 1.]])

I got back one array - it made an array from the tuple and saved that.

If the arrays differ in shape, I still get an array, but it is object dtype. And I get a warning (in new enough numpy versions):

In [765]: np.save('test.npy',(np.arange(3), np.ones(4)))
/usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py:528: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  arr = np.asanyarray(arr)
In [766]: np.load('test.npy')
Traceback (most recent call last):
  File "<ipython-input-766-aeaca1f70e0f>", line 1, in <module>
    np.load('test.npy')
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py", line 440, in load
    return format.read_array(fid, allow_pickle=allow_pickle,
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/format.py", line 743, in read_array
    raise ValueError("Object arrays cannot be loaded when "
ValueError: Object arrays cannot be loaded when allow_pickle=False

In [767]: np.load('test.npy',allow_pickle=True)
Out[767]: array([array([0, 1, 2]), array([1., 1., 1., 1.])], dtype=object)

np.save is efficient for writing a numeric multi-dimensional array. But if the array has object dtype, it uses pickle to convert the objects to a byte string that can be saved. That's not all bad, since pickling a numeric array actually uses the same core code as np.save.

There is also np.savez, which saves each array to a separate npy file and combines them in a zip archive.
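As a minimal sketch of how np.savez could be applied to the question's case: the array sizes, parameter values, and temporary file path below are placeholders, not the asker's real data. Each keyword argument becomes one named entry in the archive, and a plain float is stored as a 0-d array, so the parameters cost only a few bytes each.

```python
import os
import tempfile

import numpy as np

# Small stand-ins for the question's data and parameters (placeholder values).
rho_av = np.random.rand(100, 10)
rho_std = np.random.rand(100, 10)
E, k, M = 1.5, 0.2, 3.0

# Temporary location, used here only so the sketch does not clutter the working directory.
path = os.path.join(tempfile.mkdtemp(), "stochastic_data.npz")

# Each keyword becomes one entry in the zip archive; scalars are stored as 0-d arrays.
np.savez(path, rho_av=rho_av, rho_std=rho_std, E=E, k=k, M=M)

# Entries are read from the archive only when they are indexed by name.
with np.load(path) as data:
    rho_av_loaded = data["rho_av"]
    E_loaded = float(data["E"])
    k_loaded = float(data["k"])
    M_loaded = float(data["M"])
```

Because the entries are looked up by name, the parameters can be read back without loading the large arrays.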

But for a diverse mix of items - arrays, lists, scalars, strings etc., pickle might well be the easiest and "most efficient". But you can't save or load items piecemeal. It's not a database.
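The pickle route might look like the sketch below. The dictionary keys, the "note" string, and the file path are made up for illustration. The whole dictionary is written and read back as a single object, which is the "no piecemeal access" limitation mentioned above.

```python
import os
import pickle
import tempfile

import numpy as np

# A mixed record: arrays, float parameters and a string (all placeholder values).
record = {
    "rho_av": np.arange(6, dtype=float).reshape(2, 3),
    "rho_std": np.ones((2, 3)),
    "E": 1.5,
    "k": 0.2,
    "M": 3.0,
    "note": "run 1",
}

path = os.path.join(tempfile.mkdtemp(), "stochastic_data.pkl")

# Write the whole record in one go.
with open(path, "wb") as f:
    pickle.dump(record, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back; everything is loaded at once.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.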
