I am currently saving some data from a process.

np.save('stochastic_data',(rho_av,rho_std))

where rho_av and rho_std are arrays.

However, this data depends on some parameters, say E, k, and M, and each set of values gives different data. I fix (E, k, M), generate the data, and save it, but the saved data alone does not let me recover the parameters that produced it. Therefore, I would like to save this set along with my arrays.

My first approach was to simply do

np.save('stochastic_data',(rho_av,rho_std, E, k, M))

but this doesn't work because my parameters are floats, not arrays.

My second approach was to convert the set of parameters to arrays, i.e. to create an array of identical elements for each parameter: E -> np.array([E, E, ..., E]). However, my arrays are quite big (np.shape(rho_av)=(100000,1000)), so saving the parameters with this shape is not going to be efficient.

Is there a more efficient way to do it?

Thanks.

1 Answer

You are saving a tuple of arrays.

Look at what happens with a simple case of 2 arrays with the same shape:

In [763]: np.save('test.npy',(np.arange(3), np.ones(3)))
In [764]: np.load('test.npy')
Out[764]: 
array([[0., 1., 2.],
       [1., 1., 1.]])

I got back one array - it made an array from the tuple and saved that.

If the arrays differ in shape, I still get an array, but it is object dtype. And I get a warning (in new enough numpy versions):

In [765]: np.save('test.npy',(np.arange(3), np.ones(4)))
/usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py:528: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  arr = np.asanyarray(arr)
In [766]: np.load('test.npy')
Traceback (most recent call last):
  File "<ipython-input-766-aeaca1f70e0f>", line 1, in <module>
    np.load('test.npy')
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py", line 440, in load
    return format.read_array(fid, allow_pickle=allow_pickle,
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/format.py", line 743, in read_array
    raise ValueError("Object arrays cannot be loaded when "
ValueError: Object arrays cannot be loaded when allow_pickle=False

In [767]: np.load('test.npy',allow_pickle=True)
Out[767]: array([array([0, 1, 2]), array([1., 1., 1., 1.])], dtype=object)

np.save is efficient for writing a numeric multi-dimensional array. But if the array contains objects, it uses pickle to convert them to a byte string that can be saved. That's not all bad, since pickling an array actually uses the same core code as np.save.

There is also np.savez, which saves each array to a separate npy file and combines them in a zip archive.
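As a rough sketch of how np.savez could be applied to the question's case: the arrays and the scalar parameters are passed as keyword arguments, and each one is stored under its own name. The small array shapes, the parameter values, and the temporary file path are placeholders chosen for illustration, not values from the question.

```python
import os
import tempfile

import numpy as np

# Placeholder data standing in for the real (100000, 1000) arrays.
rho_av = np.arange(12, dtype=float).reshape(4, 3)
rho_std = np.ones((4, 3))

# Placeholder parameter values (assumed for this sketch).
E, k, M = 1.5, 0.2, 10.0

path = os.path.join(tempfile.mkdtemp(), "stochastic_data.npz")

# Each keyword becomes one entry in the zip archive.
# The floats are stored as 0-d arrays, so no padding to the data shape is needed.
np.savez(path, rho_av=rho_av, rho_std=rho_std, E=E, k=k, M=M)

# Entries are read lazily by name; the context manager closes the archive.
with np.load(path) as data:
    rho_av_loaded = data["rho_av"]
    rho_std_loaded = data["rho_std"]
    # Convert the 0-d arrays back to plain Python floats.
    E_loaded, k_loaded, M_loaded = (float(data[name]) for name in ("E", "k", "M"))
```

Because every entry here is a plain numeric array, loading does not require allow_pickle=True.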

But for a diverse mix of items (arrays, lists, scalars, strings, etc.), pickle might well be the easiest and "most efficient" option. The drawback is that you can't save or load items piecemeal. It's not a database.
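A minimal sketch of the pickle route, assuming the arrays and parameters are bundled into one dictionary. The dictionary layout, the key names, the parameter values, and the file path are all hypothetical choices for this example.

```python
import os
import pickle
import tempfile

import numpy as np

# Placeholder data standing in for the real arrays.
rho_av = np.zeros((4, 3))
rho_std = np.ones((4, 3))

# One object holding both the arrays and the parameters (assumed layout).
record = {
    "rho_av": rho_av,
    "rho_std": rho_std,
    "params": {"E": 1.5, "k": 0.2, "M": 10.0},
}

path = os.path.join(tempfile.mkdtemp(), "stochastic_data.pkl")

# Pickle files must be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(record, f, protocol=pickle.HIGHEST_PROTOCOL)

# The whole record is read back in one call; there is no partial loading.
with open(path, "rb") as f:
    restored = pickle.load(f)

E, k, M = (restored["params"][name] for name in ("E", "k", "M"))
```

As with any pickle, the file should only be loaded if it comes from a trusted source, since unpickling can execute arbitrary code.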

1 Comment

Thanks for the pickle idea. It works perfectly for my case!
