I use Python for scientific computing. I have several NumPy arrays, each of them huge, so I cannot store all of them in memory at the same time. I want to save the arrays to disk and read them back one at a time to do some calculations. What is the Pythonic way to do this?

I know that if all the data fit in memory, I can create a list named array_list like this:

array_list = []
for i0 in range(n_array):
    t_array = do_some_calculate()
    array_list.append(t_array)

and when I want to use them:

for i0 in range(n_array):
    t_array = array_list[i0]
    # do something.

How can I save array_list to disk so that I can read each object by its index without loading all of them into memory?

Thanks.

4 Comments
  • Maybe you should try HDF5 and its Python binding, h5py: docs.h5py.org/en/latest Commented May 16, 2017 at 3:33
  • Both h5py and np.savez save multiple arrays 'by-name'. In other words, you access them by name, as though they were values in a dictionary. Commented May 16, 2017 at 4:00
  • Oh, your suggestion works for me, thanks a lot. Is it possible to store all of them in one file? Commented May 16, 2017 at 4:41
  • Yes, as @hpaulj mentioned, HDF5 files can store multiple datasets by keyword. Refer to the h5py documentation for examples and more info. Commented May 16, 2017 at 7:58
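
To illustrate the "by-name" access mentioned in the comments, here is a minimal sketch using np.savez. The key scheme arr_<index>, the helper load_array, and the small stand-in arrays are assumptions made for illustration, not something from the thread.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the arrays produced by do_some_calculate().
arrays = [np.full((3, 4), i, dtype=np.float64) for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), "array_list.npz")

# Store every array in one file, under a key derived from its index.
np.savez(path, **{f"arr_{i}": a for i, a in enumerate(arrays)})


def load_array(npz_path, index):
    """Read only the array stored under arr_<index>."""
    with np.load(npz_path) as data:
        # Members of an .npz archive are read when they are accessed by key.
        return data[f"arr_{index}"]


t_array = load_array(path, 3)
```

Note that np.savez takes all arrays in a single call, so the writing side still has to hold them at once. If that is not possible, an HDF5 file written with h5py, as suggested above, is likely the better fit for a single file, since datasets can be added to it one at a time.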

1 Answer

Pickle is your friend for serialization.

import pickle

some_list = [....]
# Pickle data is binary, so the file must be opened in "wb" mode.
with open("some_list.pickle", "wb") as pickle_out:
    pickle.dump(some_list, pickle_out)

To load your saved list:

# Read it back in binary mode with pickle.load.
with open("some_list.pickle", "rb") as pickle_in:
    some_list = pickle.load(pickle_in)
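
If only one element is needed at a time, one possible variation is to pickle each element separately into the same file and remember the byte offset where each pickle starts. This is only a sketch: the stand-in do_some_calculate, the helper load_item, and the offsets list are hypothetical names introduced here, and plain lists are used in place of real arrays.

```python
import os
import pickle
import tempfile


def do_some_calculate(i):
    # Hypothetical stand-in for the real computation; any picklable object works.
    return [i] * 4


n_array = 5
path = os.path.join(tempfile.mkdtemp(), "array_list.pickle")

# Write one pickle per element and record where each one starts.
offsets = []
with open(path, "wb") as f:
    for i0 in range(n_array):
        offsets.append(f.tell())
        pickle.dump(do_some_calculate(i0), f, protocol=pickle.HIGHEST_PROTOCOL)


def load_item(file_path, item_offsets, index):
    """Seek to the start of one pickle and load only that object."""
    with open(file_path, "rb") as f:
        f.seek(item_offsets[index])
        return pickle.load(f)


t_array = load_item(path, offsets, 3)
```

Only one element is held in memory while writing and while reading. To reuse the file in a later session, the offsets list would have to be saved as well, for example in a small pickle of its own.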

4 Comments

joblib actually has better properties when it comes to serializing large numpy arrays.
Currently, the issue is that some_list is so big that I cannot load it into memory. Since I don't need all of it at the same time, I wonder if there is a way to unpickle just part of it at a time, like some_list[i0], where i0 is an index.
You may be more interested in memory mapping, which lets you operate on the array while keeping it on disk.
@aryamccarthy, thanks, great suggestion. Both packages are helpful for my project.
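
For the memory-mapping suggestion, a minimal sketch could save each array to its own .npy file and open it with mmap_mode="r". The file naming scheme array_<i0>.npy, the temporary directory, and the small example arrays are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

out_dir = tempfile.mkdtemp()
n_array = 4

# Save each array to its own .npy file, so only one is in memory at a time.
for i0 in range(n_array):
    t_array = np.arange(12, dtype=np.float64).reshape(3, 4) + i0
    np.save(os.path.join(out_dir, f"array_{i0}.npy"), t_array)

# mmap_mode="r" maps the file read-only instead of reading it all into memory.
for i0 in range(n_array):
    t_array = np.load(os.path.join(out_dir, f"array_{i0}.npy"), mmap_mode="r")
    # Slicing the memory-mapped array pulls in data as it is accessed.
    row_sum = float(t_array[0].sum())
```

The index i0 then plays the same role as in array_list[i0], and the returned np.memmap object can be sliced like a regular array.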
