I've tried this method outlined by Hpaulji, but it doesn't seem to be working:

How to append many numpy files into one numpy file in python

Basically, I'm iterating through a generator, making some changes to an array, and then trying to save each iteration's array.

Here is what my sample code looks like:

filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.save(filename, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break

Here, I'm going through 5 iterations, so I was hoping to save 5 different arrays.

I printed out a portion of each array, for debugging purposes:

[ 0.  0.  0.  0.  0.]
[ 0.          3.37349415  0.          0.          1.62561738]
[  0.          20.28489304   0.           0.           0.        ]
[ 0.  0.  0.  0.  0.]
[  0.          21.98013496   0.           0.           0.        ]

But when I tried to load the array multiple times, as described in "How to append many numpy files into one numpy file in python", I got an EOFError:

file = r'testing.npy'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr[0,0,0,0:5])
    arr = np.load(f)
    print(arr[0,0,0,0:5])

It only outputs the last array and then raises an EOFError:

[  0.          21.98013496   0.           0.           0.        ]
EOFError: Ran out of input

print(arr[0,0,0,0:5])

I was expecting all 5 arrays to be saved, but when I load the saved .npy file multiple times, I only get the last array.

So, how should I be saving and appending new arrays to a file?

EDIT: Testing with '.npz' only saves last array

filename = 'testing.npz'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.savez(f, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break

# loading

file = 'testing.npz'

with open(file, 'rb') as f:
    arr = np.load(f)
    print(arr.keys())

>>> ['arr_0']
  • as an aside, I don't know how large your data is, but have you tried HDF5, or are you tied to .npy for storage? Commented Feb 3, 2018 at 23:27
  • I haven't tried HDF5. It seems that is the better option (my data is about 100,000 images), but I would have to do a little more digging through the docs as I'm not as familiar with HDF5. Commented Feb 3, 2018 at 23:30
  • OK, unfortunately I can't help with your question, but look up the h5py documentation; the syntax is easy to pick up for storing and appending numeric data, and if used correctly it can be fast. Commented Feb 3, 2018 at 23:34
  • @jp_data_analysis Thanks, I think I may just switch to HDF5 as it's more widely used. Commented Feb 3, 2018 at 23:44

1 Answer

All your calls to np.save use the filename, not the filehandle. Since you do not reuse the filehandle, each save overwrites the file instead of appending the array to it.

This should work:

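filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        # pass the open file handle, not the filename, so each array is appended
        np.save(f, prediction)

        current_iteration += 1
        # the check must sit inside the for loop for break to be valid
        if current_iteration == 5:
            break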
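
For reading the arrays back, here is a minimal, self-contained sketch of the round trip. It assumes small placeholder arrays in place of the model predictions; the helper names `save_arrays` and `load_arrays` and the temporary path are made up for illustration. Instead of relying on which exception np.load raises at the end of the file (this has differed between NumPy versions), the loop stops once the file position reaches the file size.

```python
import os
import tempfile

import numpy as np


def save_arrays(path, arrays):
    """Write each array back to back through one open file handle."""
    with open(path, "wb") as f:
        for arr in arrays:
            np.save(f, arr)  # pass the handle, not the path


def load_arrays(path):
    """Call np.load on the same handle until the whole file is consumed."""
    size = os.path.getsize(path)
    loaded = []
    with open(path, "rb") as f:
        while f.tell() < size:
            loaded.append(np.load(f))
    return loaded


# Hypothetical stand-in for five batches of predictions.
batches = [np.full((2, 3), i, dtype=np.float32) for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), "testing.npy")
save_arrays(path, batches)
restored = load_arrays(path)
print(len(restored))
```

Each np.load call consumes exactly one saved array, so the number of loads has to match the number of saves; one load too many is what produces the EOFError.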

While there may be advantages to storing multiple arrays in one .npy file (for example, when memory is limited), .npy files are technically meant to store a single array. To store multiple arrays, you can use .npz files (np.savez or np.savez_compressed):

filename = 'testing.npz'
predictions = []
for (x, _), index in zip(train_generator, range(5)):
    prediction = base_model.predict(x)
    predictions.append(prediction)
np.savez(filename, predictions) # will name it arr_0
# np.savez(filename, predictions=predictions) # would name it predictions
# np.savez(filename, *predictions) # would name it arr_0, arr_1, …, arr_4
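
To read such a file back, a minimal sketch along these lines may help. It again uses small placeholder arrays instead of real predictions, and the temporary path is an assumption for illustration. np.savez is called once with all arrays unpacked, so the archive holds one entry per array.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for five batches of predictions.
predictions = [np.full((2, 3), i, dtype=np.float32) for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), "testing.npz")
np.savez(path, *predictions)  # one call, one archive, keys arr_0 ... arr_4

# np.load returns a lazy, dict-like NpzFile; arrays are read on access.
with np.load(path) as data:
    keys = sorted(data.files)
    restored = [data[f"arr_{i}"] for i in range(len(keys))]

print(keys)
```

Each np.savez call writes a complete archive of its own, so calling it once per iteration does not add entries to an existing file; collecting the arrays first and saving once avoids that.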

7 Comments

Ah! Thank you. I'm going to test it out when I get a chance.
I just tried .npz -- testing.npz and np.savez(f, prediction), but it seems to be saving the last array only. I'm loading the array the same way as the code in the OP, but I only see one key -- ['arr_0']. I will update the OP just in case I'm making a mistake.
I've added an example for npz-files. For that you only call savez once with all the entries (as one list of arrays or many arrays).
@yself Thank you for listing out the multiple ways.
@YSelf, this is a great answer as the docs don't say anything about saving lists of np arrays. I tried calling np.savez(filename, next_array) like an append to the file but obviously it does not work like that.