
I have about 1.5 GB of images that I need to process. The problem is that when I try loading them as NumPy arrays, I seem to use up all of my RAM (8 GB).

Here is my method for loading images:

import numpy as np
from PIL import Image

def load_image(infilename):
    img = Image.open(infilename)
    img.load()                              # force PIL to read the pixel data now
    data = np.asarray(img, dtype="int32")   # full-resolution array, 4 bytes per value
    img.close()
    del img
    return data

I thought closing and deleting the img would help, but it doesn't. Can this have something to do with garbage collection?

Code to loop through all images in a list of file names:

for i in range(len(files)):
    imgArray = imgs.load_image(files[i])
    images.append(imgArray)
    shapes.append(np.shape(imgArray))

Is there a better way?

5 Comments

  • Image types like JPEG are compressed, so when you open them they are uncompressed to full size in memory. You will probably have to process them one at a time.
  • Also, if the original images are 8-bit, then casting them up to 32 bits will quadruple your memory requirements.
  • If you can explain what you'll be doing with the images after loading them, maybe we can help more. Could you possibly not need all images to exist in memory at the same time?
  • @Yasser, I do not need them all at the same time; I will be resizing, then cropping, then saving. But the problem is that I need the average width and height across all of them to normalize them.
  • @Kevin It takes a lot less RAM to hold a single (width, height) tuple per image than to hold the entire image arrays...
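
To make the 8-bit vs. 32-bit comment concrete, here is a small illustrative sketch (the 2000x1500 RGB size is an assumption, not taken from the question):

import numpy as np

h, w, c = 1500, 2000, 3                         # assumed image dimensions, for illustration only
as_uint8 = np.zeros((h, w, c), dtype=np.uint8)  # how 8-bit pixels are stored
as_int32 = np.zeros((h, w, c), dtype=np.int32)  # what the question's dtype="int32" produces

print(as_uint8.nbytes / 1e6)   # ~9 MB
print(as_int32.nbytes / 1e6)   # ~36 MB, four times as much per image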

2 Answers


It might be worth loading the image files one by one with PIL just to get their size tuples and collect your statistics about averages and whatnot, and then opening them again with NumPy or PIL to do the actual processing. You might also want to consider sampling for the statistics step so you don't need to inspect all of them, though it shouldn't take that long anyway; PIL is relatively efficient.
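
A rough sketch of that two-pass idea, assuming Pillow and NumPy plus the files list from the question; the 5% sample fraction, target size, and output path are made-up details for illustration:

import random
import numpy as np
from PIL import Image

# Pass 1: estimate the average size from a random sample (header reads only, no pixel decoding).
sample = random.sample(files, max(1, int(0.05 * len(files))))
widths, heights = [], []
for f in sample:
    with Image.open(f) as im:      # lazy open: .size is available without loading the raster
        w, h = im.size
        widths.append(w)
        heights.append(h)
target_size = (sum(widths) // len(widths), sum(heights) // len(heights))

# Pass 2: process one image at a time so only one array is in memory at once.
for f in files:
    with Image.open(f) as im:
        resized = im.resize(target_size)
        data = np.asarray(resized, dtype=np.uint8)   # uint8 keeps the array 4x smaller than int32
        # ... crop / analyse `data` here ...
        resized.save(f + ".resized.png")             # hypothetical output path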


1 Comment

I actually just took a 5% sample of the images and calculated the averages, and from there I load 100 images at a time and process them until I have gone through all of the images. Thanks for the idea!
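
A minimal sketch of that batched approach, assuming the load_image function from the question and a hypothetical process_batch step for the resize/crop/save work:

def chunks(seq, size):
    # Yield successive slices of seq with at most `size` items each.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

for batch_files in chunks(files, 100):
    # Only ~100 arrays are alive at once; the previous batch becomes garbage
    # as soon as `batch` is rebound on the next iteration.
    batch = [load_image(f) for f in batch_files]
    # process_batch(batch)   # hypothetical: resize, crop, and save this batch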

You may be able to use manual garbage collection to free some memory between loop iterations:

import gc   # garbage collector

def memclear():
    cleared = gc.collect()   # force a collection pass
    print(cleared)           # number of unreachable objects found

Call memclear() at the end of each loop iteration, so:

for i in range(len(files)):
    imgArray = imgs.load_image(files[i])
    images.append(imgArray)
    shapes.append(np.shape(imgArray))
    memclear()

Hopefully this fixes it. I'm assuming this was downvoted because it manually calls garbage collection, which is generally frowned upon, but unfortunately it sometimes seems to be necessary.

1 Comment

I did try something like this, but was not sure whether it would cause too much overhead.
