
I have about 1.5 GB of images that I need to process. The problem is that when I try loading them as NumPy arrays, I seem to use up all of my RAM (8 GB).

Here is my method for loading images:

import numpy as np
from PIL import Image

def load_image(infilename):
    img = Image.open(infilename)
    img.load()                              # force PIL to read the pixel data now
    data = np.asarray(img, dtype="int32")   # full-resolution array, 4 bytes per value
    img.close()
    del img
    return data

I thought closing and deleting the img would help, but it doesn't. Can this have something to do with garbage collection?

Code to loop through all images in a list of file names:

for i in range(len(files)):
    imgArray = imgs.load_image(files[i])
    images.append(imgArray)
    shapes.append(np.shape(imgArray))

Is there a better way?

5 Comments

  • Image types like JPEG are compressed, so when you open them they are uncompressed to full size in memory. You will probably have to process them one at a time.
  • Also, if the original images are 8-bit, then casting them up to 32 bits will quadruple your memory requirements.
  • If you can explain what you'll be doing with the images after loading them, maybe we can help more. Could you possibly not need all images to exist in memory at the same time?
  • @Yasser, I do not need them all at the same time; I will be resizing, then cropping, then saving. But the problem is that I need the average width and height across all of them to normalize them.
  • @Kevin It takes a lot less RAM to hold a single (width, height) tuple per image than to hold the entire image arrays...
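
To make the 8-bit vs. 32-bit comment concrete, here is a small illustrative sketch (the 2000x1500 RGB size is an assumption, not taken from the question):

import numpy as np

h, w, c = 1500, 2000, 3                         # assumed image dimensions, for illustration only
as_uint8 = np.zeros((h, w, c), dtype=np.uint8)  # how 8-bit pixels are stored
as_int32 = np.zeros((h, w, c), dtype=np.int32)  # what the question's dtype="int32" produces

print(as_uint8.nbytes / 1e6)   # ~9 MB
print(as_int32.nbytes / 1e6)   # ~36 MB, four times as much per image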

2 Answers


It might be worth loading the image files one by one with PIL just to get their size tuples and collect your statistics about averages and whatnot, and then opening them again with NumPy or PIL to do the actual processing. You might also want to consider sampling for the statistics step so you don't need to inspect all of them, though it shouldn't take that long anyway; PIL is relatively efficient.
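
A rough sketch of that two-pass idea, assuming Pillow and NumPy plus the files list from the question; the 5% sample fraction, target size, and output path are made-up details for illustration:

import random
import numpy as np
from PIL import Image

# Pass 1: estimate the average size from a random sample (header reads only, no pixel decoding).
sample = random.sample(files, max(1, int(0.05 * len(files))))
widths, heights = [], []
for f in sample:
    with Image.open(f) as im:      # lazy open: .size is available without loading the raster
        w, h = im.size
        widths.append(w)
        heights.append(h)
target_size = (sum(widths) // len(widths), sum(heights) // len(heights))

# Pass 2: process one image at a time so only one array is in memory at once.
for f in files:
    with Image.open(f) as im:
        resized = im.resize(target_size)
        data = np.asarray(resized, dtype=np.uint8)   # uint8 keeps the array 4x smaller than int32
        # ... crop / analyse `data` here ...
        resized.save(f + ".resized.png")             # hypothetical output path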


1 Comment

I actually just took a 5% sample of the images and calculated the averages, and from there I load 100 images at a time and process them until I have gone through all of the images. Thanks for the idea!
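
A minimal sketch of that batched approach, assuming the load_image function from the question and a hypothetical process_batch step for the resize/crop/save work:

def chunks(seq, size):
    # Yield successive slices of seq with at most `size` items each.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

for batch_files in chunks(files, 100):
    # Only ~100 arrays are alive at once; the previous batch becomes garbage
    # as soon as `batch` is rebound on the next iteration.
    batch = [load_image(f) for f in batch_files]
    # process_batch(batch)   # hypothetical: resize, crop, and save this batch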

You may be able to use manual garbage collection to free some memory between loop iterations:

import gc   # garbage collector

def memclear():
    cleared = gc.collect()   # force a collection pass
    print(cleared)           # number of unreachable objects found

Call memclear() at the end of each loop iteration, so:

for i in range(len(files)):
    imgArray = imgs.load_image(files[i])
    images.append(imgArray)
    shapes.append(np.shape(imgArray))
    memclear()

Hopefully this fixes it. I'm assuming this was downvoted because it manually calls garbage collection, which is generally frowned upon, but unfortunately it sometimes seems to be necessary.

1 Comment

I did try something like this, but was not sure whether it would cause too much overhead.
