
I have a 4D np.array of shape (10000, 32, 32, 3) that represents a set of 10000 RGB images.

How can I use skimage.transform.resize or another function to resize all the images efficiently, so that each (32, 32) image is interpolated to (224, 224)? I'd prefer to do this with skimage, but I'm open to any solution that doesn't use tf.image.resize_images.

My current solution uses tf.image.resize_images, but it causes GPU memory issues later in my pipeline (the memory isn't freed after it finishes, at least in a Jupyter notebook), so I'd like to replace it.

Example:

import tensorflow as tf

# X is the (10000, 32, 32, 3) array of images
X = tf.image.resize_images(X, [224, 224])
with tf.Session() as sess:
    X = X.eval()

2 Answers


I probably won't accept my own answer, but it seems that a simple for loop is actually fairly fast (top reports ~300% CPU utilization).

import numpy as np
from skimage.transform import resize

imgs_in = np.random.rand(100, 32, 32, 3)
imgs_out = np.zeros((100, 224, 224, 3))

# Resize each (32, 32, 3) image to (224, 224, 3) in turn
for n, img in enumerate(imgs_in):
    imgs_out[n] = resize(img, imgs_out.shape[1:], anti_aliasing=True)

print(imgs_out.shape)  # (100, 224, 224, 3)

This seems to be 7-8x faster than ndi.zoom on my machine. Parallelizing it further with multiprocessing should do even better, I think; a sketch follows.
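For what it's worth, here is a minimal sketch of that parallelization with multiprocessing.Pool (the helper resize_one is illustrative, not part of the answer above; the payoff assumes the per-image resize cost outweighs the overhead of pickling each image to and from a worker process):

import numpy as np
from multiprocessing import Pool
from skimage.transform import resize

def resize_one(img):
    # Resize a single (32, 32, 3) image to (224, 224, 3)
    return resize(img, (224, 224, 3), anti_aliasing=True)

if __name__ == "__main__":
    imgs_in = np.random.rand(100, 32, 32, 3)
    with Pool() as pool:
        # Workers resize images in parallel; results come back in input order
        imgs_out = np.stack(pool.map(resize_one, imgs_in))
    print(imgs_out.shape)  # (100, 224, 224, 3)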


One possibility is scipy.ndimage.zoom, which can work on your whole collection of images at once and uses spline interpolation of a given order to upsample them:

import numpy as np
import scipy.ndimage as ndi

imgs_in = np.random.rand(100, 32, 32, 3)
factor = 224 / imgs_in.shape[1]
# Zoom only the two spatial axes; leave the batch and channel axes untouched
imgs_out = ndi.zoom(imgs_in, (1, factor, factor, 1), order=2)
print(imgs_out.shape)

The resulting shape is (100, 224, 224, 3) as expected.

You'll have to check whether the runtime and the result are acceptable for your needs. Twiddling with the order of the interpolation will probably affect both: there is a noticeable speed difference between second-order and (the default) third-order splines, at the cost of interpolation quality.
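If you want to see that trade-off on your own machine, here is a quick, illustrative timing sketch (absolute numbers will vary with hardware; the choice of orders 1-3 is just an example):

import time
import numpy as np
import scipy.ndimage as ndi

imgs_in = np.random.rand(100, 32, 32, 3)
factor = 224 / imgs_in.shape[1]

# Time the same zoom at a few interpolation orders
for order in (1, 2, 3):
    t0 = time.perf_counter()
    ndi.zoom(imgs_in, (1, factor, factor, 1), order=order)
    print(f"order={order}: {time.perf_counter() - t0:.2f} s")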

4 Comments

Thanks for your answer. This seems to work, and I might have to use it. The runtime seems to be an order of magnitude slower than the TensorFlow version, but that one is causing other issues.
@Austin Indeed; looking at the CPU and memory use, I suspect there's little vectorization going on. But I guess a pedestrian plan C might always come in handy...
Wouldn't this be a good candidate for joblib or multi-threading/multi-processing with 10,000 images?
@MarkSetchell I guess so. But this might apply to the question in general. Do GPUs play nicely with other kinds of parallelization? I can't say I'm particularly savvy in the subject.
