
I'm using Python 2.7.3 with numpy and pyfits to process scientific FITS files. I would like to work on the images at half or one quarter resolution for the sake of speed, and have this code:

import numpy
import pyfits

# Read red image
hdulist = pyfits.open(red_fn)
img_data = hdulist[0].data
hdulist.close()
img_data_r = numpy.array(img_data, dtype=float)
# Scale it down to one quarter size
my = []
for line in img_data_r[::4]:
    myline = []
    for item in line[::4]:
        myline.append(item)
    my.append(myline)
img_data_r = my

This works, but I wonder if there is a faster, more native way to reduce the array. The reductions should happen as early as possible, the idea being that the data that will be processed is of minimal acceptable size. If there was a way of reading a reduced dataset with pyfits, that would be ideal. But such a method doesn't seem to exist (correct me if I'm wrong). How about numpy? Or scipy/math/anything else?

  • I think scikit-image has a resize method (not sure if it's faster though): scikit-image.org/docs/dev/api/… Commented Feb 5, 2014 at 12:54
  • I think reading the original (whole) data is the bottleneck, and unless you can read only a thumbnail of the image (some JPEGs carry one in an EXIF tag), nothing you do afterwards will speed this up dramatically. Commented Feb 5, 2014 at 12:59
  • These kinds of questions are probably more appropriate for CodeReview Commented Feb 5, 2014 at 13:10
  • You're probably right about the bottleneck. Still, even though I haven't timed it, Sven's solution should be faster than my nested for loops. At least it's shorter and more elegant! :) EDIT: @joaquin - Sorry if this the wrong place, I'm a noob! Commented Feb 5, 2014 at 13:15
  • 1
    @KarlD 4000x4000 is not big, 16 megapixels, so even if each pixel is a complex128 (worst case, probably you use float64 though), thats 256Mb in memory, which is still small (sidequestion: how much ram you got, and did you use some benchmarking stuff to check your bottlenecks?) In general, when doing image processing: avoid python for-loops, nearly everything can be pushed down to numpy (read: C), which is way faster Commented Feb 5, 2014 at 13:31
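To illustrate the last comment's point, the nested-loop approach from the question and a single NumPy slice produce identical results; the random array below is just a stand-in for the FITS image data:

```python
import numpy as np

img = np.random.rand(400, 400)

# Nested Python loops (slow): copy every 4th pixel in each dimension
loop_result = []
for line in img[::4]:
    loop_result.append([item for item in line[::4]])
loop_result = np.array(loop_result)

# Pure NumPy slicing (fast): same pixels, no Python-level loop
vec_result = img[::4, ::4]

assert np.array_equal(loop_result, vec_result)
```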

1 Answer


The data array you get from pyfits is already a NumPy array; you don't need to create one from it. Moreover, you can do the downsampling in a single step:

img_data_r = hdulist[0].data[::4, ::4]

This won't copy the data; it simply creates a new view with different strides. If you need the down-sampled image as a contiguous array, use numpy.ascontiguousarray().

This method of downsampling keeps only one pixel in sixteen and completely discards the information in all the other pixels. If you need higher-quality downsampling, rather than doing it in your code, you are probably better off downsampling your FITS files with ImageMagick. This will also reduce the time it takes to read the files from disk.
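If you would rather stay in Python, a common higher-quality alternative is block averaging, which uses all the pixels instead of dropping fifteen out of sixteen. A minimal sketch (the function name downsample_mean is my own; it assumes both dimensions are divisible by the factor):

```python
import numpy as np

def downsample_mean(a, f):
    # Average non-overlapping f-by-f blocks of a 2-D array.
    # Assumes a.shape[0] and a.shape[1] are both divisible by f.
    h, w = a.shape
    return a.reshape(h // f, f, w // f, f).mean(axis=3).mean(axis=1)

img = np.arange(16, dtype=float).reshape(4, 4)
small = downsample_mean(img, 2)
# small[0, 0] is the mean of the top-left 2x2 block: (0+1+4+5)/4 = 2.5
```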

To convert all the FITS files in the current directory in place (warning: the full-size originals get overwritten), you could use

mogrify -resize 25% *.fits

3 Comments

Sven, that works, thank you. I knew I was missing something, and you pointed out the exact thing, stepping through a two dimensional list. I've seen this before and didn't understand it... As for the ImageMagick method, I didn't even know IM could read FITS files. Will experiment, but probably keep the native Python code.
Or, use the Python Imaging Library (PIL) to do the resizing.
In any case, never iterate over NumPy arrays with Python for-loops; if there isn't already a way to do it in NumPy, SciPy, or related packages, write a routine in Cython or C. For resampling, though, there are already tons of options.
