
I'm relatively new to NumPy but have started using it to read and write h5 files. I have image data on which I have computed some zonal statistics, reading each pixel value in a given zone into an h5 file. However, I have a lot of pixel values (possibly tens of millions) and want to subsample this data so that I can cut down the data size while keeping the general distribution of the data.

Is there a simple way of sampling every 200th value of an array?

I would post the code I have already, but it only goes as far as reading in my existing data. I'm completely stuck on how to subsample it, so I have nothing to show so far.

Thanks

  • Your question is a little unclear: is your concern the size on disk or the size in memory? If you don't care about disk space, you can read a sliced view from an h5 file. This will still be slow, though, since you effectively still need to read everything from disk. However, subsampling is better done by summing over all pixels; otherwise you might get nasty aliasing artifacts. If disk space is no objection, you could store a whole mipmap in your h5 file. That will give the best performance and quality, but will increase rather than decrease your disk space use. Commented Feb 10, 2014 at 20:20

1 Answer

You can use an array slice:

>>> import numpy as np
>>> a = np.eye(1000)
>>> a[::200, ::200]
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])
