Downsample numpy array while preserving distribution

Question

I'm trying to write a function that randomly samples a numpy.ndarray of floating-point numbers while preserving the distribution of the numbers in the array. I have this function for now:

import random
from collections import Counter

def sample(A, N):
    population = np.zeros(sum(A))
    counter = 0
    for i, x in enumerate(A):
        for j in range(x):
            population[counter] = i
            counter += 1

    sampling = population[np.random.choice(0, len(population), N)]
    return np.histogram(sampling, bins = np.arange(len(A)+1))[0]

So I would like the function to work something like this (this example doesn't account for the distribution):

a = np.array([1.94, 5.68, 2.77, 7.39, 2.51])
new_a = sample(a,3)

new_a
array([1.94, 2.77, 7.39])

However, when I apply the function to an array like this I'm getting:

TypeError                                 Traceback (most recent call last)
<ipython-input-74-07e3aa976da4> in <module>
----> 1 sample(a, 3)

<ipython-input-63-2d69398e2a22> in sample(A, N)
      3 
      4 def sample(A, N):
----> 5     population = np.zeros(sum(A))
      6     counter = 0
      7     for i, x in enumerate(A):

TypeError: 'numpy.float64' object cannot be interpreted as an integer

Any help modifying this function, or creating one that works for this case, would be really appreciated!

The argument for np.zeros is supposed to be a shape: an integer or a tuple of integers, e.g. np.zeros((2,3)). Your a/A is an array of floats, so the sum will also be a float.
– hpaulj, Commented Jun 13, 2019 at 1:12
So how do I fix this? I tried changing the argument to np.zeros(sum(A.shape)), but I still get the same error.
– mlenthusiast, Commented Jun 13, 2019 at 16:11

hpaulj · Accepted Answer · 2019-06-13 17:12:47Z

In [67]: a = np.array([1.94, 5.68, 2.77, 7.39, 2.51])                                                  
In [68]: np.zeros(sum(a))                                                                              
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-68-263779bc977b> in <module>
----> 1 np.zeros(sum(a))

TypeError: 'numpy.float64' object cannot be interpreted as an integer

sum on the shape does not produce this error:

In [69]: np.zeros(sum(a.shape))                                                                        
Out[69]: array([0., 0., 0., 0., 0.])

But you shouldn't need to use sum:

In [70]: a.shape                                                                                       
Out[70]: (5,)
In [71]: np.zeros(a.shape)                                                                             
Out[71]: array([0., 0., 0., 0., 0.])

In fact, if a is 2d and you want a 1d array with the same number of items, you want the product of the shape, not the sum.
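
To make the sum-versus-product point concrete, here is a minimal sketch using a hypothetical 2x3 array `b` (not from the question). It also shows `b.size`, which gives the product of the shape directly:

```python
import numpy as np

# Hypothetical 2-D array, used only for illustration.
b = np.arange(6, dtype=float).reshape(2, 3)

n_sum = sum(b.shape)            # 2 + 3: not the number of items
n_prod = int(np.prod(b.shape))  # 2 * 3: the number of items
n_size = b.size                 # same value as the product of the shape

# 1-D array with one slot per item of b
flat = np.zeros(b.size)
print(n_sum, n_prod, n_size, flat.shape)
```

For a 1d array such as `a` the sum and the product of the shape coincide, which is why `np.zeros(sum(a.shape))` happens to work there.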

But do you want to return an array exactly the same size as A? I thought you were trying to downsize.
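
If the goal is indeed to downsize, one possible sketch is to draw N elements uniformly at random without replacement. This is an assumption about the intent, not a fix of the question's function; the name `downsample` and the `seed` parameter are hypothetical, and `np.random.default_rng` needs NumPy 1.17 or later. A uniform random subset follows the original distribution only in expectation, so a very small sample (3 out of 5) can only approximate it.

```python
import numpy as np

def downsample(a, n, seed=None):
    """Return n elements of a, drawn uniformly at random without replacement.

    Hypothetical helper for illustration; assumes n <= a.size.
    """
    rng = np.random.default_rng(seed)
    flat = np.ravel(a)  # treat any input as 1-D
    # Pick n distinct positions, then sort them to keep the original order.
    idx = rng.choice(flat.size, size=n, replace=False)
    return flat[np.sort(idx)]

a = np.array([1.94, 5.68, 2.77, 7.39, 2.51])
new_a = downsample(a, 3, seed=0)
print(new_a)  # three of the original values, in their original order
```

Sampling positions rather than building an expanded `population` array avoids the need for integer counts, so the float values never reach `np.zeros` as a shape.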
