using numpy percentile on binned data

Question

Suppose house sale figures are presented for a town in ranges:

< $100,000              204
$100,000 - $199,999    1651
$200,000 - $299,999    2405
$300,000 - $399,999    1972
$400,000 - $500,000     872
> $500,000             1455

I want to know which house-price bin a given percentile falls in. Is there a way to use numpy's percentile function to do this? I can do it by hand:

import numpy as np
a = np.array([204., 1651., 2405., 1972., 872., 1455.])
b = np.cumsum(a)/np.sum(a) * 100
q = 75
len(b[b <= q])
4       # i.e. zero-based index 4, the $400,000 - $500,000 bin

But is there a way to use np.percentile instead?
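
For reference, the by-hand calculation above can be wrapped so the index maps straight back to a bin label. This is only a sketch: the `labels` list and the `percentile_bin` name are made up for illustration, and clamping q = 100 to the last bin is an assumption about the desired behaviour.

```python
import numpy as np

# Counts per bin, copied from the table above.
counts = np.array([204., 1651., 2405., 1972., 872., 1455.])
# Hypothetical label list, one entry per bin, in the same order as counts.
labels = ["< $100,000", "$100,000 - $199,999", "$200,000 - $299,999",
          "$300,000 - $399,999", "$400,000 - $500,000", "> $500,000"]

def percentile_bin(counts, q):
    """Zero-based index of the bin containing the q-th percentile."""
    cum_pct = np.cumsum(counts) / np.sum(counts) * 100
    # Bins whose cumulative share is <= q lie entirely at or below the percentile.
    idx = int(np.count_nonzero(cum_pct <= q))
    # Assumption: clamp so q = 100 maps to the last bin instead of running off the end.
    return min(idx, len(counts) - 1)

idx = percentile_bin(counts, 75)
print(idx, labels[idx])   # 4 $400,000 - $500,000
```

The cumulative shares here are roughly 2.4, 21.7, 49.8, 72.8, 83.0 and 100 percent, so four bins sit at or below q = 75 and the percentile lands in the fifth one.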

perimosocordiae · Accepted Answer · 2014-03-25 15:38:40Z

You were almost there:

cs = np.cumsum(a)
bin_idx = np.searchsorted(cs, np.percentile(cs, 75))

At least for this case (and a couple of others with larger a arrays), it's not any faster, though. In the timings below it is roughly three times slower:

In [9]: %%timeit
   ...: b = np.cumsum(a)/np.sum(a) * 100
   ...: len(b[b <= 75])
   ...:
10000 loops, best of 3: 38.6 µs per loop

In [10]: %%timeit
   ....: cs = np.cumsum(a)
   ....: np.searchsorted(cs, np.percentile(cs, 75))
   ....:
10000 loops, best of 3: 125 µs per loop

So unless you want to check for multiple percentiles, I'd stick with what you have.
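
If you do need several percentiles at once, here is a minimal sketch of how that might look. Note that it searches for q/100 of the total count rather than `np.percentile(cs, q)`: the latter interpolates among the cumulative values by their position in the array, which agrees for the data in the question but need not agree when the counts are heavily skewed. The helper name `bins_for_percentiles` and the `skewed` array are made up for illustration, and `side="right"` is chosen to match `len(b[b <= q])` from the question.

```python
import numpy as np

def bins_for_percentiles(counts, qs):
    """Zero-based bin index for each percentile in qs (illustrative helper)."""
    cs = np.cumsum(counts)
    # Convert each percentile into a target position in the cumulative counts.
    targets = np.asarray(qs, dtype=float) / 100.0 * cs[-1]
    # side="right" counts the bins whose cumulative total is <= target,
    # matching len(b[b <= q]) from the question.
    idx = np.searchsorted(cs, targets, side="right")
    # Assumption: clamp so q = 100 maps to the last bin.
    return np.minimum(idx, len(cs) - 1)

a = np.array([204., 1651., 2405., 1972., 872., 1455.])
print(bins_for_percentiles(a, [25, 50, 75, 90]))   # [2 3 4 5]

# Made-up skewed counts where the two approaches disagree.
skewed = np.array([1., 1., 1., 97.])
cs = np.cumsum(skewed)
print(bins_for_percentiles(skewed, [50]))               # [3]
print(np.searchsorted(cs, np.percentile(cs, 50)))       # 2
```

In the skewed case only 3% of the items sit in the first three bins, so the median belongs to the last bin (index 3). `np.percentile(cs, 50)` returns 2.5, the midpoint of the cumulative values 2 and 3, which points at index 2 instead.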
