
I am trying to calculate a quantile for a column's values manually, but the value I get from the formula does not match the result from pandas. I looked at several other solutions but did not find the right answer.

In [54]: df

Out[54]:
    data1   data2       key1    key2
0 -0.204708 1.393406    a       one
1 0.478943  0.092908    a       two
2 1.965781  1.246435    a       one

In [55]: grouped = df.groupby('key1')
In [56]: grouped['data1'].quantile(0.9)
Out[56]:
key1
a 1.668413

Computing it manually with the formula below, where n is 3 because there are 3 values in the data1 column:

position = quantile * (n+1)

Applying it to the data1 column:

= 0.9(n+1)
= 0.9(4)
= 3.6

So the value at the 3.6th position is 1.965781. How does pandas get 1.668413?

1 Answer


By default, quantile spreads the sorted values evenly over the range from the 0th to the 100th percentile and interpolates linearly between them.

In your case:

  • -0.204708 would be considered the 0th percentile,
  • 0.478943 would be considered the 50th percentile and
  • 1.965781 would be considered the 100th percentile.

So you could calculate the 90th percentile the following way (using linear interpolation between the 50th and 100th percentiles):

>>> import numpy as np
>>> x = np.array([-0.204708, 1.965781, 0.478943])
>>> # x[2] is the 50th percentile (median) and x[1] is the 100th (maximum)
>>> ninetieth_percentile = (x[1] - x[2]) / 0.5 * 0.4 + x[2]
>>> ninetieth_percentile
1.6684133999999999

Note that 0.5 is the gap between adjacent data points (with three values they sit at the 0th, 50th and 100th percentiles), and 0.4 is how far above the 50th percentile the target lies (0.5 + 0.4 = 0.9). Hope this makes sense.
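The same idea can be written for any number of values and any quantile. The following is only a minimal sketch, assuming the linear interpolation rule described above (the default `interpolation='linear'` in pandas); the helper name `linear_quantile` is made up for illustration and is not part of pandas or NumPy. It puts the sorted values at fractional positions 0 to n-1, so the target position is q(n-1) rather than q(n+1).

```python
import math

def linear_quantile(values, q):
    """Quantile by linear interpolation between the two nearest sorted values.

    Hypothetical helper for illustration; q must be between 0 and 1.
    """
    s = sorted(values)
    pos = (len(s) - 1) * q           # fractional 0-based position, e.g. 2 * 0.9 = 1.8
    lo = math.floor(pos)             # index of the value just below the position
    hi = min(lo + 1, len(s) - 1)     # index of the value just above (clamped at the end)
    frac = pos - lo                  # how far between the two neighbours we are
    return s[lo] + frac * (s[hi] - s[lo])

data1 = [-0.204708, 0.478943, 1.965781]
result = linear_quantile(data1, 0.9)
print(result)  # approximately 1.668413
```

For the three values in the question the position is 0.9 * 2 = 1.8, which is 80% of the way from 0.478943 to 1.965781. That gives the 1.668413 that `grouped['data1'].quantile(0.9)` reported, whereas the q(n+1) formula points past the last value.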


2 Comments

This is very helpful. Thanks, I have accepted the solution.
Great -- glad it was helpful!
