Calculating Percentile in Python Pandas Dataframe [duplicate]

Question

I'm trying to calculate the percentile of each number within a dataframe and add it to a new column called 'percentile'.

This is my attempt:

import pandas as pd
from scipy import stats

data = {'symbol':'FB','date':['2012-05-18','2012-05-21','2012-05-22','2012-05-23'],'close':[38.23,34.03,31.00,32.00]}

df = pd.DataFrame(data)

close = df['close']

for i in df:
    df['percentile'] = stats.percentileofscore(close,df['close'])

The column is not being filled; it ends up as NaN. This should be fairly easy, but I'm not sure where I'm going wrong.

Thanks in advance for the help.

No need to loop with for i in df. See this answer: stackoverflow.com/a/44607827/1870832 – Max Power, commented Jun 18, 2017 at 3:06
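Max Power's point about the loop is easy to check. Iterating over a DataFrame yields its column labels, not its rows. So the loop body in the question runs once per column (three times here) and overwrites the same 'percentile' column each time. Here is a minimal sketch using the question's data:

```python
import pandas as pd

data = {'symbol': 'FB',
        'date': ['2012-05-18', '2012-05-21', '2012-05-22', '2012-05-23'],
        'close': [38.23, 34.03, 31.00, 32.00]}
df = pd.DataFrame(data)

# Iterating over a DataFrame yields column labels, not rows
labels = [i for i in df]
print(labels)  # ['symbol', 'date', 'close']
```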

Scott Boston · Accepted Answer · 2017-06-18 04:20:10Z


df.close.apply(lambda x: stats.percentileofscore(df.close.sort_values(),x))

or

df.close.rank(pct=True)

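Output (shown for rank(pct=True); the percentileofscore version returns the same ranks on a 0-100 scale, i.e. 100.0, 75.0, 25.0, 50.0):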

0    1.00
1    0.75
2    0.25
3    0.50
Name: close, dtype: float64
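The question asks for the result in a new column. You can assign the Series returned by rank(pct=True) directly, with no loop. Below is a minimal sketch on the question's data. The column name 'percentile' comes from the question. Multiply by 100 if you want the 0-100 scale that percentileofscore uses.

```python
import pandas as pd

data = {'symbol': 'FB',
        'date': ['2012-05-18', '2012-05-21', '2012-05-22', '2012-05-23'],
        'close': [38.23, 34.03, 31.00, 32.00]}
df = pd.DataFrame(data)

# Vectorized: rank each close within the column, scaled to (0, 1]
df['percentile'] = df['close'].rank(pct=True)
print(df)
```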

answered Jun 18, 2017 at 4:20


3 Comments

mattblack

Very simple answer, thanks @scott-boston

Brad Solomon

Use .rank; it should be significantly faster.

Mate Hegedus

.rank is 100% what you should use. That lambda function, while correct, will be MUCH slower.
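The comments say the lambda is correct but slower. You can check the "correct" part directly. The percentileofscore version (default kind='rank') returns the same values as rank(pct=True) scaled by 100. This holds even for tied values under rank's default method='average'. The tie in the sample data below was added for illustration and is not from the question.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Question's closes plus a duplicated 32.00 to show tie handling
close = pd.Series([38.23, 34.03, 31.00, 32.00, 32.00], name='close')

# One scipy call per element; each call scans the whole column,
# so this is quadratic overall and runs as a Python-level loop
via_scipy = close.apply(lambda x: stats.percentileofscore(close, x))

# A single vectorized call inside pandas
via_rank = close.rank(pct=True)

print(np.allclose(via_scipy / 100, via_rank))  # True
```

The speed difference comes from calling percentileofscore once per row. rank handles the whole column in one call.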
