19

I have a DataFrame with some columns. I'd like to add a new column where each row value is the quantile rank of one existing column.

I can use DataFrame.rank to rank a column, but then I don't know how to get the quantile number of this ranked value and to add this quantile number as a new colunm.

Example: if this is my DataFrame

df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b'])

   a    b
0  1    1
1  2   10
2  3  100
3  4  100

and I'd like to know the quantile number (using 2 quantiles) of column b. I'd expect this result:

   a    b  quantile
0  1    1    1
1  2   10    1
2  3  100    2
3  4  100    2
6
  • 1
    I think you want DataFrame.quantile with q=[0, 0.25, 0.5, 0.75, 1]... Then you should be able to rank on that column Commented Jul 13, 2016 at 15:52
  • Ah, that makes sense. Thanks! Commented Jul 13, 2016 at 15:59
  • You're welcome, posting comment above as answer below. Commented Jul 13, 2016 at 16:01
  • I am trying to apply your suggestion but It seems I cannot do it Commented Jul 13, 2016 at 16:44
  • It looks like getting the resulting dataframe from quantile back into the original dataframe requires a few extra operations... Working on an edit Commented Jul 13, 2016 at 19:25

4 Answers 4

27

I discovered it is quite easy:

df['quantile'] = pd.qcut(df['b'], 2, labels=False)

   a    b  quantile
0  1    1         0
1  2   10         0
2  3  100         1
3  4  100         1

Interesting to know "difference between pandas.qcut and pandas.cut"

Sign up to request clarification or add additional context in comments.

3 Comments

This generally throws a SettingWithCopyWarning, at least in 0.16.1 (which I'm still using).
@feetweet, thanks for reporting the issue with old pandas version (<= 0.16.1) and for the alternative solution. Anyway for versions >= 0.18 it works without any warning.
to avoid SettingWithCopyWarning, use df.loc[:, "quantile"] = ...
2

df['quantile'] = pd.qcut(df['b'], 2, labels=False) seems to tend to throw a SettingWithCopyWarning.

The only general way I have found of doing this without complaints is like:

quantiles = pd.qcut(df['b'], 2, labels=False)
df = df.assign(quantile=quantiles.values)

This will assign the quantile rank values as a new DataFrame column df['quantile'].

A solution for a more generalized case, in which one wants to partition the cut by multiple columns, is given here.

Comments

1

You can use DataFrame.quantile with q=[0.25, 0.5, 0.75] on the existing column to produce a quartile column.

Then, you can DataFrame.rank on that quartile column.

See below for an example of adding a quartile column:

import pandas as pd

d = {'one' : pd.Series([40., 45., 50., 55, 60, 65], index=['val1', 'val2', 'val3', 'val4', 'val5', 'val6'])}
df = pd.DataFrame(d)

quantile_frame = df.quantile(q=[0.25, 0.5, 0.75])
quantile_ranks = []
for index, row in df.iterrows():
    if (row['one'] <= quantile_frame.ix[0.25]['one']):
        quantile_ranks.append(1)
    elif (row['one'] > quantile_frame.ix[0.25]['one'] and row['one'] <= quantile_frame.ix[0.5]['one']):
        quantile_ranks.append(2)
    elif (row['one'] > quantile_frame.ix[0.5]['one'] and row['one'] <= quantile_frame.ix[0.75]['one']):
        quantile_ranks.append(3)
    else:
        quantile_ranks.append(4)

df['quartile'] = quantile_ranks

Note: There's probably a more idiomatic way to accomplish this with Pandas... but it's beyond me

Comments

0
df.sort_values(['b'],inplace = True)
df.reset_index(inplace = True,drop = True)
df.reset_index(inplace = True)
df.rename(columns = {'index':'row_num'},inplace = True)
df['quantile'] = df['row_num'].apply(lambda x: math.ceil(10*(x+1)/df.shape[0]))

I used to use this, but I guess I can use quantile

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.