df:

      A
0    219
1    590
2    272
3    945
4    175
5    930
6    662
7    472
8    251
9    130

I am trying to create a new column, quantile, based on which quantile each value falls in. For example:

if value <= 1st quantile (0.25) : value = 1
if 1st quantile < value <= 2nd quantile (0.5) : value = 2
if 2nd quantile < value <= 3rd quantile (0.75) : value = 3
if 3rd quantile < value <= 4th quantile (1.0) : value = 4

Code:

f_q = df['A'].quantile(0.25)   # 1st quantile boundary
s_q = df['A'].quantile(0.5)    # 2nd quantile boundary (median)
t_q = df['A'].quantile(0.75)   # 3rd quantile boundary
fo_q = df['A'].quantile(1)     # 4th quantile boundary (maximum)


# Walk the rows and overwrite each value in A with its quantile number (1-4).
index = 0
for i in range(len(df)):

   value = df.at[index, "A"]
   if value > 0 and value <= f_q:
       df.at[index, "A"] = 1

   elif value > f_q and value <= s_q:
       df.at[index, "A"] = 2

   elif value > s_q and value <= t_q:
       df.at[index, "A"] = 3

   elif value > t_q and value <= fo_q:
       df.at[index, "A"] = 4


   index += 1

The code works fine, but I would like to know whether there is a more efficient pandas way of doing this. Any suggestions are welcome.

1 Answer

Yes, using pd.qcut:

>>> pd.qcut(df.A, 4).cat.codes + 1
0    1
1    3
2    2
3    4
4    1
5    4
6    4
7    3
8    2
9    1
dtype: int8

(This gives exactly the same result as your code.)
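
If the goal is a new column rather than a bare Series, something like the following should work. This is only a sketch: the column name quantile is taken from the question, and passing labels=False (so that qcut returns integer bin indices instead of intervals) is my choice here, not part of the snippet above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": [219, 590, 272, 945, 175, 930, 662, 472, 251, 130]})

# labels=False makes qcut return integer bin indices (0..3) instead of
# Interval categories; adding 1 gives the 1..4 numbering from the question.
df["quantile"] = pd.qcut(df["A"], 4, labels=False) + 1

print(df)
```

Column A is left untouched, so the original values stay available next to their quantile numbers.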

You could also call np.unique on the qcut result:

>>> np.unique(pd.qcut(df.A, 4), return_inverse=True)[1] + 1
array([1, 3, 2, 4, 1, 4, 4, 3, 2, 1])

Or, using pd.factorize (note the slight difference in the output: factorize numbers the bins in order of first appearance, not in bin order):

>>> pd.factorize(pd.qcut(df.A, 4))[0] + 1
array([1, 2, 3, 4, 1, 4, 4, 2, 3, 1])
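
To see where the factorize numbering comes from, here is a small sketch that runs it on plain integer bin indices. Using labels=False and sort=True is an assumption of mine for illustration; neither appears in the snippets above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": [219, 590, 272, 945, 175, 930, 662, 472, 251, 130]})

# Integer bin indices in bin order (0 = lowest quarter of the data).
bins = pd.qcut(df["A"], 4, labels=False)

# Default: codes follow the order in which each bin first shows up in the data.
by_appearance = pd.factorize(bins)[0] + 1

# sort=True: codes follow the sorted unique values, which here is bin order.
by_bin_order = pd.factorize(bins, sort=True)[0] + 1

print(by_appearance.tolist())  # [1, 2, 3, 4, 1, 4, 4, 2, 3, 1]
print(by_bin_order.tolist())   # [1, 3, 2, 4, 1, 4, 4, 3, 2, 1]
```

With sort=True the result lines up with the cat.codes version, so the default factorize output is only suitable when the numbers are used as arbitrary group ids rather than as quantile ranks.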

2 Comments

@vikky Added more options, if needed.
Cool, thanks. I will compare the performance of the options and post the results for future readers.
