Pandas creating incremental values in new column based on certain conditions

Question

I have a DataFrame in this form:

Name Rank  Months
A     'A3'  2
A     'A3'  2
A     'A2'  3
A     'A2'  3
A     'A2'  3
B     'A1'  4
B     'A1'  4
B     'A1'  4
B     'A1'  4
C     'A3'  2
C     'A3'  2
C     'A2'  1

What is the most efficient way to create a new column of incremental values, based on the number of months for each Name and ordered by Rank? The expected output is the following:

Name Rank  Months  NewIncremental
A     'A3'  2       'P4'
A     'A3'  2       'P5'
A     'A2'  3       'P1'
A     'A2'  3       'P2'
A     'A2'  3       'P3'
B     'A1'  4       'P1'
B     'A1'  4       'P2'
B     'A1'  4       'P3'
B     'A1'  4       'P4'
C     'A3'  2       'P2'
C     'A3'  2       'P3'
C     'A2'  1       'P1'

The condition is the rank order, which is A1 -> A2 -> A3. Within a name, rows with an earlier rank get the lower incremental values (for example, A2 rows are numbered before A3 rows). I guess sorting on this could help?

EDIT: I edited the order in the example, because I need to be able to provide an arbitrary order of the ranks.

Dani Mesejo · Accepted Answer · 2021-10-14 11:08:18Z

One approach:

ranks = df.sort_values(by=["Rank"],
                    key=lambda x: x.str.replace(r"\D+", "", regex=True).astype(int))\
        .groupby("Name").transform("cumcount") + 1
ranks = ranks.apply("P{}".format)

df["NewIncremental"] = ranks
print(df)

Output

   Name Rank  Months NewIncremental
0     A   A1       2             P1
1     A   A1       2             P2
2     A   A2       3             P3
3     A   A2       3             P4
4     A   A2       3             P5
5     B   A1       4             P1
6     B   A1       4             P2
7     B   A1       4             P3
8     B   A1       4             P4
9     C   A3       2             P2
10    C   A3       2             P3
11    C   A2       1             P1

Step-by-step

# sort df by the given criteria, then group-by
sorted_by_rank = df.sort_values(by=["Rank"], key=lambda x: x.str.replace(r"\D+", "", regex=True).astype(int))

# get the ranks and apply the expected format
ranks = sorted_by_rank.groupby("Name").transform("cumcount") + 1
ranks = ranks.apply("P{}".format)

# assign the new column
df["NewIncremental"] = ranks
print(df)
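
For reference, here is a self-contained sketch of the steps above, run against the data from the question. Two details are my own additions rather than part of the original answer: `kind="stable"`, which guarantees that rows with the same rank keep their original relative order, and calling `cumcount()` directly on the group-by.

```python
import pandas as pd

# Data from the question (Rank stored without the surrounding quotes).
df = pd.DataFrame({
    "Name": list("AAAAABBBBCCC"),
    "Rank": ["A3", "A3", "A2", "A2", "A2", "A1", "A1", "A1", "A1", "A3", "A3", "A2"],
    "Months": [2, 2, 3, 3, 3, 4, 4, 4, 4, 2, 2, 1],
})

# Sort by the numeric part of Rank; a stable sort keeps tied rows in their original order.
sorted_by_rank = df.sort_values(
    by="Rank",
    key=lambda s: s.str.replace(r"\D+", "", regex=True).astype(int),
    kind="stable",
)

# Number the rows within each Name, starting at 1, and format as "P<n>".
counter = sorted_by_rank.groupby("Name").cumcount() + 1
df["NewIncremental"] = "P" + counter.astype(str)  # assignment aligns on the index

print(df)
```

Because the assignment aligns on the index, the original row order of `df` is preserved even though the counter was computed on the sorted frame.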

Quixotic22 · Accepted Answer · 2021-10-14 11:05:50Z

Does this solve it for you?

df['NewIncrement'] = 'P' + df.sort_values(['Name', 'Rank']).groupby('Name')['Rank'].rank(method="first", ascending=True).astype(int).astype(str)


Henry Yik · Accepted Answer · 2021-10-14 11:07:19Z

IIUC you can simply use rank:

df["new"] = "P"+df.groupby("Name")["Rank"].rank(method="first").astype(int).astype(str)
print (df)

   Name  Rank  Months new
0     A  'A1'       2  P1
1     A  'A1'       2  P2
2     A  'A2'       3  P3
3     A  'A2'       3  P4
4     A  'A2'       3  P5
5     B  'A1'       4  P1
6     B  'A1'       4  P2
7     B  'A1'       4  P3
8     B  'A1'       4  P4
9     C  'A3'       2  P2
10    C  'A3'       2  P3
11    C  'A2'       1  P1


2 Comments

DockerUser12 Over a year ago

Is it possible to use a custom order of ranks? As I understand it, the first method ranks values by the order in which they appear in the DataFrame. If 'A2' appeared in the first row, would that no longer work?

Henry Yik Over a year ago

With method="first", ranks are still computed from the values themselves; the order of appearance is only used to break ties between rows that have the same value. So it would work just fine.
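
To make the point about row order concrete, and to cover the arbitrary rank order asked for in the question, here is a minimal sketch. The `rank_order` list and the `RankPos` helper column are hypothetical names introduced for illustration only. The idea, which is one possible approach and not taken from the answers above, is to map each rank to its position in a user-supplied order and then rank those numbers within each name.

```python
import pandas as pd

# Small example in which the lowest-priority rank appears first for name "A".
df = pd.DataFrame({
    "Name": ["A", "A", "A", "C", "C", "C"],
    "Rank": ["A3", "A2", "A3", "A3", "A3", "A2"],
})

# Hypothetical custom order: earlier entries receive lower increments.
rank_order = ["A2", "A3", "A1"]
position = {rank: i for i, rank in enumerate(rank_order)}

# Map ranks to their position in the custom order, then rank within each Name.
# method="first" only breaks ties (equal positions) by order of appearance.
df["RankPos"] = df["Rank"].map(position)
pos = df.groupby("Name")["RankPos"].rank(method="first").astype(int)
df["new"] = "P" + pos.astype(str)
df = df.drop(columns="RankPos")

print(df)
```

For name A, the single A2 row receives P1 even though it is not the first row, and the two A3 rows receive P2 and P3 in their order of appearance.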
