0

I have dataframe in this form:

Name Rank  Months
A     'A3'  2
A     'A3'  2
A     'A2'  3
A     'A2'  3
A     'A2'  3
B     'A1'  4
B     'A1'  4
B     'A1'  4
B     'A1'  4
C     'A3'  2
C     'A3'  2
C     'A2'  1

What is the most effective way to create new column with incremental values based on number of months for certain Name and on condition of Rank? So basically the output is the following:

Name Rank  Months  NewIncremental
A     'A3'  2       'P4'
A     'A3'  2       'P5'
A     'A2'  3       'P1'
A     'A2'  3       'P2'
A     'A2'  3       'P3'
B     'A1'  4       'P1'
B     'A1'  4       'P2'
B     'A1'  4       'P3'
B     'A1'  4       'P4'
C     'A3'  2       'P2'
C     'A3'  2       'P3'
C     'A2'  1       'P1'

So the condition would be the rank order, which is A1->A2->A3. Meaning that if there is a name with A2 rank I assign lower incremental value. I guess sorting based on this can help?

EDIT: edited order so that I need to provide arbitrary order of the ranks

3 Answers 3

1

One approach:

ranks = df.sort_values(by=["Rank"],
                    key=lambda x: x.str.replace(r"\D+", "", regex=True).astype(int))\
        .groupby("Name").transform("cumcount") + 1
ranks = ranks.apply("P{}".format)

df["NewIncremental"] = ranks
print(df)

Output

   Name Rank  Months NewIncremental
0     A   A1       2             P1
1     A   A1       2             P2
2     A   A2       3             P3
3     A   A2       3             P4
4     A   A2       3             P5
5     B   A1       4             P1
6     B   A1       4             P2
7     B   A1       4             P3
8     B   A1       4             P4
9     C   A3       2             P2
10    C   A3       2             P3
11    C   A2       1             P1

Step-by-step

# sort df by the given criteria, then group-by
sorted_by_rank = df.sort_values(by=["Rank"], key=lambda x: x.str.replace(r"\D+", "", regex=True).astype(int))

# get the ranks and apply the expected format
ranks = sorted_by_rank.groupby("Name").transform("cumcount") + 1
ranks = ranks.apply("P{}".format)

# assign the new column
df["NewIncremental"] = ranks
print(df)
Sign up to request clarification or add additional context in comments.

Comments

0

Does this solve it for you?

df['NewIncrement'] = 'P' + df.sort_values(['Name', 'Rank']).groupby('Name').rank(method="first", ascending=True).astype(int).astype(str)

Comments

0

IIUC you can simply use rank:

df["new"] = "P"+df.groupby("Name")["Rank"].rank(method="first").astype(int).astype(str)
print (df)

   Name  Rank  Months new
0     A  'A1'       2  P1
1     A  'A1'       2  P2
2     A  'A2'       3  P3
3     A  'A2'       3  P4
4     A  'A2'       3  P5
5     B  'A1'       4  P1
6     B  'A1'       4  P2
7     B  'A1'       4  P3
8     B  'A1'       4  P4
9     C  'A3'       2  P2
10    C  'A3'       2  P3
11    C  'A2'       1  P1

2 Comments

Is it possible to use some custom order of ranks? Because as I understand the first method it ranks based on how those values appear in dataframe. If the 'A2' appeared in the first row would that not work anymore?
first ranks assigned in order they appear in the array that have the same value after computing numerical data ranks along axis, so it would work just fine.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.