
I have the following df:

Doc Item
1    1
1    1
1    2
1    3
2    1
2    2

I want to add a third column whose values (1) increment by one whenever the value in column "Item" changes and (2) restart whenever the value in column "Doc" changes:

Doc Item  NewCol
 1    1     1
 1    1     1
 1    2     2
 1    3     3
 2    1     1 
 2    2     2

What is the best way to achieve this? Thanks a lot.

Comments:
  • Please supply the code showing how you store the DF; it will be easier to help you that way. (Nov 12, 2020 at 9:55)
  • @Tamir You can use pd.read_clipboard unless the data has a MultiIndex or datetimes. (Nov 12, 2020 at 9:56)

1 Answer

Use GroupBy.transform with a custom lambda function that calls factorize:

df['NewCol'] = df.groupby('Doc')['Item'].transform(lambda x: pd.factorize(x)[0]) + 1
print(df)
   Doc  Item  NewCol
0    1     1       1
1    1     1       1
2    1     2       2
3    1     3       3
4    2     1       1
5    2     2       2
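
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, so the construction step is an assumption about how the data is stored; only the `groupby` line comes from the answer itself.

```python
import pandas as pd

# Sample data copied from the question (assumed to be plain integer columns).
df = pd.DataFrame({
    "Doc":  [1, 1, 1, 1, 2, 2],
    "Item": [1, 1, 2, 3, 1, 2],
})

# pd.factorize returns (codes, uniques); codes start at 0 and follow the
# order of first appearance, so adding 1 makes the numbering start at 1.
# Grouping by Doc makes the numbering restart for every document.
df["NewCol"] = df.groupby("Doc")["Item"].transform(lambda x: pd.factorize(x)[0]) + 1

print(df)
```

Because factorize numbers values by first appearance, an Item value that shows up again later in the same Doc gets its earlier number back rather than a new one.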

If the values in Item are integers that appear in ascending order within each Doc, it is also possible to use GroupBy.rank:

df['NewCol'] = df.groupby('Doc')['Item'].rank(method='dense').astype(int)
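
The two approaches agree on the sample data but can differ when Item is not sorted within a Doc. The sketch below uses made-up data (the `demo` frame and the column names `by_factorize`, `by_rank` and `by_change` are hypothetical, not from the question) to show the difference. It also includes a third variant, not part of the answer above, that counts every change between consecutive rows using shift and cumsum, in case a repeated value should get a new number.

```python
import pandas as pd

# Hypothetical data: Item is unsorted and the value 3 repeats non-adjacently.
demo = pd.DataFrame({
    "Doc":  [1, 1, 1, 1, 2, 2],
    "Item": [3, 1, 3, 2, 5, 5],
})

grouped = demo.groupby("Doc")["Item"]

# Numbering by order of first appearance within each Doc.
demo["by_factorize"] = grouped.transform(lambda x: pd.factorize(x)[0]) + 1

# Numbering by sorted value within each Doc (dense rank, no gaps).
demo["by_rank"] = grouped.rank(method="dense").astype(int)

# Numbering by consecutive change: compare each row with the previous row
# of the same Doc, then take a running count of the changes per Doc.
# The first row of each Doc compares against NaN and so counts as a change.
changed = demo["Item"].ne(grouped.shift()).astype(int)
demo["by_change"] = changed.groupby(demo["Doc"]).cumsum()

print(demo)
```

Here `by_factorize` is 1, 2, 1, 3, 1, 1, `by_rank` is 3, 1, 3, 2, 1, 1, and `by_change` is 1, 2, 3, 4, 1, 1, so which one is right depends on what "a change in Item" should mean for your data.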