I have a DataFrame df with two columns:

  ID      Name
AXD2     SAM S
AXD2       SAM
SCA4       JIM
SCA4 JIM JONES
ASCQ      JOHN

I need the output to contain each ID only once, keeping only the first Name recorded for it:

  ID  Name
AXD2 SAM S
SCA4   JIM
ASCQ  JOHN

Any suggestions?

  • Welcome to Stack Overflow! What have you tried so far? Could you post the code you have tried? Commented May 25, 2022 at 15:36
  • Why does AXD2 have SAM S while SCA4 only has JIM? Commented May 25, 2022 at 15:38
  • That's the point. I only need to match the first record from the Name column. Commented May 25, 2022 at 15:41
  • So you want to keep the first row for each ID? Commented May 25, 2022 at 15:48

3 Answers

You can use groupby with agg and take the first Name in each ID group (the result column is called first_name here):

df.groupby(['ID']).agg(first_name=('Name', 'first')).reset_index()
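
As a minimal, self-contained sketch of this approach, the snippet below rebuilds the question's data by hand and keeps the column name Name in the result. The variable name first_names is only illustrative. Two details worth knowing: groupby sorts the group keys by default, so sort=False is passed to keep the IDs in their original order, and 'first' returns the first non-null value in each group.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["AXD2", "AXD2", "SCA4", "SCA4", "ASCQ"],
    "Name": ["SAM S", "SAM", "JIM", "JIM JONES", "JOHN"],
})

# sort=False keeps the IDs in order of first appearance; the default
# (sort=True) would return them sorted as ASCQ, AXD2, SCA4.
first_names = (
    df.groupby("ID", sort=False)
      .agg(Name=("Name", "first"))  # first non-null Name per ID
      .reset_index()
)
print(first_names)
```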

Use drop_duplicates, which by default keeps the first row for each ID:

out = df.drop_duplicates('ID', ignore_index=True)
print(out)

# Output
     ID   Name
0  AXD2  SAM S
1  SCA4    JIM
2  ASCQ   JOHN
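
To try this end to end, here is a small sketch that rebuilds the question's data by hand and writes out the keep argument explicitly. The keep="last" variant (stored in an illustrative variable named last) is included only for contrast and is not something the question asks for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["AXD2", "AXD2", "SCA4", "SCA4", "ASCQ"],
    "Name": ["SAM S", "SAM", "JIM", "JIM JONES", "JOHN"],
})

# keep="first" is the default; it is spelled out here for clarity.
out = df.drop_duplicates(subset="ID", keep="first", ignore_index=True)
print(out)

# For contrast: keep="last" retains the final row of each ID instead.
last = df.drop_duplicates(subset="ID", keep="last", ignore_index=True)
print(last)
```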

You can use cumcount() to number the rows within each ID and keep only the first occurrence:

# Number the rows within each ID group, starting at 1
df['RN'] = df.groupby(['ID']).cumcount() + 1
# Keep only the first row of each group
df = df.loc[df['RN'] == 1]
df[['ID', 'Name']]
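
If the helper column is not needed afterwards, the same filter can be sketched as a boolean mask, which leaves df itself unchanged. The data below is rebuilt from the question, and the names mask and result are only illustrative. cumcount() starts at 0, so the first row of each group is the one with a count of 0.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["AXD2", "AXD2", "SCA4", "SCA4", "ASCQ"],
    "Name": ["SAM S", "SAM", "JIM", "JIM JONES", "JOHN"],
})

# True for the first row of every ID group, False for later rows.
mask = df.groupby("ID").cumcount() == 0

# Select the matching rows without adding a column to df.
result = df.loc[mask, ["ID", "Name"]].reset_index(drop=True)
print(result)

# The same mask can also be built with Series.duplicated().
same_mask = ~df["ID"].duplicated()
```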
