Match two columns in dataframe

Question

I have two columns in a dataframe df:

  ID      Name
AXD2     SAM S
AXD2       SAM
SCA4       JIM
SCA4 JIM JONES
ASCQ      JOHN

I need the output to have one row per unique ID, keeping only the first Name listed for it:

  ID  Name
AXD2 SAM S
SCA4   JIM
ASCQ  JOHN
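For anyone trying the answers below, here is a minimal reproducible version of the sample data. The values are copied from the two tables above; the variable name `expected` is only an illustrative label for the desired result.

```python
import pandas as pd

# Sample input, copied from the first table in the question
df = pd.DataFrame({
    "ID": ["AXD2", "AXD2", "SCA4", "SCA4", "ASCQ"],
    "Name": ["SAM S", "SAM", "JIM", "JIM JONES", "JOHN"],
})

# Desired result (hypothetical name): one row per ID, first Name seen for it
expected = pd.DataFrame({
    "ID": ["AXD2", "SCA4", "ASCQ"],
    "Name": ["SAM S", "JIM", "JOHN"],
})

print(df)
print(expected)
```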

Any suggestions?

Welcome to StackOverflow! What have you tried so far? Could you post the code you have tried?
– bonCodigo, Commented May 25, 2022 at 15:36
That's the point: I only need to match the first record from the Name column.
– donald smith, Commented May 25, 2022 at 15:41

Onur Guven · Accepted Answer · 2022-05-25 15:45:30Z

1

You can use groupby with agg and take the first Name in each group:

df.groupby(['ID']).agg(first_name=('Name', 'first')).reset_index()
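A small sketch of what the groupby/agg approach returns on the question's data. By default groupby sorts the group keys, so the rows come back as ASCQ, AXD2, SCA4 rather than in the original order, and the named aggregation produces a column called first_name. Passing sort=False and naming the output column Name (both are adjustments, not part of the answer above) reproduces the requested table exactly. Note also that the 'first' aggregation takes the first non-null value per group, which can differ from simply taking the first row when Name contains missing values.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["AXD2", "AXD2", "SCA4", "SCA4", "ASCQ"],
    "Name": ["SAM S", "SAM", "JIM", "JIM JONES", "JOHN"],
})

# As in the answer: group keys are sorted, output column is first_name
sorted_out = df.groupby(["ID"]).agg(first_name=("Name", "first")).reset_index()

# Adjusted: sort=False keeps IDs in order of first appearance,
# and naming the aggregated column Name matches the desired output
ordered_out = df.groupby("ID", sort=False).agg(Name=("Name", "first")).reset_index()

print(sorted_out)
print(ordered_out)
```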

answered May 25, 2022 at 15:45

Onur Guven

Corralien · Accepted Answer · 2022-05-25 15:45:17Z

0

Use drop_duplicates:

out = df.drop_duplicates('ID', ignore_index=True)
print(out)

# Output
     ID   Name
0  AXD2  SAM S
1  SCA4    JIM
2  ASCQ   JOHN
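A sketch showing why drop_duplicates gives the requested table: its keep parameter defaults to "first", so the first row of each ID survives and the original row order is preserved. The keep="last" variant is shown only for contrast; it is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["AXD2", "AXD2", "SCA4", "SCA4", "ASCQ"],
    "Name": ["SAM S", "SAM", "JIM", "JIM JONES", "JOHN"],
})

# keep="first" (the default): keep the first row seen for each ID
first_rows = df.drop_duplicates("ID", keep="first", ignore_index=True)

# For contrast: keep="last" keeps the last row for each ID instead
last_rows = df.drop_duplicates("ID", keep="last", ignore_index=True)

print(first_rows)
print(last_rows)
```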

answered May 25, 2022 at 15:45

Corralien

ArchAngelPwn · Accepted Answer · 2022-05-25 15:47:10Z

0

You can use cumcount() to number the rows within each ID and keep only the first one:

df['RN'] = df.groupby(['ID']).cumcount() + 1
df = df.loc[df['RN'] == 1]
df[['ID', 'Name']]
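To make the cumcount approach easier to follow, this sketch prints the row numbers it assigns: cumcount() starts at 0 within each ID, so adding 1 gives a 1-based running number similar to SQL's ROW_NUMBER(). Filtering and column selection are folded into one .loc call here, which avoids adding an RN column to df; that variation is a suggestion, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["AXD2", "AXD2", "SCA4", "SCA4", "ASCQ"],
    "Name": ["SAM S", "SAM", "JIM", "JIM JONES", "JOHN"],
})

# Running number of each row within its ID group, starting at 1
rn = df.groupby("ID").cumcount() + 1
print(rn.tolist())  # [1, 2, 1, 2, 1]

# Keep only the first row of each ID, selecting ID and Name in one step
out = df.loc[rn == 1, ["ID", "Name"]].reset_index(drop=True)
print(out)
```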

answered May 25, 2022 at 15:47

ArchAngelPwn