Change values from dataframe if two columns are the same

Question

I'm cleaning up a dataframe to train a machine learning model, and I found that some rows share the same value in column A but have different values in column B. For example:

A	B
1234	foo
1234	bar

Since the value in column A is 1234 for both entries, the value in column B should be foo (or bar) in both cases.
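Before changing anything, it can help to list which values of A are affected. One way to do this (a sketch, not part of the original post) is to count the distinct B values in each A group with `groupby(...).transform('nunique')`. The extra `5678`/`baz` row is hypothetical and is only there to show a group without a conflict:

```python
import pandas as pd

# Example data from the question, plus a hypothetical non-conflicting row
df = pd.DataFrame({'A': [1234, 1234, 5678], 'B': ['foo', 'bar', 'baz']})

# Number of distinct B values in each row's A group, aligned to df's rows
n_distinct = df.groupby('A')['B'].transform('nunique')

# Keep only the rows whose A value maps to more than one distinct B
conflicts = df[n_distinct > 1]
print(conflicts)
```

Note that `nunique` ignores missing values by default, so a group containing `foo` and `NaN` is not reported as a conflict.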

I tried a brute-force approach that compares every pair of rows:

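for index1, row1 in df.iterrows():
    for index2, row2 in df.iterrows():
        if row1['A'] == row2['A'] and row1['B'] != row2['B']:
            print('Found duplicated A with different B!')
            # iterrows() yields copies, so write back through df.at
            df.at[index1, 'B'] = row2['B']
            # assumes df also has a flag column C (not shown in the example)
            df.at[index1, 'C'] = df.at[index2, 'C'] = False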
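One pitfall with loops like this: `DataFrame.iterrows()` yields each row as a new Series, and for a frame with mixed dtypes (integers in A, strings in B) that Series is a copy, so assigning to it leaves `df` unchanged. The pandas documentation for `iterrows` warns against modifying what you iterate over for this reason. A minimal check using the example data:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({'A': [1234, 1234], 'B': ['foo', 'bar']})

for index, row in df.iterrows():
    # `row` is a Series built for this iteration; changing it
    # does not touch the DataFrame
    row['B'] = 'changed'

before = df['B'].tolist()
print(before)  # ['foo', 'bar']

# Writing through the DataFrame itself, e.g. with df.at, does persist
for index, row in df.iterrows():
    df.at[index, 'B'] = 'changed'

print(df['B'].tolist())  # ['changed', 'changed']
```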

There is probably an easier way to do this that I'm missing. Does pandas have a method for this?

Quang Hoang · Accepted Answer · 2021-02-08 19:01:43Z


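You can use groupby.transform('first') (or 'last'), which replaces B in every row of a group of equal A values with that group's first (or last) non-null B: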

df['B'] = df.groupby('A')['B'].transform('first')
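A runnable sketch of the accepted answer on the question's sample data (the `5678`/`baz` row is a hypothetical addition). It also shows one alternative that is not from the answer: building an A-to-B lookup with `drop_duplicates` and applying it with `Series.map`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1234, 1234, 5678], 'B': ['foo', 'bar', 'baz']})

groups = df.groupby('A')['B']

# Broadcast each group's first (or last) non-null B back to every row
first = groups.transform('first')
last = groups.transform('last')
print(first.tolist())  # ['foo', 'foo', 'baz']
print(last.tolist())   # ['bar', 'bar', 'baz']

# Alternative: take B from the first row of each A, then map A -> B
lookup = df.drop_duplicates('A').set_index('A')['B']
mapped = df['A'].map(lookup)
print(mapped.tolist())  # ['foo', 'foo', 'baz']

# Overwrite the column with the chosen variant
df['B'] = first
```

The two approaches differ only when B has missing values: `transform('first')` skips NaN and picks the first non-null B in each group, while `drop_duplicates` keeps whichever row comes first, even if its B is NaN. Both depend on row order, so sort the frame first if "first" should mean something specific.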


1 Comment

Marcos Santana Over a year ago

Thanks! It's so simple I'm ashamed I didn't see it hahaha. I'll mark it as accepted.
