I'm cleaning up a dataframe to train a machine learning model, and I found that some rows share the same value in column A but have different values in column B. For example:
| A | B |
|---|---|
| 1234 | foo |
| 1234 | bar |
Since the value in column A is 1234 for both entries, the value in column B should be foo (or bar) in both cases.
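I tried a brute-force approach:

    for index1, row1 in df.iterrows():
        for index2, row2 in df.iterrows():
            if row1['A'] == row2['A'] and row1['B'] != row2['B']:
                print('Found duplicated A with different B!')
                # Rows yielded by iterrows() are copies, so write back through df.at.
                # Use = for assignment; == only compares and discards the result.
                df.at[index1, 'B'] = row2['B']
                df.at[index1, 'C'] = False
                df.at[index2, 'C'] = False
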
There is probably an easier way to do this that I'm missing. Does pandas have a built-in way to handle this?
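
For reference, here is a minimal sketch of how the same cleanup might look with `groupby` instead of nested loops. It rests on assumptions the question does not spell out: the sample data is made up, the first `B` seen for each `A` is the value to keep, and `C` is taken to be a boolean flag that should be `False` for any row whose `A` had conflicting `B` values.

```python
import pandas as pd

# Hypothetical sample data: the two rows from the example plus one consistent row.
df = pd.DataFrame({
    "A": [1234, 1234, 5678],
    "B": ["foo", "bar", "baz"],
})

# Number of distinct B values within each A group, aligned to the original rows.
n_distinct = df.groupby("A")["B"].transform("nunique")

# Assumption: C is False wherever the A group had more than one distinct B.
df["C"] = n_distinct == 1

# Assumption: keep the first B encountered for each A.
df["B"] = df.groupby("A")["B"].transform("first")

print(df)
```

This avoids the quadratic pair-by-pair comparison, since each group is processed once. If a different rule should decide which `B` wins (for example, the most frequent value), only the last `transform` call would need to change.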