Pandas: How to merge rows based on alternate column values?

Question

I have a DataFrame that contains some near-duplicate rows.

For example, df:

Dist                Id         ID2         ID3      Values
1.309511252         1       4950005568  4865005556   3
0.239604736         2       13077506433 13062506433  4
0.239604736         2       13062506433 13077506433  4
0.230578014         3       4990001482  4880017235   4
0.230578014         3       4880017235  4990001482   4
0.199825732         4       5065006006  4950005965   5
0.199825732         4       4950005965  5065006006   5

As you can see, rows 2 & 3, 4 & 5, and 6 & 7 have the same values, except that the ID2 and ID3 columns are interchanged.

I want to remove those duplicate rows, keeping only one row from each pair, while keeping rows that have no swapped counterpart (in this case row 1).

I want the output to be:

Dist                Id         ID2         ID3      Values
1.309511252         1       4950005568  4865005556   3
0.239604736         2       13062506433 13077506433  4
0.230578014         3       4880017235  4990001482   4
0.199825732         4       4950005965  5065006006   5
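Because the duplicates differ only in the order of ID2 and ID3, one possible way to catch them without relying on Id is to compare an order-insensitive version of that pair. This is a minimal sketch, assuming two rows count as duplicates when Dist, Id and Values match and ID2/ID3 hold the same two numbers in either order; the names key and result are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Dist": [1.309511252, 0.239604736, 0.239604736, 0.230578014,
             0.230578014, 0.199825732, 0.199825732],
    "Id": [1, 2, 2, 3, 3, 4, 4],
    "ID2": [4950005568, 13077506433, 13062506433, 4990001482,
            4880017235, 5065006006, 4950005965],
    "ID3": [4865005556, 13062506433, 13077506433, 4880017235,
            4990001482, 4950005965, 5065006006],
    "Values": [3, 4, 4, 4, 4, 5, 5],
})

# Order-insensitive key (assumption: the other columns must also match).
# A row and its swapped twin get the same (lo, hi) pair.
key = df[["Dist", "Id", "Values"]].assign(
    lo=df[["ID2", "ID3"]].min(axis=1),
    hi=df[["ID2", "ID3"]].max(axis=1),
)

# keep="last" flags the first row of each swapped pair, so the second
# row survives; rows without a twin are never flagged.
result = df[~key.duplicated(keep="last")]
print(result)
```

With keep="last" the surviving rows have the same ID2/ID3 orientation as the desired output above; keep="first" would keep the other row of each pair instead.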

Mayank Porwal · Accepted Answer · 2020-09-28 03:07:09Z

1

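You can simply group by Id and pick the last row of each group using tail(1). This works here because both rows of each pair share the same Id, and the desired output keeps the second row of each pair.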

In [831]: df = df.groupby('Id').tail(1).reset_index(drop=True)

In [832]: df
Out[832]: 
       Dist  Id          ID2          ID3  Values
0  1.309511   1   4950005568   4865005556       3
1  0.239605   2  13062506433  13077506433       4
2  0.230578   3   4880017235   4990001482       4
3  0.199826   4   4950005965   5065006006       5
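A self-contained version of the groupby approach, rebuilding the sample frame from the question. It assumes, as the data shows, that both rows of a pair share the same Id and that the row you want is the last one in each group. Passing drop=True to reset_index matters: plain reset_index() would add the old index as an extra "index" column, which the output above does not have.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Dist": [1.309511252, 0.239604736, 0.239604736, 0.230578014,
             0.230578014, 0.199825732, 0.199825732],
    "Id": [1, 2, 2, 3, 3, 4, 4],
    "ID2": [4950005568, 13077506433, 13062506433, 4990001482,
            4880017235, 5065006006, 4950005965],
    "ID3": [4865005556, 13062506433, 13077506433, 4880017235,
            4990001482, 4950005965, 5065006006],
    "Values": [3, 4, 4, 4, 4, 5, 5],
})

# tail(1) keeps the last row of each Id group, in original row order.
# drop=True discards the old index instead of adding it as a column.
result = df.groupby("Id").tail(1).reset_index(drop=True)
print(result)
```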


RichieV · Accepted Answer · 2020-09-27 20:50:37Z

1

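You can use df.drop_duplicates with the subset parameter whenever you want only some of the columns to be considered when flagging duplicates. Note that drop_duplicates compares rows column by column, so it does not detect the swapped ID2/ID3 values themselves; it works here because Dist and Id are identical within each pair.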

df.drop_duplicates(subset=['Dist', 'Id'], inplace=True)

Output

       Dist  Id          ID2          ID3  Values
0  1.309511   1   4950005568   4865005556       3
1  0.239605   2  13077506433  13062506433       4
3  0.230578   3   4990001482   4880017235       4
5  0.199826   4   5065006006   4950005965       5
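drop_duplicates keeps the first occurrence by default (keep="first"), which is why the output above shows the ID2/ID3 order of the first row in each pair rather than the order in the question's desired output. Passing keep="last" keeps the second row instead. A minimal sketch on the question's sample data, returning a new frame rather than modifying df in place:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Dist": [1.309511252, 0.239604736, 0.239604736, 0.230578014,
             0.230578014, 0.199825732, 0.199825732],
    "Id": [1, 2, 2, 3, 3, 4, 4],
    "ID2": [4950005568, 13077506433, 13062506433, 4990001482,
            4880017235, 5065006006, 4950005965],
    "ID3": [4865005556, 13062506433, 13077506433, 4880017235,
            4990001482, 4950005965, 5065006006],
    "Values": [3, 4, 4, 4, 4, 5, 5],
})

# keep="last" keeps the final occurrence of each (Dist, Id) combination,
# which matches the ID2/ID3 order in the desired output.
result = df.drop_duplicates(subset=["Dist", "Id"], keep="last")
print(result)
```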
