Python - remove row duplications by part of the row [duplicate]

Question

Say I have the following array:

import numpy as np

data = np.array([[51001, 121, 1, 121212],
                 [51001, 121, 1, 125451],
                 [51001, 125, 1, 127653]])

I want to remove rows that are duplicates when compared on the first 3 columns only (the first 3 elements of each row).

So the result I want is:

print(data)
[[51001, 121, 1, 121212],
 [51001, 125, 1, 127653]]

It doesn't matter which of the duplicate rows is kept and which is deleted, as long as the remaining rows are unique in the first 3 columns.

I can slice, but I don't know how to keep the 4th column, and I didn't see any answer showing how to do it.
– Eran Moshe, Commented Dec 22, 2016 at 8:08
Adapting this answer post, edit two lines: sorted_idx = np.lexsort(data[:,:3].T) and row_mask = np.append([True],np.any(np.diff(sorted_data[:,:3],axis=0),1)).
– Divakar, Commented Dec 22, 2016 at 8:16
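
The two lines in the comment above can be put together into a runnable sketch. The linked answer post is not reproduced here, so the surrounding steps (building sorted_data from sorted_idx and applying row_mask at the end) are assumptions about how those lines are meant to be used, not a copy of that post.

```python
import numpy as np

data = np.array([[51001, 121, 1, 121212],
                 [51001, 121, 1, 125451],
                 [51001, 125, 1, 127653]])

# Sort the rows by their first three columns so that duplicates become adjacent.
sorted_idx = np.lexsort(data[:, :3].T)

# Assumption: sorted_data is simply the input reordered by sorted_idx.
sorted_data = data[sorted_idx]

# A row is kept if any of its first three values differs from the row before it.
# The first row has no predecessor, so it is always kept.
row_mask = np.append([True], np.any(np.diff(sorted_data[:, :3], axis=0), 1))

# Assumption: the mask is applied to the sorted rows to get the final result.
result = sorted_data[row_mask]
print(result)
```

Note that the result comes back ordered by the sort keys rather than in the original row order, which is fine here since the question does not care which duplicate survives.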

Zero · Accepted Answer · 2016-12-22 08:15:14Z


Here's one way, using drop_duplicates in pandas with the first three columns as the subset to compare on:

In [179]: pd.DataFrame(data).drop_duplicates([0, 1, 2]).values
Out[179]:
array([[ 51001,    121,      1, 121212],
       [ 51001,    125,      1, 127653]])
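
If pulling in pandas is not an option, a NumPy-only alternative might look like the sketch below. This is not part of the answer above; it swaps in np.unique with axis=0 and return_index=True, and assumes a NumPy version that supports the axis argument (1.13 or later). The names first_idx and result are illustrative.

```python
import numpy as np

data = np.array([[51001, 121, 1, 121212],
                 [51001, 121, 1, 125451],
                 [51001, 125, 1, 127653]])

# Find the unique rows of the first three columns; return_index gives the
# position of the first occurrence of each unique row in the original array.
_, first_idx = np.unique(data[:, :3], axis=0, return_index=True)

# Sort the indices so the kept rows stay in their original order,
# then use them to select full rows (including the 4th column).
result = data[np.sort(first_idx)]
print(result)
```

Like drop_duplicates with its default settings, this keeps the first row of each duplicate group.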

