Removing duplicates with a condition in data frame

Question

I have a data frame with text in one column and its label in another. Some texts appear more than once, each time with a different label. I want to remove these duplicates and keep only the record carrying a specified label.

Sample dataframe:

                 text label
0          great view     a
1          great view     b
2        good balcony     a
3        nice service     a
4        nice service     b
5        nice service     c
6           bad rooms     f
7     nice restaurant     a
8     nice restaurant     d
9   nice beach nearby     x
10        good casino     z

Now suppose I want to keep the row with label a wherever it is present for a text, and remove only the other duplicates. Sample output:

                text label
0         great view     a
1       good balcony     a
2       nice service     a
3          bad rooms     f
4    nice restaurant     a
5  nice beach nearby     x
6        good casino     z

Thanks in advance!

BENY · Accepted Answer · 2019-07-12 03:00:44Z

You can simply try sort_values before drop_duplicates: the rows are first ordered alphabetically by label (so 'a' sorts before 'b'), and drop_duplicates then keeps the first row for each text.

df=df.sort_values('label').drop_duplicates('text')

Or

df=df.sort_values('label').groupby('text').head(1)
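To make the sort-then-drop idea concrete, here is a minimal sketch that rebuilds the sample data frame from the question and applies it. The trailing sort_index and reset_index calls are an addition not in the answer above; they are assumed to be wanted because the sample output keeps the original row order with a fresh index. The variable name `result` is only illustrative.

```python
import pandas as pd

# Sample data frame from the question
df = pd.DataFrame({
    "text": ["great view", "great view", "good balcony",
             "nice service", "nice service", "nice service",
             "bad rooms", "nice restaurant", "nice restaurant",
             "nice beach nearby", "good casino"],
    "label": ["a", "b", "a", "a", "b", "c", "f", "a", "d", "x", "z"],
})

result = (
    df.sort_values("label")      # alphabetical order puts label 'a' first
      .drop_duplicates("text")   # keep the first row seen for each text
      .sort_index()              # assumption: restore the original row order
      .reset_index(drop=True)    # assumption: renumber rows from 0
)
print(result)
```

Without the last two calls the same rows are kept, but they come out ordered by label rather than by their original position.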

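Update

To prefer an arbitrary label rather than the alphabetically first one, move the rows carrying that label to the front before dropping duplicates:

Valuetokeep = 'a'

df = df.iloc[(df.label != Valuetokeep).argsort()].drop_duplicates('text')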
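The same idea can be wrapped in a small helper that works for any preferred label, for example 'b'. This is a sketch under stated assumptions, not part of the original answer: the function name `drop_duplicates_preferring` and its parameters are hypothetical, the stable sort is requested explicitly (an assumption made so that ties keep their original order), and the final sort_index and reset_index calls are added to restore the original row order.

```python
import pandas as pd

def drop_duplicates_preferring(df, value_to_keep, text_col="text", label_col="label"):
    """Hypothetical helper: for each duplicated text keep the row whose label
    equals value_to_keep if there is one, otherwise the first row seen."""
    # False (preferred label) sorts before True (any other label)
    is_other = df[label_col].ne(value_to_keep).to_numpy()
    # A stable sort keeps the original order among rows that compare equal
    order = is_other.argsort(kind="stable")
    return (
        df.iloc[order]
          .drop_duplicates(text_col)
          .sort_index()              # assumption: restore the original row order
          .reset_index(drop=True)
    )

# Sample data frame from the question
df = pd.DataFrame({
    "text": ["great view", "great view", "good balcony",
             "nice service", "nice service", "nice service",
             "bad rooms", "nice restaurant", "nice restaurant",
             "nice beach nearby", "good casino"],
    "label": ["a", "b", "a", "a", "b", "c", "f", "a", "d", "x", "z"],
})

kept_a = drop_duplicates_preferring(df, "a")
kept_b = drop_duplicates_preferring(df, "b")
print(kept_b)
```

With 'b' as the preferred label, "great view" and "nice service" keep their b rows, while texts that never carry label b fall back to their first row.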

edited Jul 12, 2019 at 3:00

answered Jul 12, 2019 at 2:13


1 Comment

sk1426 Over a year ago

Thanks for the response. That will help only when I want to keep label 'a' as my selected label. But if I select label 'b' and want to drop duplicates associated with label 'a' then it doesn't work.
