I have a data frame with text in one column and its label in another. Some texts appear more than once, each time with a different label. I want to remove these duplicates and keep only the record that carries a specified label.

Sample dataframe:

                 text label
0          great view     a
1          great view     b
2        good balcony     a
3        nice service     a
4        nice service     b
5        nice service     c
6           bad rooms     f
7     nice restaurant     a
8     nice restaurant     d
9   nice beach nearby     x
10        good casino     z

Now I want to keep the row with label a wherever that label is present for a text, and remove only the other duplicates. Sample output:

                text label
0         great view     a
1       good balcony     a
2       nice service     a
3          bad rooms     f
4    nice restaurant     a
5  nice beach nearby     x
6        good casino     z

Thanks in advance!

1 Answer

You can simply call sort_values before drop_duplicates: the rows are first ordered alphabetically by label ('a' < 'b' is True), so for each text the row labelled a comes first and is the one that drop_duplicates keeps.

df = df.sort_values('label').drop_duplicates('text')

Or

df = df.sort_values('label').groupby('text').head(1)
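
As a rough sketch, here is the sorting approach applied to the sample data from the question. The frame construction, the stable sort kind, and the final sort_index/reset_index calls are my additions for illustration (they restore the original row order and renumber the index as in the sample output); they are not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "text": ["great view", "great view", "good balcony", "nice service",
             "nice service", "nice service", "bad rooms", "nice restaurant",
             "nice restaurant", "nice beach nearby", "good casino"],
    "label": ["a", "b", "a", "a", "b", "c", "f", "a", "d", "x", "z"],
})

# Sort by label so the alphabetically smallest label of each text comes first,
# keep the first row per text, then restore the original row order.
result = (
    df.sort_values("label", kind="mergesort")  # stable sort (assumption: ties keep input order)
      .drop_duplicates("text")
      .sort_index()
      .reset_index(drop=True)
)
print(result)
```

Note that this only favours a because a happens to sort first alphabetically; it does not generalise to an arbitrary label.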

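Update

To keep an arbitrary label rather than the alphabetically first one, move the rows carrying that label to the front and then drop the duplicates:

Valuetokeep = 'a'

# False sorts before True, so rows whose label equals Valuetokeep come first;
# a stable sort (mergesort) keeps the original order among the other rows.
df = df.iloc[(df.label != Valuetokeep).argsort(kind='mergesort')].drop_duplicates('text')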
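
The same idea can be wrapped in a small helper and tried with different labels. This is only a sketch: the function name drop_duplicates_keeping and its parameters are hypothetical, and the choice of NumPy's stable argsort plus the final sort_index/reset_index are assumptions made to keep the output order predictable.

```python
import numpy as np
import pandas as pd

def drop_duplicates_keeping(df, value, text_col="text", label_col="label"):
    """Keep one row per text, preferring the row whose label equals `value`."""
    # Boolean key: False (label matches) sorts before True (label differs).
    not_preferred = (df[label_col] != value).to_numpy()
    # Stable sort so rows with equal keys keep their original relative order.
    order = np.argsort(not_preferred, kind="stable")
    deduped = df.iloc[order].drop_duplicates(text_col)
    # Restore the original row order and renumber the index.
    return deduped.sort_index().reset_index(drop=True)

# Sample data copied from the question.
df = pd.DataFrame({
    "text": ["great view", "great view", "good balcony", "nice service",
             "nice service", "nice service", "bad rooms", "nice restaurant",
             "nice restaurant", "nice beach nearby", "good casino"],
    "label": ["a", "b", "a", "a", "b", "c", "f", "a", "d", "x", "z"],
})

keep_a = drop_duplicates_keeping(df, "a")
keep_b = drop_duplicates_keeping(df, "b")
print(keep_a)
print(keep_b)
```

When a text has no row with the requested label (for example nice restaurant when keeping b), the first of its rows in the original order is kept.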

1 Comment

Thanks for the response. That will help only when I want to keep label 'a' as my selected label. But if I select label 'b' and want to drop duplicates associated with label 'a' then it doesn't work.
