
I have a data-frame which looks like:

A       B       C       D       E
a       aa      1       2       3
b       aa      4       5       6
c       cc      7       8       9
d       cc      11      10      3
e       dd      71      81      91

Rows (1, 2) and rows (3, 4) have duplicate values in column B. I want to keep only one row from each pair.

The final output should be:

A       B       C       D       E
a       aa      1       2       3
c       cc      7       8       9
e       dd      71      81      91

How can I use pandas to accomplish this?

3 Answers

df = df.drop_duplicates(subset='B', keep='first')

keep: controls which of the duplicated rows are kept.

  1. It accepts three values: 'first' (the default), 'last', or False.

  2. If 'first', the first occurrence of each value is kept and the later occurrences are treated as duplicates.

  3. If 'last', the last occurrence of each value is kept and the earlier occurrences are treated as duplicates.

  4. If False, all occurrences of a repeated value are treated as duplicates, so none of them are kept.
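For illustration, here is a minimal sketch that rebuilds the question's frame (the DataFrame constructor call is a reconstruction of the table above, not something the asker posted) and compares the three keep options on column B:

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    "A": ["a", "b", "c", "d", "e"],
    "B": ["aa", "aa", "cc", "cc", "dd"],
    "C": [1, 4, 7, 11, 71],
    "D": [2, 5, 8, 10, 81],
    "E": [3, 6, 9, 3, 91],
})

# keep='first' (the default): the first row of each B group survives
first = df.drop_duplicates(subset="B", keep="first")
print(first["A"].tolist())  # ['a', 'c', 'e']

# keep='last': the last row of each B group survives
last = df.drop_duplicates(subset="B", keep="last")
print(last["A"].tolist())  # ['b', 'd', 'e']

# keep=False: every row whose B value is repeated is dropped
none_kept = df.drop_duplicates(subset="B", keep=False)
print(none_kept["A"].tolist())  # ['e']
```

Only keep='first' reproduces the output the question asks for; keep='last' keeps rows b and d instead, and keep=False leaves only the row whose B value was unique.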

Try drop_duplicates

df = df.drop_duplicates('B')
   A   B   C   D   E
0  a  aa   1   2   3
2  c  cc   7   8   9
4  e  dd  71  81  91
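The result is the same as filtering out the rows that DataFrame.duplicated flags. The sketch below uses the question's data only to show which rows count as duplicates; it is not a claim about how pandas implements drop_duplicates internally:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "b", "c", "d", "e"],
    "B": ["aa", "aa", "cc", "cc", "dd"],
    "C": [1, 4, 7, 11, 71],
    "D": [2, 5, 8, 10, 81],
    "E": [3, 6, 9, 3, 91],
})

# duplicated() marks every row whose B value already appeared in an earlier row
mask = df.duplicated(subset="B", keep="first")
print(mask.tolist())  # [False, True, False, True, False]

# Keeping only the unmarked rows gives the same frame as drop_duplicates('B')
via_mask = df[~mask]
assert via_mask.equals(df.drop_duplicates("B"))
print(via_mask.index.tolist())  # [0, 2, 4]
```

Note that the original index labels (0, 2, 4) are preserved in both cases.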

More generally, you may need to detect duplicates across multiple columns. In that case, pass a list of column names:

df = df.drop_duplicates(subset=['A', 'C'], keep='first')

The subset argument lists the columns used to identify duplicates, and the keep argument controls which occurrence is kept:

  • 'first' : Drop duplicates except for the first occurrence.

  • 'last' : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.
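To see how a multi-column subset differs from a single column, here is a sketch on a small hypothetical frame (the values are invented for illustration, not taken from the question). A row is dropped only when the combination of A and C repeats, even though A and C each contain repeated values on their own:

```python
import pandas as pd

# Hypothetical data: only rows 0 and 2 share the same (A, C) pair
df = pd.DataFrame({
    "A": ["x", "x", "x", "y"],
    "C": [1, 2, 1, 1],
    "D": [10, 20, 30, 40],
})

# A row is a duplicate only if both A and C match an earlier row
result = df.drop_duplicates(subset=["A", "C"], keep="first")
print(result["D"].tolist())  # [10, 20, 40]

# keep=False removes every row belonging to the repeated (A, C) pair
all_dropped = df.drop_duplicates(subset=["A", "C"], keep=False)
print(all_dropped["D"].tolist())  # [20, 40]
```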
