I have a pandas DataFrame in Python containing several rows of integer values, for example:

   A  B  C  D  E  F  G  H
0  1  2  1  0  1  2  1  2  
1  0  1  1  1  1  2  1  2
2  1  2  1  2  1  2  1  3
3  0  1  1  1  1  2  1  2 
4  2  2  1  2  1  2  1  3

I would like to return only the rows whose values match another row in every column. The result should be:

   A  B  C  D  E  F  G  H  
1  0  1  1  1  1  2  1  2
3  0  1  1  1  1  2  1  2 

Thanks in advance

2 Answers

You can index with the boolean mask returned by duplicated, passing keep=False:

In [3]:
df[df.duplicated(keep=False)]

Out[3]:
   A  B  C  D  E  F  G  H
1  0  1  1  1  1  2  1  2
3  0  1  1  1  1  2  1  2

Here is the mask showing which rows are duplicates. Passing keep=False marks every duplicated row as True; with the default keep='first', the first occurrence is marked False and only the later repeats are marked True:

In [4]:
df.duplicated(keep=False)

Out[4]:
0    False
1     True
2    False
3     True
4    False
dtype: bool
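
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example frame from the question and applies the same mask. The variable names mask and dupes are only illustrative and do not come from the original post.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    [
        [1, 2, 1, 0, 1, 2, 1, 2],
        [0, 1, 1, 1, 1, 2, 1, 2],
        [1, 2, 1, 2, 1, 2, 1, 3],
        [0, 1, 1, 1, 1, 2, 1, 2],
        [2, 2, 1, 2, 1, 2, 1, 3],
    ],
    columns=list("ABCDEFGH"),
)

# keep=False flags every row that has an identical twin elsewhere in the frame.
mask = df.duplicated(keep=False)  # illustrative name

# Boolean indexing keeps only the flagged rows (index 1 and 3 here).
dupes = df[mask]  # illustrative name
print(dupes)
```

Note that rows are compared across all columns, so rows 2 and 4 are not selected: they differ in column A.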

You need duplicated with the parameter keep=False to flag all duplicates, then filter with boolean indexing:

print(df.duplicated(keep=False))
0    False
1     True
2    False
3     True
4    False
dtype: bool

df = df[df.duplicated(keep=False)]
print(df)
   A  B  C  D  E  F  G  H
1  0  1  1  1  1  2  1  2
3  0  1  1  1  1  2  1  2

If you also need to exclude the first or last occurrence of each duplicate and return only the remaining repeats, use:

df1 = df[df.duplicated()]
# keep='first' is the default, so it can be omitted
# df1 = df[df.duplicated(keep='first')]
print(df1)
   A  B  C  D  E  F  G  H
3  0  1  1  1  1  2  1  2

df2 = df[df.duplicated(keep='last')]
print(df2)
   A  B  C  D  E  F  G  H
1  0  1  1  1  1  2  1  2
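
To compare the three keep settings side by side, something like the following sketch may help. It rebuilds the question's frame so it runs on its own; masks and selected are illustrative names, not part of the answers above.

```python
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame(
    [
        [1, 2, 1, 0, 1, 2, 1, 2],
        [0, 1, 1, 1, 1, 2, 1, 2],
        [1, 2, 1, 2, 1, 2, 1, 3],
        [0, 1, 1, 1, 1, 2, 1, 2],
        [2, 2, 1, 2, 1, 2, 1, 3],
    ],
    columns=list("ABCDEFGH"),
)

# One boolean mask per keep setting.
masks = {keep: df.duplicated(keep=keep) for keep in ("first", "last", False)}

# Index labels of the rows each mask would select.
selected = {keep: df.index[m].tolist() for keep, m in masks.items()}
print(selected)
# {'first': [3], 'last': [1], False: [1, 3]}
```

In short, 'first' and 'last' each leave one occurrence unflagged, while False flags every member of a duplicate group.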
