I often need to perform the following operation, and I suspect pandas has an immediate, efficient solution for it:
I have the following example pandas DataFrame with two columns, Name and Age:
import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Barbara', 25], ['Bob', 72], ['Clarke', 13], ['Clarke', 13], ['Destiny', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'], dtype=float)
print(df)
Name Age
0 Alex 10.0
1 Bob 12.0
2 Barbara 25.0
3 Bob 72.0
4 Clarke 13.0
5 Clarke 13.0
6 Destiny 45.0
I would like to remove all rows that do not have a matching value in Name, i.e. keep only the rows whose Name appears more than once. In the example df, there are two Bob rows and two Clarke rows, so the intended output would be:
Name Age
0 Bob 12.0
1 Bob 72.0
2 Clarke 13.0
3 Clarke 13.0
where I'm assuming the index has been reset.
One option would be to keep all unique values of Name in a list and then iterate through the DataFrame to check each row for duplicates, but that would be very inefficient.
Is there a built-in function for this task?
Yes: DataFrame.duplicated. Calling it with subset='Name' and keep=False returns a boolean mask that is True for every row whose Name occurs more than once, and you can use that mask to filter the DataFrame.
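A minimal sketch of that approach on your example data (keep=False is what makes duplicated flag all occurrences of a duplicate, not just the second and later ones):

import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Barbara', 25], ['Bob', 72], ['Clarke', 13], ['Clarke', 13], ['Destiny', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'], dtype=float)

# keep=False marks every occurrence of a duplicated Name as True,
# not only the second and later occurrences
mask = df.duplicated(subset='Name', keep=False)
result = df[mask].reset_index(drop=True)
print(result)

which prints:

     Name   Age
0     Bob  12.0
1     Bob  72.0
2  Clarke  13.0
3  Clarke  13.0

An equivalent groupby-based alternative is df[df.groupby('Name')['Name'].transform('size') > 1], but the duplicated mask is the most direct, fully vectorized route.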