I have a pandas DataFrame with four feature columns and one label column. The dataset has a problem: some rows have identical feature values but different labels. I know how to find rows that are duplicated across multiple columns using
df[df.duplicated(keep=False)]
How do I find rows whose features are duplicated but whose labels conflict?
For example, given a DataFrame like this:
a b c label
0 1 1 2 y
1 1 1 2 x
2 1 1 2 x
3 2 2 2 z
4 2 2 2 z
I want to output something like the following:
a b c label
1 1 2 y
1 1 2 x
subset param to duplicated (and drop_duplicates)
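One possible way to apply the subset parameter here is sketched below. It is only an illustration, assuming the feature columns are named a, b and c as in the example (the names feature_cols, unique_rows and conflicts are hypothetical). The idea is to first collapse rows that agree on both features and label, then flag any feature combination that still appears more than once, since those remaining repeats must carry different labels.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2],
    "b": [1, 1, 1, 2, 2],
    "c": [2, 2, 2, 2, 2],
    "label": ["y", "x", "x", "z", "z"],
})

# Assumed feature column names; adjust to match the real dataset
feature_cols = ["a", "b", "c"]

# Step 1: keep one row per distinct (features, label) combination
unique_rows = df.drop_duplicates(subset=feature_cols + ["label"])

# Step 2: feature combinations that still repeat must have conflicting labels
conflicts = unique_rows[unique_rows.duplicated(subset=feature_cols, keep=False)]

print(conflicts)
```

On the example data this keeps the rows labelled y and x for the features (1, 1, 2) and drops the consistently labelled (2, 2, 2) rows. To keep every conflicting row, including repeats, df.groupby(feature_cols)["label"].transform("nunique") > 1 could be used as a mask instead.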