I have a pandas dataframe in Python where each row is identified by the columns p1 and p2, but p2 is sometimes NaN:
   p1   p2
0   a    1
1   a    2
2   a    3
3   b  NaN
4   c    4
5   d  NaN
6   d    5
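To make the example reproducible, here is a minimal sketch that rebuilds the frame above. It assumes the missing p2 values are stored as `np.nan`, which the printout suggests but does not confirm.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; missing p2 values are assumed to be np.nan
df = pd.DataFrame({
    "p1": ["a", "a", "a", "b", "c", "d", "d"],
    "p2": [1, 2, 3, np.nan, 4, np.nan, 5],
})
print(df)
```

Because p2 contains NaN, pandas stores it as a float column, so the printed values appear as 1.0, 2.0, and so on.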
I produced the dataframe above from a larger one containing many duplicates by using
df.drop_duplicates(subset=["p1","p2"], keep='last')
This works for the most part. The only issue is that ("d", NaN) and ("d", 5) are not duplicates on ["p1","p2"], so neither row is dropped.
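The behaviour described above can be sketched on a small hypothetical input (the `raw` frame and its values are made up for illustration). `drop_duplicates` treats two NaNs in the same column as equal, so repeated ("d", NaN) rows collapse to one, but a NaN row and a row with a real p2 value are still considered different.

```python
import numpy as np
import pandas as pd

# Hypothetical "larger" frame with exact duplicates (illustrative data only)
raw = pd.DataFrame({
    "p1": ["a", "a", "d", "d", "d"],
    "p2": [1, 1, np.nan, np.nan, 5],
})

# Exact duplicates on (p1, p2) are removed, keeping the last occurrence;
# ("d", NaN) survives because it differs from ("d", 5)
deduped = raw.drop_duplicates(subset=["p1", "p2"], keep="last")
print(deduped)
```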
How can I drop rows such as ("d", NaN) when another row has the same p1 and a non-null p2, e.g. ("d", 5)? Importantly, ("b", NaN) must be kept, because there is no row with p1 == "b" and a non-null p2.
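One possible approach, sketched here on the example data and assuming missing values are `np.nan`: keep every row whose p2 is non-null, and keep a NaN row only when its p1 never appears alongside a non-null p2. The name `valued_p1` is just an illustrative helper variable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "p1": ["a", "a", "a", "b", "c", "d", "d"],
    "p2": [1, 2, 3, np.nan, 4, np.nan, 5],
})

# p1 values that have at least one row with a non-null p2
valued_p1 = df.loc[df["p2"].notna(), "p1"]

# Keep rows with a real p2, plus NaN rows whose p1 has no non-null counterpart
result = df[df["p2"].notna() | ~df["p1"].isin(valued_p1)]
print(result)
```

This drops ("d", NaN) but keeps ("b", NaN). An equivalent mask can be built with `df["p2"].notna().groupby(df["p1"]).transform("any")`, which marks each row whose p1 group contains any non-null p2; `isin` is used above because it always returns a plain boolean Series.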