Say I have a dataframe of the following form, where rn is the row index:
    A1 | A2 | A3
-----------------
r1  x  | 0  | t
r2  y  | 1  | u
r3  z  | 1  | v
r4  x  | 2  | w
r5  z  | 2  | v
r6  x  | 2  | w
If I wanted to subset this dataframe so that column A2 has only unique values, I'd use df.drop_duplicates('A2'). However, that keeps only the first row for each unique value and drops the rest. For this example, only r1, r2 and r4 would be in the subset.
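To make the default behaviour concrete, here is a minimal reproduction of the example. The variable names (df, first_only) are just for illustration, and the frame is rebuilt by hand from the table above.

```python
import pandas as pd

# Rebuild the example frame; r1..r6 are the row index labels.
df = pd.DataFrame(
    {
        "A1": ["x", "y", "z", "x", "z", "x"],
        "A2": [0, 1, 1, 2, 2, 2],
        "A3": ["t", "u", "v", "w", "v", "w"],
    },
    index=["r1", "r2", "r3", "r4", "r5", "r6"],
)

# drop_duplicates defaults to keep='first', so the first row seen
# for each distinct A2 value is the one that survives.
first_only = df.drop_duplicates("A2")
print(first_only.index.tolist())  # ['r1', 'r2', 'r4']
```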
What I want is for one of the rows sharing a duplicate value to be selected at random, rather than always the first one. So for this example, for A2 == 1, either r2 or r3 is selected at random, and for A2 == 2, any one of r4, r5 or r6 is selected at random. How would I go about implementing this?
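One possible direction, as a rough sketch rather than a definitive answer: shuffle the rows first, so that the "first" row drop_duplicates sees for each A2 value is effectively a random one. This assumes the original row order does not need to be preserved before deduplication. The name random_pick is only illustrative.

```python
import pandas as pd

# Same example frame as in the question; r1..r6 are the row index labels.
df = pd.DataFrame(
    {
        "A1": ["x", "y", "z", "x", "z", "x"],
        "A2": [0, 1, 1, 2, 2, 2],
        "A3": ["t", "u", "v", "w", "v", "w"],
    },
    index=["r1", "r2", "r3", "r4", "r5", "r6"],
)

# sample(frac=1) returns all rows in random order; drop_duplicates then
# keeps whichever row happens to come first for each A2 value.
# sort_index() only restores the r1..r6 ordering of the surviving rows.
random_pick = df.sample(frac=1).drop_duplicates("A2").sort_index()
print(random_pick)
```

Passing random_state to sample should make the pick reproducible. On pandas 1.1 or later, df.groupby("A2").sample(n=1) looks like a more direct way to draw one row per group.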