I have the following dataframe. I want to group by a and b first. Within each group, I need to count the values of c and keep only the most frequent one. If several c values tie for the highest count in a group, just pick any one of them.
a b c
1 1 x
1 1 y
1 1 y
1 2 y
1 2 y
1 2 z
2 1 z
2 1 z
2 1 a
2 1 a
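To experiment with the approaches below, the sample frame can be rebuilt in pandas. This is a minimal sketch that assumes the data lives in a DataFrame named `df` with columns `a`, `b` and `c`.

```python
import pandas as pd

# Rebuild the example frame from the question (columns a, b, c)
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})
print(df)
```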
The expected result would be:
a b c
1 1 y
1 2 y
2 1 z
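One way to get this result, sketched under the assumption that the data is in a DataFrame `df` as above: group by `a` and `b`, run `value_counts` on `c` inside each group, and take `idxmax`, which returns the label with the highest count. When counts tie (as with `z` and `a` in the `a = 2, b = 1` group), `idxmax` returns the first maximal label in the order `value_counts` produced, so which tied value wins should not be relied on.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

# For each (a, b) group, count the c values and keep the most frequent label.
# idxmax returns the first label holding the maximum count; for ties this
# depends on how value_counts happens to order the tied values.
result = (
    df.groupby(["a", "b"])["c"]
      .agg(lambda s: s.value_counts().idxmax())
      .reset_index()
)
print(result)
```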
What is the right way to do it? It would be even better if I could print out each group's c value counts, sorted, as an intermediate step.
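For the intermediate step, one option is to iterate over the groups explicitly. `value_counts` sorts by count in descending order by default, so printing it shows each group's sorted counts, and its first index entry is a most frequent value. The dict name `top` is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

top = {}  # maps (a, b) -> most frequent c value in that group
for (a, b), group in df.groupby(["a", "b"]):
    counts = group["c"].value_counts()  # sorted by count, highest first
    print(f"a={a}, b={b}")
    print(counts.to_string())
    print()
    top[(int(a), int(b))] = counts.index[0]

print(top)
```

This trades the one-liner for a loop, which is slower on many groups but makes each group's counts easy to inspect.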
a = 2 has two entries each of z and a for the same b = 1?
… a and b, but for a, b and c too. Part of the reason for doing this is to remove most of the duplicates.
The a = 2, b = 1 group has both z and a appearing twice, so shouldn't just one be taken in the output?
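If the tie in the `a = 2, b = 1` group should resolve the same way on every run, `Series.mode` is an alternative to `value_counts`: it returns all values sharing the highest count, in sorted order, so `.iloc[0]` always picks the smallest label. Each group still yields exactly one row. A sketch, again assuming the data is in `df`:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

# mode() returns every value tied for the highest count, sorted, so taking
# the first element gives a reproducible pick (the smallest tied label)
result = (
    df.groupby(["a", "b"])["c"]
      .agg(lambda s: s.mode().iloc[0])
      .reset_index()
)
print(result)
```

With this rule the tied group gets `a` rather than the `z` shown in the expected output, which still satisfies "just pick any one".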