I have a sample table like this:
Dataframe: df
Col1 Col2 Col3 Col4
A 1 10 i
A 1 11 k
A 1 12 a
A 2 10 w
A 2 11 e
B 1 15 s
B 1 16 d
B 2 21 w
B 2 25 e
B 2 36 q
C 1 23 a
C 1 24 b
I'm trying to get all records/rows of the groups (Col1, Col2) that has the smaller number of records AND skipping over those groups that have only 1 record (in this example Col1 = 'C'). So, the output would be as follows:
A 2 10 w
A 2 11 e
B 1 15 s
B 1 16 d
since group (A,2) has 2 records compared to group (A,1) which has 3 records.
I tried to approach this issue from different angles but just can't seem to get the result that I need. I am able to find the groups that I need using a combination of groupby, filter and agg but how do I now use this as a select filter on df? After spending a lot of time on this, I wasn't even sure that the approach was correct as it looked overly complicated. I am sure that there is an elegant solution but I just can't see it. Any advise on how to approach this would be greatly appreciated.
I had this to get the groups for which I wanted the rows displayed:
groups = df.groupby(["Col1, Col2"])["Col2"].agg({'no':'count'})
filteredGroups = groups.groupby(level=0).filter(lambda group: group.size > 1)
print filteredGroups.groupby(level=0).agg('idxmin')
The second line was to account for groups that may have only one record as those I don't want to consider. Honestly, I tried so many variations and approaches that eventually did not give me the result that I wanted. I see that all answers are not one-liners so that at least I don't feel like I was over thinking the problem.