I have a DataFrame (let's call it "csv") that I want to group by two columns and, for each group, get the value of a third column from the group's first row. Example:
A B C D
foo bar happy yellow
foo bar sad green
foo ape last laugh
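To make the example reproducible, the frame can be built as below. This is a minimal sketch: the real "csv" is loaded from a file, so the literal values here are just the sample rows above.

```python
import pandas as pd

# Sample rows from the table above; in practice "csv" comes from a file.
csv = pd.DataFrame({
    'A': ['foo', 'foo', 'foo'],
    'B': ['bar', 'bar', 'ape'],
    'C': ['happy', 'sad', 'last'],
    'D': ['yellow', 'green', 'laugh'],
})
```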
I would like this as output:
A B C
foo bar happy
foo ape last
I currently do this:
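import pandas as pd
grp1 = csv.groupby(['A','B'])
# grp1.groups maps each (A, B) key to the row labels belonging to that group
lst = [(A,B,csv.loc[group[0],'C']) for (A,B),group in grp1.groups.items()]
df = pd.DataFrame(lst,columns=['A','B','C'])
df.to_csv('grp.csv',columns=['A','B','C'],index=False)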
But this seems inefficient. Do I really have to build a list first and then create a DataFrame from it? Is there a way to create the DataFrame directly, or to index into the original DataFrame so that I can work with just the first record in each group?
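For what it's worth, something along these lines looks like it might produce the result directly, without the intermediate list. This is only a sketch under assumptions: `groupby(...).first()` takes the first non-null value of each column per group, so it matches "first record" only when C has no missing values, and `sort=False` is used to keep groups in order of first appearance. The names `result` and `alt` are made up for illustration.

```python
import pandas as pd

# Sample data from the question; the real frame is loaded from a file.
csv = pd.DataFrame({
    'A': ['foo', 'foo', 'foo'],
    'B': ['bar', 'bar', 'ape'],
    'C': ['happy', 'sad', 'last'],
    'D': ['yellow', 'green', 'laugh'],
})

# Group on A and B, keep them as ordinary columns (as_index=False),
# and take the first non-null C value in each group.
result = csv.groupby(['A', 'B'], sort=False, as_index=False)['C'].first()

# Alternative: keep the first row of each (A, B) pair, then select columns.
alt = (csv.drop_duplicates(subset=['A', 'B'])[['A', 'B', 'C']]
          .reset_index(drop=True))
```

Either frame could then be written out with `to_csv('grp.csv', index=False)`, but I don't know whether one of these is the preferred way.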