How to create a dataframe from grouped data

Question

I have a DataFrame (let's call it "csv") that I want to group, taking the value of the first element in each group. Example:

A    B    C      D
foo  bar  happy  yellow
foo  bar  sad    green
foo  ape  last   laugh

I would like this as output:

A    B    C
foo  bar  happy
foo  ape  last

I currently do this:

import pandas as pd

grp1 = csv.groupby(['A','B'])
# group holds the index labels of each group; take C at the first label
lst = [(A,B,csv.loc[group[0],'C']) for (A,B),group in grp1.groups.items()]
df = pd.DataFrame(lst,columns=['A','B','C'])
df.to_csv('grp.csv',columns=['A','B','C'],index=False)

But this seems inefficient. Do I really have to create a list first and then build a DataFrame from it? Isn't there a way to create the DataFrame directly, or to index the original DataFrame so that I can work with just the first record in each group?

lbolla · Accepted Answer · 2012-05-31 08:11:17Z

You can use aggregate with a custom aggregation function that keeps only the first valid (non-null) element of a column in each group and drops the others.

    In [60]: grp = df.groupby(['A', 'B'])

    In [61]: grp.aggregate({'C': lambda c: c.loc[c.first_valid_index()]})
    Out[61]:
                 C
    A   B  
    foo ape   last
        bar  happy
