
I have a DataFrame (let's call it `csv`) that I want to group by columns A and B, keeping the value of column C from the first row of each group. Example:

    A   B   C     D
    foo bar happy yellow
    foo bar sad   green
    foo ape last  laugh

I would like this as output:

    A   B   C
    foo bar happy
    foo ape last

I currently do this:

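    import pandas as pd

    # .groups maps each (A, B) key to the row labels belonging to that group
    grp1 = csv.groupby(['A', 'B'])
    lst = [(A, B, csv.loc[group[0], 'C']) for (A, B), group in grp1.groups.items()]
    df = pd.DataFrame(lst, columns=['A', 'B', 'C'])
    df.to_csv('grp.csv', columns=['A', 'B', 'C'], index=False)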

But this seems inefficient. Do I really have to build a list first and then create a DataFrame from it? Isn't there a way to create the DataFrame directly, or to index the original DataFrame so that I can work with just the first record in each group?

Answer

You can pass `aggregate` a custom function per column. Here the function keeps only the first non-null value of column C in each group and drops the others:

    In [60]: grp = df.groupby(['A', 'B'])

    In [61]: grp.aggregate({'C': lambda c: c.loc[c.first_valid_index()]})
    Out[61]:
                 C
    A   B  
    foo ape   last
        bar  happy
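Pandas also has built-in operations that produce a DataFrame directly, without building an intermediate list. The following is a minimal sketch that rebuilds the sample data from the question as `csv`; the names `first_c` and `first_rows` are illustrative only. `GroupBy.first()` takes the first non-null value of C per group (similar to the `first_valid_index()` lambda above), whereas `drop_duplicates` keeps the literal first row of each (A, B) combination.

```python
import pandas as pd

# Sample data from the question (in practice `csv` would already be loaded)
csv = pd.DataFrame({
    'A': ['foo', 'foo', 'foo'],
    'B': ['bar', 'bar', 'ape'],
    'C': ['happy', 'sad', 'last'],
    'D': ['yellow', 'green', 'laugh'],
})

# Option 1: first non-null value of C within each (A, B) group.
# reset_index() turns the group keys back into ordinary columns,
# so the result is a DataFrame with columns A, B and C.
first_c = csv.groupby(['A', 'B'])['C'].first().reset_index()

# Option 2: keep the first row of each (A, B) combination in the
# original row order, then select only the columns of interest.
first_rows = csv.drop_duplicates(subset=['A', 'B'])[['A', 'B', 'C']]

print(first_c)
print(first_rows)
```

Either result can then be written out with, for example, `first_rows.to_csv('grp.csv', index=False)`. Note that `first_c` is sorted by group key (`ape` before `bar`), while `first_rows` keeps the original row order and index labels, which matches the output requested in the question.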