Grouping dataframes in pandas efficiently?

Question

I have the following dataframe in pandas where there's a unique index (employee) for each row and also a group label type:

import pandas

df = pandas.DataFrame({"employee": ["a", "b", "c", "d"], "type": ["X", "Y", "Y", "Y"], "value": [10, 20, 30, 40]})
df = df.set_index("employee")

I want to group the employees by type and then calculate a statistic for each type. How can I do this and get a final dataframe which is type x statistic, for example type x (mean of types)? I tried using groupby:

# .ix was removed in pandas 1.0; .loc is the label-based replacement
g = df.groupby(lambda x: df.loc[x, "type"])
result = g.mean(numeric_only=True)

This is inefficient because it looks up the row in df for every index label. Is there a better way?

Why not just use g = df.groupby("type")?

– zs2020, Commented Aug 8, 2013 at 5:34

Andy Hayden · Accepted Answer · 2013-08-08 08:56:23Z

Like @sza says, you can use:

In [11]: g = df.groupby("type")

In [12]: g.mean()
Out[12]:
      value
type
X        10
Y        30

See the groupby docs for more.
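Since the question asks for a "type x statistic" table, here is a minimal sketch of how several statistics could be computed in one pass. The particular statistics (mean, min, max, count) and the name `stats` are illustrative assumptions, not part of the original answer; selecting the "value" column first keeps the aggregation to the numeric data.

```python
import pandas as pd

# Same data as in the question, indexed by employee.
df = pd.DataFrame(
    {
        "employee": ["a", "b", "c", "d"],
        "type": ["X", "Y", "Y", "Y"],
        "value": [10, 20, 30, 40],
    }
).set_index("employee")

# Group on the "type" column directly; no per-row index lookup is needed.
# The list of statistics below is an illustrative choice.
stats = df.groupby("type")["value"].agg(["mean", "min", "max", "count"])
print(stats)
```

The result is a dataframe with one row per type and one column per statistic, so individual values can be read with something like stats.loc["Y", "mean"].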

answered Aug 8, 2013 at 8:56

community wiki

Andy Hayden
