54

I had a DataFrame, grouped it by FIPS, and summed the groups. That worked fine.

kl = ks.groupby('FIPS')

kl.aggregate(np.sum)

I just want a normal DataFrame back, but I have a pandas.core.groupby.DataFrameGroupBy object.

2
  • 14
    The question title indicates that the question is about how to generally convert a groupby object back to a data frame, yet the question and the accepted answer are only about one special case (sum aggregation). Both the question and the accepted answer would be a lot more helpful if they were about how to generally convert a groupby object to a data frame, without performing any numeric processing on it. Commented Nov 7, 2019 at 10:03
  • To get a single group as a DataFrame, use something like ks.groupby('FIPS').get_group(key), where key is one of the values in the FIPS column. Commented May 27, 2020 at 14:22
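The get_group suggestion in the comment above can be sketched as follows. The frame below is a hypothetical stand-in for the asker's ks; only the FIPS column name comes from the question, and the values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `ks` frame (values are invented).
ks = pd.DataFrame({
    "FIPS": [1001, 1001, 1003, 1003, 1003],
    "value": [10, 20, 30, 40, 50],
})

# get_group returns the rows belonging to one key as an ordinary DataFrame.
one_group = ks.groupby("FIPS").get_group(1003)

print(type(one_group).__name__)
print(one_group["value"].tolist())
```

This gives one group at a time, so it suits inspection more than rebuilding the whole frame.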

6 Answers

29
 df_g.apply(lambda x: x) 

will return the original dataframe.


7 Comments

But why is this needed?
This still returns a DataFrameGroupBy.
@cs95 This is equivalent to pd.DataFrame(grouped.groups). The GroupBy.apply function applies func to every group and combines the results into a DataFrame.
@C.K. I understand that, thank you. However, my point was more about why we need this method to return the original DataFrame if df_g itself is the original DataFrame? If it's a question of what apply does and how to apply a function to every group, that's a discussion for another post. 2c
@cs95 Yep, you're right. I upvoted your comment the first time I saw this answer, because I thought there must be an easier way, like grouped.to_df(). However, after checking the API of the GroupBy object, I found there is no such function, so I came back to tell everyone this is the easiest way to do it. lol.
25

The result of kl.aggregate(np.sum) is a normal DataFrame; you just have to assign it to a variable to use it further. With some random data:

>>> import numpy as np
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
...                         'foo', 'bar', 'foo', 'foo'],
...                  'B' : ['one', 'one', 'two', 'three',
...                         'two', 'two', 'one', 'three'],
...                  'C' : randn(8), 'D' : randn(8)})
>>> grouped = df.groupby('A')
>>> grouped
<pandas.core.groupby.DataFrameGroupBy object at 0x04E2F630>
>>> test = grouped.aggregate(np.sum)
>>> test
            C         D
A                      
bar -1.852376  2.204224
foo -3.398196 -0.045082
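As a minimal sketch of the same point, the snippet below checks the types directly. The data and the seeded generator are assumptions for illustration, and the numeric columns are selected explicitly before summing so that no string column is involved.

```python
import numpy as np
import pandas as pd

# Illustrative data; the seed only makes the random values repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": rng.standard_normal(4),
    "D": rng.standard_normal(4),
})

grouped = df.groupby("A")            # a DataFrameGroupBy, not a DataFrame
summed = grouped[["C", "D"]].sum()   # the aggregation result is a DataFrame

print(type(grouped).__name__)
print(type(summed).__name__)
print(summed.index.name)             # the group key ends up in the index
```

The group key A is carried in the index of the result rather than as a regular column.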

7 Comments

Actually, many DataFrameGroupBy methods (such as apply, transform, aggregate, head, first, and last) return a DataFrame. I used the filter method in one of my blog posts.
It's not a completely normal DataFrame. For example, if you try to call the .info() method on a GroupBy object, you get AttributeError: Cannot access callable attribute 'info' of 'DataFrameGroupBy' objects, try using the 'apply' method.
Call .reset_index() to turn the group keys in the index back into regular columns.
+1 @hungryMind - that is the answer. Re Joris's answer: it may be a "DataFrame", but it's not a normal one. You can see that A sits in the index while C and D are columns, which causes plots etc. to fail when you treat it as a normal DataFrame. It needs flattening with .reset_index() to make it proper!
kl.count() returns a DataFrame
1

Using pd.concat, just like this:

   pd.concat(map(lambda x: x[1], groups))

Or also keep index aligned:

   pd.concat(map(lambda x: x[1], groups)).sort_index()
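A small sketch of what the two lines above do, assuming groups is the object returned by df.groupby(...). The frame is made up, and a list comprehension stands in for the map call.

```python
import pandas as pd

# Illustrative frame; `groups` plays the role of the groupby object above.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"], "C": [1, 2, 3, 4]})
groups = df.groupby("A")

# Iterating over a groupby yields (key, sub-frame) pairs; keep the sub-frames.
pieces = [frame for _, frame in groups]

stacked = pd.concat(pieces)        # rows come back ordered group by group
restored = stacked.sort_index()    # sorting the index restores the row order

print(stacked.index.tolist())
print(restored.equals(df))
```

Without sort_index the rows are ordered by group; sorting by the original index recovers the starting layout.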


1

You can assign the output of .head(n) on the groupby object to a variable, where n is the number of rows to keep from each group.

Ex: df2 = grouped.head(100)

Now you have a pandas DataFrame, df2, holding the first 100 rows of each group. That is all of your grouped data only if no group has more than 100 rows.
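To illustrate the per-group limit, here is a minimal sketch with made-up data. It shows that head(n) keeps at most n rows from each group, so rows are dropped whenever a group is larger than n.

```python
import pandas as pd

# Illustrative frame: "foo" has three rows, "bar" has two.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar", "foo"],
                   "C": [1, 2, 3, 4, 5]})
grouped = df.groupby("A")

first_two = grouped.head(2)     # at most 2 rows per group: drops one "foo" row
everything = grouped.head(100)  # no group exceeds 100 rows, so nothing is lost

print(len(first_two), len(everything))
```

Choosing n at least as large as the biggest group is what makes this approach return every row.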


0

The cleanest solution is using reset_index().

df = grouped_df.reset_index()

Docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html
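As a hedged sketch, assuming grouped_df above refers to the result of an aggregation (a DataFrame indexed by the group key) rather than the GroupBy object itself, reset_index moves the key back into a regular column. The data is made up.

```python
import pandas as pd

# Illustrative frame with one key column and one numeric column.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"], "C": [1, 2, 3, 4]})

aggregated = df.groupby("A").sum()   # DataFrame indexed by the group key "A"
flat = aggregated.reset_index()      # "A" becomes an ordinary column again

print(list(aggregated.columns), aggregated.index.name)
print(list(flat.columns))
```

The call is made on the aggregated DataFrame; it is DataFrame.reset_index, as in the linked docs.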

1 Comment

The question is how to convert a DataFrameGroupBy to a DataFrame. DataFrameGroupBy has no reset_index method (see pandas.pydata.org/docs/reference/groupby.html), so the doc you refer to is misleading.
0
df_agg = df[['Col1','Col2']].groupby(['Col1','Col2']).sum().reset_index()

type(df_agg)

Returns

pandas.core.frame.DataFrame

df_agg has two columns: Col1 and Col2.

