How to create a DataFrame directly from groupby

Question

My code below works, but I suspect there is a more efficient way to write it that I can't figure out. I thought reset_index() would do the job, but it doesn't in this case. All suggestions are welcome. Thanks in advance!

I have a large DataFrame of hospital data covering 2017, 2018 and 2019. The column spoedelectief can take two values: S for emergency patients (emergency is "spoed" in Dutch) and E for non-emergency patients.

From this DataFrame I want to build a new DataFrame so that I can visualize the number of emergency and non-emergency patients per year, but I'm stuck. Some code:

test = df_new.groupby(df_new['operatiejaar'])['spoedelectief'].value_counts().sort_index()

gives back a Pandas Series:

operatiejaar  spoedelectief
2017          E                5459
              S                1054
2018          E                6191
              S                1029
2019          E                6160
              S                1159

For visualisation in Seaborn I tried to make this a DataFrame with reset_index() but that gives an error:

ValueError: cannot insert spoedelectief, already exists

Making test a DataFrame works:

test = pd.DataFrame(test)

With this result:

But test.columns gives this:

Index(['spoedelectief'], dtype='object')

Below is the code I used to create the DataFrame I wanted:

test = df_new.groupby(df_new['operatiejaar'])['spoedelectief'].value_counts().sort_index()

jaar_list = []
spel_list = []
totaal = []
for index, value in test.items():
    jaar_list.append(index[0])
    spel_list.append(index[1])
    totaal.append(value)

spel_jaar = pd.DataFrame(
    {'jaar': jaar_list,
     'spoedelectief': spel_list,
     'totaal': totaal
    })

Which gives the desired DF:

How can I do this more simply, directly from the original DataFrame? Thanks!

jezrael · Accepted Answer · 2020-12-07 12:52:08Z

1

You need to rename the Series before calling Series.reset_index, because its name (spoedelectief) is the same as one of its index levels:

test = (df_new.groupby(df_new['operatiejaar'])['spoedelectief']
              .value_counts()
              .rename('count')
              .sort_index()
              .reset_index())

Or use name in Series.reset_index:

test = (df_new.groupby(df_new['operatiejaar'])['spoedelectief']
              .value_counts()
              .sort_index()
              .reset_index(name='count'))
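
For anyone who wants to try this without the hospital data, here is a minimal sketch on a small made-up frame. The values in df_new below are invented purely for illustration, and the result is called counts to keep it apart from the original Series. As far as I know, pandas 2.0 and later already name the result of value_counts() "count", so a plain reset_index() may work there; passing name= should be harmless either way.

```python
import pandas as pd

# Synthetic stand-in for df_new (values are invented for illustration only)
df_new = pd.DataFrame({
    "operatiejaar": [2017, 2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019],
    "spoedelectief": ["E", "E", "S", "E", "E", "E", "S", "E", "S", "S"],
})

counts = (
    df_new.groupby("operatiejaar")["spoedelectief"]
    .value_counts()              # Series with a (operatiejaar, spoedelectief) MultiIndex
    .sort_index()                # order rows by year, then by label
    .reset_index(name="count")   # give the values a column name that does not clash
)

print(counts)
```

The result is a long-format frame with the columns operatiejaar, spoedelectief and count, which is the layout that seaborn's barplot(data=..., x=..., y=..., hue=...) expects.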


1 Comment

Pierre D Over a year ago

No need to repeat df_new in the argument of groupby, and no need for sort_index either.

Pierre D · Accepted Answer · 2020-12-07 13:32:57Z

1

Two additional options to consider:

Series.to_frame(name):

test = (
    df_new.groupby('operatiejaar')['spoedelectief']
    .value_counts().to_frame('totaal').reset_index()
)

Reshape your result into several columns, one for each value found by value_counts. This avoids naming the Series and gives a layout that is convenient for plotting:
```
# 'E' and 'S' counts become two columns
test2 = (
    df_new.groupby('operatiejaar')['spoedelectief']
    .value_counts().unstack()
)
test2.plot.bar()
```
Example (on small randomly generated data):
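
A minimal sketch of such an example, using synthetic data rather than the original output. The frame df_demo, the seed, the sample size and the 80/20 split between E and S are all assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Small randomly generated data set (hypothetical; not the hospital data)
rng = np.random.default_rng(0)
n = 200
df_demo = pd.DataFrame({
    "operatiejaar": rng.choice([2017, 2018, 2019], size=n),
    "spoedelectief": rng.choice(["E", "S"], size=n, p=[0.8, 0.2]),
})

wide = (
    df_demo.groupby("operatiejaar")["spoedelectief"]
    .value_counts()
    .unstack(fill_value=0)   # one column per label; 0 where a label never occurs in a year
)

print(wide)
```

The result has one row per year and one column each for E and S, so wide.plot.bar() draws the two counts side by side for every year.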

Notes:

You can dispense with df_new[column_name] as argument for the groupby, and just specify column_name.
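You may not need sort_index() (at least with recent versions of pandas): groupby() sorts the group keys by default, and value_counts() sorts by count within each group. For the data in the question that gives the same order as sort_index(), because E outnumbers S in every year.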
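
The default ordering can be checked on a tiny made-up frame. This is only a sketch: df_demo and its values are invented, and it has been reasoned through rather than run against every pandas version. It compares the order returned by value_counts() with the order after sort_index() when one year has more S than E.

```python
import pandas as pd

# Hypothetical data: 2019 has more S than E, 2017 has more E than S
df_demo = pd.DataFrame({
    "operatiejaar": [2019, 2019, 2019, 2017, 2017, 2017],
    "spoedelectief": ["S", "S", "E", "E", "E", "S"],
})

# Default: group keys sorted, counts descending within each group
by_count = df_demo.groupby("operatiejaar")["spoedelectief"].value_counts()

# Explicit sort on the index: year first, then label
by_label = by_count.sort_index()

print(list(by_count.index))
print(list(by_label.index))
```

Both results list 2017 before 2019, but within 2019 the default puts S first because it has the higher count. If the plot or a later merge relies on E always coming before S, keeping sort_index() is the safer choice.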

edited Dec 7, 2020 at 13:32

answered Dec 7, 2020 at 13:26


1 Comment

Janneman Over a year ago

Pierre, thanks! A great and helpful answer as well. Cheers, Jan
