
My code below works fine, but I think there is a more efficient way to write it and I can't figure it out. I thought reset_index() would work, but it doesn't in this case. All suggestions are welcome. Thanks in advance!

I have a large dataframe of hospital data covering 2017, 2018 and 2019. The column spoedelectief can have two values: S for emergency patients (in Dutch, emergency is "spoed") and E for non-emergency patients.

From this dataframe I want to build a new dataframe so that I can visualize the number of emergency and non-emergency patients per year, but I'm stuck. Some code:

test = df_new.groupby(df_new['operatiejaar'])['spoedelectief'].value_counts().sort_index()

gives back a Pandas Series:

operatiejaar  spoedelectief
2017          E                5459
              S                1054
2018          E                6191
              S                1029
2019          E                6160
              S                1159

For visualisation in Seaborn I tried to make this a DataFrame with reset_index() but that gives an error:

ValueError: cannot insert spoedelectief, already exists

Making test a DataFrame works:

test = pd.DataFrame(test)

With this result:

[Image: screenshot of the resulting DataFrame]

But test.columns gives this:

Index(['spoedelectief'], dtype='object')

Below is the code I used to create the DataFrame I wanted:

test = df_new.groupby(df_new['operatiejaar'])['spoedelectief'].value_counts().sort_index()

jaar_list = []
spel_list = []
totaal = []
for index, value in test.items():
    jaar_list.append(index[0])
    spel_list.append(index[1])
    totaal.append(value)

spel_jaar = pd.DataFrame(
    {'jaar': jaar_list,
     'spoedelectief': spel_list,
     'totaal': totaal
    })

Which gives the desired DF:

[Image: screenshot of the desired DataFrame]

How can I do this more simply, directly from the original DataFrame? Thanks!

2 Answers


You need to rename the Series before calling Series.reset_index:

test = (df_new.groupby(df_new['operatiejaar'])['spoedelectief']
              .value_counts()
              .rename('count')
              .sort_index()
              .reset_index())

Or pass the name parameter to Series.reset_index:

test = (df_new.groupby(df_new['operatiejaar'])['spoedelectief']
              .value_counts()
              .sort_index()
              .reset_index(name='count'))
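
To make the cause of the error concrete, here is a minimal sketch on a small made-up sample (the six rows below are invented for illustration, not the real hospital data). The ValueError in the question appears when the Series has the same name as one of its index levels. The sketch forces that clash with rename so it behaves the same regardless of how your pandas version names the result of value_counts, then shows that passing name avoids it:

```python
import pandas as pd

# Hypothetical sample standing in for df_new (values are made up)
df_new = pd.DataFrame({
    "operatiejaar": [2017, 2017, 2017, 2018, 2018, 2019],
    "spoedelectief": ["E", "E", "S", "E", "S", "S"],
})

counts = (
    df_new.groupby("operatiejaar")["spoedelectief"]
    .value_counts()
    .sort_index()
)

# Reproduce the clash: the Series is named like an index level,
# so reset_index() cannot insert a second 'spoedelectief' column.
clash = counts.rename("spoedelectief")
try:
    clash.reset_index()
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Giving the values column a distinct name resolves it.
result = counts.reset_index(name="count")
print(result)
```

The resulting DataFrame has the columns operatiejaar, spoedelectief and count, which is the long format that plotting libraries such as Seaborn generally expect.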

1 Comment

No need to repeat df_new in the argument to groupby, and no need for sort_index either.

Two additional options to consider:

  1. Series.to_frame(name):

    test = (
        df_new.groupby('operatiejaar')['spoedelectief']
        .value_counts().to_frame('totaal').reset_index()
    )
    
  2. Reshape your result into several columns, one for each value found by value_counts:

    You can also avoid naming the series and instead expand it into two columns for nicer plotting:

    # 'E' and 'S' counts become two columns
    test2 = (
        df_new.groupby('operatiejaar')['spoedelectief']
        .value_counts().unstack()
    )
    test2.plot.bar()
    

    Example (on small randomly generated data):

    [Image: bar chart of the example data]
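
One thing to watch with the wide layout: if a year has no rows for one of the values, unstack() leaves a NaN in that cell. A minimal sketch, again on made-up sample data (2019 deliberately has no 'E' rows), using the fill_value parameter of Series.unstack to put 0 there instead:

```python
import pandas as pd

# Hypothetical sample: 2019 has emergency ('S') rows only
df_new = pd.DataFrame({
    "operatiejaar": [2017, 2017, 2017, 2018, 2018, 2019],
    "spoedelectief": ["E", "E", "S", "E", "S", "S"],
})

wide = (
    df_new.groupby("operatiejaar")["spoedelectief"]
    .value_counts()
    .unstack(fill_value=0)  # missing year/value combinations become 0
)
print(wide)

# If you need the year as a regular column again:
wide_flat = wide.reset_index()
print(wide_flat.columns.tolist())
```

Here wide has one row per year and one column per value ('E' and 'S'), so wide.plot.bar() would draw grouped bars per year.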

Notes:

  • You can dispense with df_new[column_name] as argument for the groupby, and just specify column_name.
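  • You usually don't have to call sort_index() (at least with recent versions of pandas): groupby() sorts the group keys by default, and value_counts() sorts by count by default. Keep sort_index() only if you need the values within each year in label order.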
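
A small sketch of both notes, on made-up sample data where 'S' outnumbers 'E' in 2018. It checks that grouping by the column name gives the same result as grouping by df_new['operatiejaar'], and compares the default ordering with the ordering after sort_index():

```python
import pandas as pd

# Hypothetical sample: in 2018 there are more 'S' rows than 'E' rows
df_new = pd.DataFrame({
    "operatiejaar": [2017, 2017, 2017, 2018, 2018, 2018],
    "spoedelectief": ["E", "E", "S", "S", "S", "E"],
})

by_name = df_new.groupby("operatiejaar")["spoedelectief"].value_counts()
by_series = df_new.groupby(df_new["operatiejaar"])["spoedelectief"].value_counts()

# Both spellings of the groupby key produce the same Series.
same = by_name.equals(by_series)
print("identical:", same)

# Default: years ascending, and within each year ordered by count (sort=True).
default_order = by_name.index.tolist()
# After sort_index(): years ascending, then values in label order.
label_order = by_name.sort_index().index.tolist()
print(default_order)
print(label_order)
```

So the two orderings coincide whenever 'E' is the more frequent value in every year, as in the question's data, and differ otherwise.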

1 Comment

Pierre, thanks! A great and helpful answer as well. Cheers, Jan
