How to sum values grouped by two columns in pandas

Question

I have a Pandas DataFrame like this:

import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

       Date Groups  data
0  2017-1-1    one     1
1  2017-1-1    one     2
2  2017-1-2    one     3
3  2017-1-2    two     4
4  2017-1-3    two     5

How can I generate a new DataFrame like this:

    Date       one     two 
0  2017-1-1    3        0
1  2017-1-2    3        4
2  2017-1-3    0        5

tdy · Accepted Answer · 2021-11-01 00:14:57Z

20

pivot_table was made for this:

df = df.pivot_table(index='Date', columns='Groups', aggfunc='sum')

results in

         data
Groups    one  two
Date
2017-1-1  3.0  NaN
2017-1-2  3.0  4.0
2017-1-3  NaN  5.0

Personally, I find this approach much easier to understand, and certainly more Pythonic, than a convoluted groupby operation. If you want the exact format requested, you can then tidy it up:

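# df is the pivoted frame shown above
df.fillna(0, inplace=True)            # missing Date/Groups pairs become 0
df.columns = df.columns.droplevel()   # drop the outer 'data' level
df.columns.name = None                # remove the 'Groups' label
df.reset_index(inplace=True)          # turn 'Date' back into a column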

which gives you

       Date  one  two
0  2017-1-1  3.0  0.0
1  2017-1-2  3.0  4.0
2  2017-1-3  0.0  5.0
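
As a possible shortcut, the tidy-up can be folded into the pivot_table call itself. The sketch below assumes the question's DataFrame; the name `wide` is only illustrative. Passing values='data' names the column to sum explicitly, which also keeps the resulting columns flat, and fill_value=0 fills the missing Date/Groups combinations.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

# values='data' selects the column to aggregate, so the result has a
# single level of columns; fill_value=0 replaces missing combinations.
wide = df.pivot_table(index='Date', columns='Groups', values='data',
                      aggfunc='sum', fill_value=0)

wide.columns.name = None   # remove the 'Groups' label above the columns
wide = wide.reset_index()  # turn 'Date' back into a regular column
print(wide)
```

If values is omitted, pivot_table aggregates every remaining column, which is why the answer above works without naming 'data'. The dtype of the summed columns may differ between pandas versions.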

edited Nov 1, 2021 at 0:14 by tdy

answered Jul 19, 2017 at 21:29 by Josh D.

@Josh D. This is cool and straightforward! I agree that it takes some brain power to figure out how groupby works. Thank you!
– Kevin, commented Jul 20, 2017 at 11:55

So it is not necessary to name the column that should be summed, correct?
– Tobitor, commented Oct 6, 2021 at 12:01


tuomastik · Accepted Answer · 2017-07-11 08:55:17Z

9

Pandas black magic:

# The groupby sum is already indexed by Date and Groups;
# unstack then moves the Groups level into the columns.
df = df.groupby(['Date', 'Groups']).sum().unstack(
    'Groups').fillna(0).reset_index()

# Fix the column names
df.columns = ['Date', 'one', 'two']

Resulting df:

       Date  one  two
0  2017-1-1  3.0  0.0
1  2017-1-2  3.0  4.0
2  2017-1-3  0.0  5.0
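
To make the chain less magical, it can be split into its individual steps. This is only a sketch on the question's data; the intermediate names `sums`, `wide`, and `result` are illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

# Step 1: one sum per (Date, Groups) pair, as a Series with a
# two-level MultiIndex.
sums = df.groupby(['Date', 'Groups'])['data'].sum()

# Step 2: move the 'Groups' level from the index into the columns.
# Pairs that never occur in the data become NaN.
wide = sums.unstack('Groups')

# Step 3: fill the gaps and turn 'Date' back into a regular column.
result = wide.fillna(0).reset_index()
result.columns.name = None  # remove the leftover 'Groups' label
print(result)
```

Selecting ['data'] before summing yields a Series, so unstack produces a single level of columns and no manual renaming is needed.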

edited Jul 11, 2017 at 8:55

answered Jul 10, 2017 at 18:24 by tuomastik

Holy! The black magic is so powerful! Thanks a lot!
– Kevin, commented Jul 10, 2017 at 18:59

You're welcome! See the updated answer; I simplified the expression and added a fix for the column names to be exactly as requested.
– tuomastik, commented Jul 10, 2017 at 19:11

I think your previous version has its advantage since it can be applied to other more complicated data sets. I copied it here: df.groupby(['Date', 'Groups', 'data'])['data'].sum().sum(level=['Date', 'Groups']).unstack('Groups').fillna(0)
– Kevin, commented Jul 10, 2017 at 19:37


Olivier Verdier · Accepted Answer · 2020-01-22 19:14:51Z

0

A (perhaps slightly more idiomatic) alternative to @tuomastik's answer:

df.groupby(['Date', 'Groups']).sum().unstack('Groups', fill_value=0).reset_index()
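
One thing to be aware of: because the sum is taken over the whole DataFrame, the result of this one-liner carries two-level column labels. The sketch below, again on the question's data, shows one possible way to flatten them; the name `out` and the list comprehension are illustrative, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

out = (df.groupby(['Date', 'Groups']).sum()
         .unstack('Groups', fill_value=0)
         .reset_index())

# The columns are now tuples: ('Date', ''), ('data', 'one'), ('data', 'two').
print(out.columns.tolist())

# Keep the inner label where there is one, otherwise the outer label.
out.columns = [outer if inner == '' else inner for outer, inner in out.columns]
print(out)
```

Passing fill_value=0 to unstack fills the missing combinations at the moment they are created, so no separate fillna step is required.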

answered Jan 22, 2020 at 19:14 by Olivier Verdier
