
I have a Pandas DataFrame like this:

import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

       Date Groups  data
0  2017-1-1    one     1
1  2017-1-1    one     2
2  2017-1-2    one     3
3  2017-1-2    two     4
4  2017-1-3    two     5

How can I generate a new DataFrame like this:

       Date  one  two
0  2017-1-1    3    0
1  2017-1-2    3    4
2  2017-1-3    0    5

3 Answers


pivot_table was made for this:

df = df.pivot_table(index='Date', columns='Groups', aggfunc='sum')

results in

         data
Groups    one  two
Date
2017-1-1  3.0  NaN
2017-1-2  3.0  4.0
2017-1-3  NaN  5.0

Personally, I find this approach much easier to understand, and certainly more pythonic, than a convoluted groupby operation. If you want the exact format you asked for, you can then tidy it up:

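df.fillna(0, inplace=True)             # missing Date/Groups pairs become 0
df.columns = df.columns.droplevel()    # drop the top 'data' level of the columns
df.columns.name = None                 # remove the leftover 'Groups' label
df.reset_index(inplace=True)           # move Date from the index into a column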

which gives you

       Date  one  two
0  2017-1-1  3.0  0.0
1  2017-1-2  3.0  4.0
2  2017-1-3  0.0  5.0
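Without values=, pivot_table aggregates every column that is not used as index or columns, which here is only data. If you prefer to name the summed column explicitly and skip the fillna step, a minimal sketch (assuming a reasonably recent pandas; out is just an illustrative name) could look like this:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

# Name the summed column with values=, use the string 'sum' as the
# aggregation, and let fill_value=0 replace missing Date/Groups pairs.
out = (df.pivot_table(index='Date', columns='Groups', values='data',
                      aggfunc='sum', fill_value=0)
         .rename_axis(columns=None)   # drop the leftover 'Groups' label
         .reset_index())              # turn the Date index back into a column
print(out)
```

Because values is a single column name rather than a list, the result already has flat one/two columns, so the droplevel step is not needed.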
  • @Josh D. This is cool and straightforward! I agree that it takes some brain power to figure out how groupby works. Thank you! Commented Jul 20, 2017 at 11:55
  • What I am wondering about is that it is not necessary to mention the column which should be summed, correct? Commented Oct 6, 2021 at 12:01

Pandas black magic:

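# Sum 'data' for each (Date, Groups) pair, pivot Groups into columns,
# fill missing pairs with 0 and move Date back into a regular column
df = df.groupby(['Date', 'Groups']).sum().unstack('Groups').fillna(0).reset_index()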

# Fix the column names
df.columns = ['Date', 'one', 'two']

Resulting df:

       Date  one  two
0  2017-1-1  3.0  0.0
1  2017-1-2  3.0  4.0
2  2017-1-3  0.0  5.0
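The core of the chain is groupby, unstack, fillna and reset_index. Splitting it into named steps shows what each one does; this is only a sketch, and the intermediate names summed, wide and result are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

# Step 1: one row per (Date, Groups) pair, 'data' summed within each pair
summed = df.groupby(['Date', 'Groups']).sum()

# Step 2: move the 'Groups' index level into the columns;
# pairs that never occur, e.g. ('2017-1-1', 'two'), become NaN
wide = summed.unstack('Groups')

# Step 3: replace NaN with 0 and turn the 'Date' index back into a column
result = wide.fillna(0).reset_index()

# Step 4: the columns are now a MultiIndex like ('data', 'one'); flatten them
result.columns = ['Date', 'one', 'two']
print(result)
```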
  • Holy! The black magic is so powerful! Thanks a lot! Commented Jul 10, 2017 at 18:59
  • You're welcome! See the updated answer; I simplified the expression and added a fix for the column names to be exactly as requested. Commented Jul 10, 2017 at 19:11
  • I think your previous version has its advantages, since it can be applied to other, more complicated data sets. I copied it here: df.groupby(['Date', 'Groups', 'data'])['data'].sum().groupby(level=['Date', 'Groups']).sum().unstack('Groups').fillna(0) Commented Jul 10, 2017 at 19:37

A (perhaps slightly more idiomatic) alternative to @tuomastik's answer:

df.groupby(['Date', 'Groups']).sum().unstack('Groups', fill_value=0).reset_index()
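That one-liner leaves a two-level column index, ('data', 'one') and ('data', 'two'), plus a ('Date', '') column after reset_index. Selecting the data column before unstacking produces a Series, so the unstacked result has flat columns and no names have to be hard-coded. A sketch of that variant (out is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

out = (df.groupby(['Date', 'Groups'])['data'].sum()   # Series indexed by (Date, Groups)
         .unstack('Groups', fill_value=0)             # one column per group, gaps become 0
         .rename_axis(columns=None)                   # drop the 'Groups' columns label
         .reset_index())                              # Date back to a regular column
print(out)
```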
