Include missing group keys as NaN in pandas GroupBy output

Question

I have a dataframe in pandas.

test_df = pd.DataFrame({'date': ['2018-12-28', '2018-12-28', '2018-12-29', '2018-12-29', '2018-12-30', '2018-12-30'],
                       'transaction': ['aa', 'bb', 'cc', 'aa', 'bb', 'bb'],
                       'ccy': ['USD', 'EUR', 'EUR', 'USD', 'USD', 'USD'],
                       'amt': np.random.random(6)})

test_df:

date         transaction  ccy       amt
2018-12-28   aa           USD  0.323439
2018-12-28   bb           EUR  0.048948
2018-12-29   cc           EUR  0.793263
2018-12-29   aa           USD  0.013865
2018-12-30   bb           USD  0.658571
2018-12-30   bb           USD  0.224951

The following code is giving me this output.

grouper = test_df.groupby([pd.Grouper('date'), 'transaction', 'ccy'])
grp_transactions = grouper['amt'].sum().unstack()

output:

ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.323439
           bb           0.048948       NaN
2018-12-29 aa                NaN  0.013865
           cc           0.793263       NaN
2018-12-30 bb                NaN  0.883523

I believe this is expected as the groupby function will group values in the columns based on the order above, sum accordingly, and not create new rows for transactions that are not in the DF.

Is there a way in pandas to include NaN values if a transaction is not done on a particular day when using groupby? ie. Output should be NaN for both ccy if my DF does not have transaction: cc on 28/12/2018.

Expected output:

ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.323439
           bb           0.048948       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.013865
           bb                NaN       NaN
           cc           0.793263       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  0.883523
           cc                NaN       NaN

Any help would be appreciated. Thanks!

cs95 · Accepted Answer · 2019-01-04 04:14:46Z

6

This is easy if you convert "transaction" to a categorical column before grouping,

df.transaction = pd.Categorical(df.transaction)
df.groupby(['date', 'transaction', 'ccy']).sum().unstack(2)

                             amt          
ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.404488
           bb           0.459295       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.439354
           bb                NaN       NaN
           cc           0.429269       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  1.542451
           cc                NaN       NaN

Missing categories in the output are represented by NaNs. This is usually possible when performing numeric aggregation.

If you don't want to modify df, this will do:

u = pd.Series(pd.Categorical(df.transaction), name='transaction')
df.groupby(['date', u, 'ccy']).sum().unstack(2)

                             amt          
ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.429134
           bb           0.852355       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.541576
           bb                NaN       NaN
           cc           0.994095       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  0.744587
           cc                NaN       NaN

answered Jan 4, 2019 at 4:14

cs95

406k106 gold badges744 silver badges797 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Akhilesh Pothuri Over a year ago

I have tried the solution suggested by you but in the end result dataframe I was getting the amt columns but not the rest. How can we get back all columns using unstack and stack?

Collectives™ on Stack Overflow

Include missing group keys as NaN in pandas GroupBy output

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related