I have a dataframe in pandas.
test_df = pd.DataFrame({'date': ['2018-12-28', '2018-12-28', '2018-12-29', '2018-12-29', '2018-12-30', '2018-12-30'],
'transaction': ['aa', 'bb', 'cc', 'aa', 'bb', 'bb'],
'ccy': ['USD', 'EUR', 'EUR', 'USD', 'USD', 'USD'],
'amt': np.random.random(6)})
test_df:
date transaction ccy amt
2018-12-28 aa USD 0.323439
2018-12-28 bb EUR 0.048948
2018-12-29 cc EUR 0.793263
2018-12-29 aa USD 0.013865
2018-12-30 bb USD 0.658571
2018-12-30 bb USD 0.224951
The following code is giving me this output.
grouper = test_df.groupby([pd.Grouper('date'), 'transaction', 'ccy'])
grp_transactions = grouper['amt'].sum().unstack()
output:
ccy EUR USD
date transaction
2018-12-28 aa NaN 0.323439
bb 0.048948 NaN
2018-12-29 aa NaN 0.013865
cc 0.793263 NaN
2018-12-30 bb NaN 0.883523
I believe this is expected as the groupby function will group values in the columns based on the order above, sum accordingly, and not create new rows for transactions that are not in the DF.
Is there a way in pandas to include NaN values if a transaction is not done on a particular day when using groupby? ie. Output should be NaN for both ccy if my DF does not have transaction: cc on 28/12/2018.
Expected output:
ccy EUR USD
date transaction
2018-12-28 aa NaN 0.323439
bb 0.048948 NaN
cc NaN NaN
2018-12-29 aa NaN 0.013865
bb NaN NaN
cc 0.793263 NaN
2018-12-30 aa NaN NaN
bb NaN 0.883523
cc NaN NaN
Any help would be appreciated. Thanks!