I'm new to Python and pandas and have a basic question about how to write a short function that takes a pd.DataFrame and returns relative values grouped by month.
Example data:
import pandas as pd
from datetime import datetime
import numpy as np
date_rng = pd.date_range(start='2019-01-01', end='2019-03-31', freq='D')
df = pd.DataFrame(date_rng, columns=['date'])
df['value_in_question'] = np.random.randint(0,100,size=(len(date_rng)))
df.set_index('date',inplace=True)
df.head()
            value_in_question
date
2019-01-01                 40
2019-01-02                 86
2019-01-03                 46
2019-01-04                 75
2019-01-05                 35
def absolute_to_relative(df):
    """
    set_index before using
    """
    return df.div(df.sum(), axis=1).mul(100)
relative_df = absolute_to_relative(df)
relative_df.head()
            value_in_question
date
2019-01-01           0.895055
2019-01-02           1.924368
2019-01-03           1.029313
2019-01-04           1.678228
2019-01-05           0.783173
Rather than dividing each row by the column sum, I would like to divide it by the sum of its month. The final df should have the same shape and form, but each row value should be relative to the sum of its month.
old:
value_in_question
date
"2019-01-01" value/column_sum * 100
new:
value_in_question
date
"2019-01-01" value/month_sum * 100
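To make the target concrete, here is a tiny hand-checkable sketch in plain Python. The four values are made up for illustration and are not taken from the data above; the names `values`, `month_sums` and `relative` are hypothetical.

```python
# Made-up daily values for two months (not taken from the data above).
values = {
    "2019-01-01": 10,
    "2019-01-02": 30,
    "2019-02-01": 20,
    "2019-02-02": 60,
}

# Sum per month, keyed by the "YYYY-MM" prefix of each date string.
month_sums = {}
for date, value in values.items():
    month = date[:7]
    month_sums[month] = month_sums.get(month, 0) + value

# Each value as a percentage of its own month's sum.
relative = {date: value / month_sums[date[:7]] * 100 for date, value in values.items()}
print(relative)
# {'2019-01-01': 25.0, '2019-01-02': 75.0, '2019-02-01': 25.0, '2019-02-02': 75.0}
```

Within each month the relative values add up to 100, which is the property the pandas version should have as well.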
So I tried the following, which returns NaN for value_in_question:
def absolute_to_relative_agg(df, agg):
    """
    set_index before using
    """
    return df.div(df.groupby([pd.Grouper(freq=agg)]).sum(), axis=1)
relative_df = absolute_to_relative_agg(df, 'M')
relative_df.head()
            value_in_question
date
2019-01-01                NaN
2019-01-02                NaN
2019-01-03                NaN
2019-01-04                NaN
2019-01-05                NaN
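A likely explanation for the NaN values: the grouped sum has only one row per month, labelled with the month-end date, and `div` aligns the two frames on their index labels, so daily rows that have no matching label end up as NaN. One possible way around this, sketched below, is `transform("sum")`, which broadcasts each group's sum back onto the original rows. The function name `absolute_to_relative_by_period`, the `key` parameter, the fixed random seed and the use of `df.index.to_period("M")` as the grouping key are assumptions made for this sketch, not part of the attempt above.

```python
import numpy as np
import pandas as pd

# Same layout as the example data above; the seed is an assumption for repeatability.
rng = np.random.default_rng(0)
date_rng = pd.date_range(start="2019-01-01", end="2019-03-31", freq="D")
df = pd.DataFrame(
    {"value_in_question": rng.integers(1, 100, size=len(date_rng))},
    index=date_rng,
)
df.index.name = "date"


def absolute_to_relative_by_period(df, key):
    """Return each value as a percentage of its group's sum.

    `key` is anything groupby accepts that assigns every row to a group,
    for example df.index.to_period("M") for calendar months.
    """
    # transform("sum") keeps the original index, so every row carries
    # the sum of its own group and the division lines up row by row.
    group_sums = df.groupby(key).transform("sum")
    return df.div(group_sums).mul(100)


relative_df = absolute_to_relative_by_period(df, df.index.to_period("M"))
print(relative_df.head())
```

The result keeps the shape and index of `df`, and the relative values within each month sum to 100.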