I have a pandas DataFrame with the following data and columns. I am trying to create new observations and rows of data.
import pandas as pd
df = pd.DataFrame({
    'MSA': ['SF','SF','SF','SD','SD','SD','PHX','PHX','PHX'],
    'STATE': ['CA','CA','CA','CA','CA','CA','AZ','AZ','AZ'],
    'YR': [21, 21, 21, 21, 21, 21, 21, 21, 21],
    'MO': [9, 10, 11, 9, 10, 11, 9, 10, 11],
    'VR': [7.0, 8.0, 9.0, 5.0, 6.0, 7.0, 6.0, 7.0, 8.0],
    'EMP': [10, 11, 12, 9, 10, 11, 7, 8, 9],
    'GEO': ['P1','P1','P1','P2','P2','P2','P3','P3','P3']
})
I'd like to create new rows of data, with a condition for each column. Basically, for each MSA in a STATE, create new rows until the last completed month of the current calendar year, which means creating YR and MO labels through 04-2022. For the numeric columns VR and EMP, fill the new rows with the rolling average of the last 3 observations. For GEO or any other categorical variable, simply repeat the group's label on each new observation in the MSA-STATE group.
Note that the observations for 12-2021 are missing, so they need to be created along with the 2022 values.
Suggested approach: a pandas.TimeDelta range with a frequency of one month, used to fill the YR and MO columns.
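
The requirements above could be sketched roughly as follows. This is only one possible reading of the question, not a definitive answer: it uses `pd.period_range` with a monthly frequency to generate the missing months, and it assumes that `YR` is a two-digit year in the 2000s, that the target end month is 04-2022, and that the "rolling average" of each new row includes previously generated rows. The names `END`, `NUM_COLS`, and `extend_group` are hypothetical helpers introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'MSA': ['SF','SF','SF','SD','SD','SD','PHX','PHX','PHX'],
    'STATE': ['CA','CA','CA','CA','CA','CA','AZ','AZ','AZ'],
    'YR': [21, 21, 21, 21, 21, 21, 21, 21, 21],
    'MO': [9, 10, 11, 9, 10, 11, 9, 10, 11],
    'VR': [7.0, 8.0, 9.0, 5.0, 6.0, 7.0, 6.0, 7.0, 8.0],
    'EMP': [10, 11, 12, 9, 10, 11, 7, 8, 9],
    'GEO': ['P1','P1','P1','P2','P2','P2','P3','P3','P3']
})

# Assumption: the last completed month is April 2022, as stated in the question.
END = pd.Period("2022-04", freq="M")
# Assumption: these are the only numeric columns to be averaged.
NUM_COLS = ["VR", "EMP"]

def extend_group(g: pd.DataFrame) -> pd.DataFrame:
    """Append one row per missing month up to END for a single MSA-STATE group."""
    g = g.sort_values(["YR", "MO"])
    # Assumption: YR holds a two-digit year in the 2000s (21 -> 2021).
    last = pd.Period(year=2000 + int(g["YR"].iloc[-1]),
                     month=int(g["MO"].iloc[-1]), freq="M")
    rows = g.to_dict("records")
    for p in pd.period_range(last + 1, END, freq="M"):
        # Copy the previous row so GEO and other categorical labels repeat.
        row = dict(rows[-1])
        row["YR"] = p.year % 100
        row["MO"] = p.month
        # Average of the last (up to) 3 rows, including rows generated so far.
        window = rows[-3:]
        for c in NUM_COLS:
            row[c] = sum(r[c] for r in window) / len(window)
        rows.append(row)
    return pd.DataFrame(rows)

out = pd.concat(
    [extend_group(g) for _, g in df.groupby(["MSA", "STATE"], sort=False)],
    ignore_index=True,
)
print(out)
```

With the sample data, each group gains five rows (12-2021 through 04-2022). Note that `EMP` becomes a float column, since the averages are generally not integers; round or cast afterwards if integer values are required.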