1

I have a pandas DataFrame with the following data and columns. I am trying to create new observations and rows of data.

import pandas as pd

df = pd.DataFrame({
                   'MSA': ['SF','SF','SF','SD','SD','SD','PHX','PHX','PHX'],
                   'STATE': ['CA','CA','CA','CA','CA','CA','AZ','AZ','AZ'],
                   'YR': [21, 21, 21, 21, 21, 21, 21, 21, 21],
                   'MO': [9, 10, 11, 9, 10, 11, 9, 10, 11],
                   'VR': [7.0, 8.0, 9.0, 5.0, 6.0, 7.0, 6.0, 7.0, 8.0],
                   'EMP': [10, 11, 12, 9, 10, 11, 7, 8, 9],
                   'GEO': ['P1','P1','P1','P2','P2','P2','P3','P3','P3']
                 })

I'd like to create new rows of data with a conditions for each column. Basically, for each MSA in a State, create new rows of data until the last completed month in current calendar year. So, this means create labels for YR, MO until 04 2022. For numeric columns VR and EMP, fill the rows with rolling average of last 3 observations. For GEO or any other cat variable simply repeat the labels for each new observations for the MSA-STATE group.

Note, that the obs. for 12-2021 are missing, so need to create them with 2022 values.

3
  • have you thought of using a column with datetime objects? You could create a pandas.TimeDelta range of a frequency of one month Commented May 19, 2022 at 19:47
  • @max yes, that would help with creating new cols for YR and MO columns. Commented May 19, 2022 at 20:11
  • 2
    fyi - need a colon after 'GEO' Commented May 19, 2022 at 21:07

1 Answer 1

2

I am just learning to panda, so it is not an answer, but I was curious to solve the task... Sadly, - I have failed... but maybe my draft will lead to any ideas... this is my most recent test, - I will update it if I will manage to solve the "rolling mean" problem.

df2 = pd.concat([
    pd.DataFrame(
        {'STATE': row.STATE,
         'MSA': row.MSA,
         'YR': 22, # original DF is integer anyway
         'MO': range(1,4), # original DF is integer anyway
         'VR': row.VR,  # cannot find the solution to get .rolling_mean() into this line
         'EMP': row.EMP, # cannot find the solution to get .rolling_mean() into this line
         'GEO': row.GEO
        }
    ) for i, row in df.iterrows()
], ignore_index=True).loc[:, ['MSA', 'STATE', 'YR', 'MO', 'VR', 'EMP', 'GEO']]
df = pd.concat([df,df2], axis=0)
df

if OP will solve the task, please share the result, very curious....

Out:

   MSA STATE  YR  MO   VR  EMP GEO
0    SF    CA  21  10  7.0   10  P1
1    SF    CA  21  11  8.0   11  P1
2    SF    CA  21  12  9.0   12  P1
3    SD    CA  21  10  5.0    9  P2
4    SD    CA  21  11  6.0   10  P2
5    SD    CA  21  12  7.0   11  P2
6   PHX    AZ  21  10  6.0    7  P3
7   PHX    AZ  21  11  7.0    8  P3
8   PHX    AZ  21  12  8.0    9  P3
0    SF    CA  22   1  7.0   10  P1
1    SF    CA  22   2  7.0   10  P1
2    SF    CA  22   3  7.0   10  P1
3    SF    CA  22   1  8.0   11  P1
4    SF    CA  22   2  8.0   11  P1
5    SF    CA  22   3  8.0   11  P1
6    SF    CA  22   1  9.0   12  P1
7    SF    CA  22   2  9.0   12  P1
8    SF    CA  22   3  9.0   12  P1
9    SD    CA  22   1  5.0    9  P2
10   SD    CA  22   2  5.0    9  P2
11   SD    CA  22   3  5.0    9  P2
12   SD    CA  22   1  6.0   10  P2
13   SD    CA  22   2  6.0   10  P2
14   SD    CA  22   3  6.0   10  P2
15   SD    CA  22   1  7.0   11  P2
16   SD    CA  22   2  7.0   11  P2
17   SD    CA  22   3  7.0   11  P2
18  PHX    AZ  22   1  6.0    7  P3
19  PHX    AZ  22   2  6.0    7  P3
20  PHX    AZ  22   3  6.0    7  P3
21  PHX    AZ  22   1  7.0    8  P3
22  PHX    AZ  22   2  7.0    8  P3
23  PHX    AZ  22   3  7.0    8  P3
24  PHX    AZ  22   1  8.0    9  P3
25  PHX    AZ  22   2  8.0    9  P3
26  PHX    AZ  22   3  8.0    9  P3
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.