0

I have below dataframe columns:

Index(['Location' 'Dec-2021_x', 'Jan-2022_x', 'Feb-2022_x', 'Mar-2022_x',
       'Apr-2022_x', 'May-2022_x', 'Jun-2022_x', 'Jul-2022_x', 'Aug-2022_x',
       'Sep-2022_x', 'Oct-2022_x', 'Nov-2022_x', 'Dec-2022_x', 'Jan-2023_x',
       'Feb-2023_x', 'Mar-2023_x', 'Apr-2023_x', 'May-2023_x', 'Jun-2023_x',
       'Jul-2023_x', 'Aug-2023_x', 'Sep-2023_x', 'Oct-2023_x', 'Nov-2023_x',
       'Dec-2023_x', 'Jan-2024_x', 'Feb-2024_x', 'Mar-2024_x', 'Apr-2024_x',
       'May-2024_x', 'Jun-2024_x', 'Jul-2024_x', 'Aug-2024_x', 'Sep-2024_x',
       'Oct-2024_x', 'Nov-2024_x', 'Dec-2024_x', 
       'sum_val',
       'Dec-2021_y', 'Jan-2022_y', 'Feb-2022_y',
       'Mar-2022_y', 'Apr-2022_y', 'May-2022_y', 'Jun-2022_y', 'Jul-2022_y',
       'Aug-2022_y', 'Sep-2022_y', 'Oct-2022_y', 'Nov-2022_y', 'Dec-2022_y',
       'Jan-2023_y', 'Feb-2023_y', 'Mar-2023_y', 'Apr-2023_y', 'May-2023_y',
       'Jun-2023_y', 'Jul-2023_y', 'Aug-2023_y', 'Sep-2023_y', 'Oct-2023_y',
       'Nov-2023_y', 'Dec-2023_y', 'Jan-2024_y', 'Feb-2024_y', 'Mar-2024_y',
       'Apr-2024_y', 'May-2024_y', 'Jun-2024_y', 'Jul-2024_y', 'Aug-2024_y',
       'Sep-2024_y', 'Oct-2024_y', 'Nov-2024_y', 'Dec-2024_y'],
      dtype='object')

Sample dataframe with reduced columns looks like this:

df:
Location    Dec-2021_x  Jan-2022_x  sum_val Dec-2021_y  Jan-2022_y
A           212         315         1000    12          13      
B           312         612         1100    13          17      
C           242         712         1010    15          15      
D           215         382         1001    16          17      
E           252         319         1110    17          18      

I have to create a resultant dataframe which will be in the below format:

Index(['Location' 'Dec-2021', 'Jan-2022', 'Feb-2022', 'Mar-2022',
       'Apr-2022', 'May-2022', 'Jun-2022', 'Jul-2022', 'Aug-2022',
       'Sep-2022', 'Oct-2022', 'Nov-2022', 'Dec-2022', 'Jan-2023',
       'Feb-2023', 'Mar-2023', 'Apr-2023', 'May-2023', 'Jun-2023',
       'Jul-2023', 'Aug-2023', 'Sep-2023', 'Oct-2023', 'Nov-2023',
       'Dec-2023', 'Jan-2024', 'Feb-2024', 'Mar-2024', 'Apr-2024',
       'May-2024', 'Jun-2024', 'Jul-2024', 'Aug-2024', 'Sep-2024',
       'Oct-2024', 'Nov-2024', 'Dec-2024'
      dtype='object')

The way we do this is using the formula:

'Dec-2021' = 'Dec-2021_x' * sum_val * 'Dec-2021_y' (these are all numeric columns)

and a similar way for all the months. There are 36 months to be precise. Is there any way to do this in a loop manner for each column in the month-year combination? There are around 65000+ rows here so do not want to overwhelm the system.

2
  • I haven't downvoted, but I am sure it could be best if you could provide the actual dataframe (even a part of the dataframe) instead of the indices. Commented Mar 25, 2022 at 11:08
  • 1
    thanks for the suggestion. Added dataframe sample Commented Mar 25, 2022 at 11:26

1 Answer 1

1

Use:

#sample data
np.random.seed(2022)
c = ['Location', 'Dec-2021_x', 'Jan-2022_x', 'Feb-2022_x', 'Mar-2022_x',
       'Apr-2022_x','sum_val', 'Dec-2021_y', 'Jan-2022_y', 'Feb-2022_y',
       'Mar-2022_y', 'Apr-2022_y']
df = (pd.DataFrame(np.random.randint(10, size=(5, len(c))), columns=c)
        .assign(Location=list('abcde')))
print (df)
          Location  Dec-2021_x  Jan-2022_x  Feb-2022_x  Mar-2022_x  Apr-2022_x  \
0        a           1           1           0           7           8   
1        b           8           0           3           6           8   
2        c           1           7           5           5           4   
3        d           0           7           5           5           8   
4        e           8           0           3           9           5   

   sum_val  Dec-2021_y  Jan-2022_y  Feb-2022_y  Mar-2022_y  Apr-2022_y  
0        2           8           0           5           9           1  
1        0           1           2           0           5           7  
2        8           2           3           1           0           4  
3        2           4           0           9           4           9  
4        2           1           7           2           1           7 

#remove unnecessary columns
df1 = df.drop(['sum_val'], axis=1)
#add columns names for not necessary remove - if need in ouput
df1 = df1.set_index('Location')
#split columns names by last _
df1.columns = df1.columns.str.rsplit('_', n=1, expand=True)

#seelct x and y Dataframes by second level and multiple
df2 = (df1.xs('x', axis=1, level=1).mul(df['sum_val'].to_numpy(), axis= 0) * 
       df1.xs('y', axis=1, level=1))
print (df2)
          Dec-2021  Jan-2022  Feb-2022  Mar-2022  Apr-2022
Location                                                  
a               16         0         0       126        16
b                0         0         0         0         0
c               16       168        40         0       128
d                0         0        90        40       144
e               16         0        12        18        70
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.