I have a small sample data set:
import pandas as pd
d = {
'measure1_x': [10,12,20,30,21],
'measure2_x':[11,12,10,3,3],
'measure3_x':[10,0,12,1,1],
'measure1_y': [1,2,2,3,1],
'measure2_y':[1,1,1,3,3],
'measure3_y':[1,0,2,1,1]
}
df = pd.DataFrame(d)
# put the columns in a fixed order for display
df = df.reindex(columns=[
    'measure1_x', 'measure2_x', 'measure3_x', 'measure1_y', 'measure2_y', 'measure3_y'
])
It looks like:
measure1_x  measure2_x  measure3_x  measure1_y  measure2_y  measure3_y
        10          11          10           1           1           1
        12          12           0           2           1           0
        20          10          12           2           1           2
        30           3           1           3           3           1
        21           3           1           1           3           1
I gave the columns identical names apart from the '_x' and '_y' suffixes so that each pair to be multiplied can be identified. I want to multiply every pair of columns that share the same name once the suffix is disregarded, then sum the products to get one total per row. Keep in mind that my actual data set is huge and its columns are not in this neat order, so the naming is the only way to identify the correct pairs to multiply:
total = measure1_x * measure1_y + measure2_x * measure2_y + measure3_x * measure3_y
So the desired output is:
measure1_x  measure2_x  measure3_x  measure1_y  measure2_y  measure3_y  total
        10          11          10           1           1           1     31
        12          12           0           2           1           0     36
        20          10          12           2           1           2     74
        30           3           1           3           3           1    100
        21           3           1           1           3           1     31
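To make the target concrete, the formula can be written out by hand for the three sample pairs. This is only a check of the expected numbers, not a usable solution, since it hard-codes every column name and would not scale to a wide data set:

```python
import pandas as pd

df = pd.DataFrame({
    'measure1_x': [10, 12, 20, 30, 21],
    'measure2_x': [11, 12, 10, 3, 3],
    'measure3_x': [10, 0, 12, 1, 1],
    'measure1_y': [1, 2, 2, 3, 1],
    'measure2_y': [1, 1, 1, 3, 3],
    'measure3_y': [1, 0, 2, 1, 1],
})

# Hand-written version of the formula: multiply each pair element-wise,
# then add the three products row by row.
df['total'] = (
    df['measure1_x'] * df['measure1_y']
    + df['measure2_x'] * df['measure2_y']
    + df['measure3_x'] * df['measure3_y']
)

print(df['total'].tolist())  # [31, 36, 74, 100, 31]
```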
My attempt and thought process so far; I cannot work out the syntax to go any further:
# first identify the column names that have '_x' and '_y', then check whether
# the column names are the same after removing '_x' and '_y'; if the pair has
# the same name then multiply them, do that for all pairs and sum the results
# up to get the total number
for colname in df.columns:
    if "_x".lower() in colname.lower() or "_y".lower() in colname.lower():
        if "_x".lower() in colname.lower():
            colnamex = colname
        if "_y".lower() in colname.lower():
            colnamey = colname
    # if colnamex[:-2] is the same for colnamex and colnamey then multiply and sum
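One way the idea in the comments above could be completed is sketched below. It is a rough sketch, not a definitive answer: the helper name `add_pair_total` and its parameters are hypothetical, and it assumes the suffix sits at the very end of each column name and that an '_x' column without a matching '_y' partner should simply be skipped. The sample columns are deliberately shuffled to mimic data that is not in a neat order.

```python
import pandas as pd


def add_pair_total(frame, x_suffix='_x', y_suffix='_y', out_col='total'):
    """Hypothetical helper: sum the products of matching _x/_y column pairs."""
    result = frame.copy()
    total = pd.Series(0, index=result.index)
    for col in frame.columns:
        if col.endswith(x_suffix):
            # Strip the suffix to get the shared base name, e.g. 'measure1'.
            base = col[:-len(x_suffix)]
            partner = base + y_suffix
            # Assumption: unmatched '_x' columns are ignored rather than raising.
            if partner in frame.columns:
                total = total + frame[col] * frame[partner]
    result[out_col] = total
    return result


# Same sample data, with the columns in a scrambled order.
df = pd.DataFrame({
    'measure2_y': [1, 1, 1, 3, 3],
    'measure1_x': [10, 12, 20, 30, 21],
    'measure3_y': [1, 0, 2, 1, 1],
    'measure3_x': [10, 0, 12, 1, 1],
    'measure1_y': [1, 2, 2, 3, 1],
    'measure2_x': [11, 12, 10, 3, 3],
})

out = add_pair_total(df)
print(out['total'].tolist())  # [31, 36, 74, 100, 31]
```

Looking the partner up by name, rather than by position, is what makes the column order irrelevant. The loop runs once per column, not once per row, so each multiplication stays vectorized.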