Pandas: Executing a function based on the values from two dataframes

Question

I have two dataframes as shown below:

df1 = 

                        A     B     C      D            
timestamp               
2022-04-08 10:07:00     40    50    NaN    50        
2022-04-08 10:07:01     70    80    80     10        
2022-04-08 10:07:02     200   220   NaN    10         


df2 = 

             A_1       B_1    
                
C            10        10
D            20        10

The columns C and D of df1 appear as index labels in df2. For every row in df1, the values in columns A and B form a tuple, for example P1 = (A, B). For every index label in df2, the values in columns A_1 and B_1 form another tuple, for example P2 = (A_1, B_1). These tuples are passed to a function as shown below:

def func(P1,P2):
    .....
    ans = (using P1 and P2)
    
    return ans

The value returned by func is stored in df1 in a new column, one per index label of df2 (ans_C and ans_D). The desired result is shown in the dataframe below:

df1 = 

                        A     B     C      D       ans_C                         ans_D      
timestamp               
2022-04-08 10:07:00     40    50    NaN    50     func(P1=(40,50),P2=(10,10))   func(P1=(40,50),P2=(20,10))
2022-04-08 10:07:01     70    80    80     10     func(P1=(70,80),P2=(10,10))   func(P1=(70,80),P2=(20,10)) 
2022-04-08 10:07:02     200   220   NaN    10     func(P1=(200,220),P2=(10,10)) func(P1=(200,220),P2=(20,10))

Is there an easier way to do this?

Thanks in advance!

Can you provide the DataFrame constructors (on a phone, difficult to copy)?
– mozway, Commented May 11, 2022 at 18:43
data = [[40,50, 'NaN',50],[70,80, 80,10],[200,220, 'NaN',10]]
df1 = pd.DataFrame(data,columns=['A','B','C','D'])
df1.insert(0, 'TimeStamp', pd.to_datetime('now').replace(microsecond=0))
df1.set_index('TimeStamp')
– EngGu, Commented May 11, 2022 at 19:36
data2 = [['C',10,10],['D',20,10]]
df2 = pd.DataFrame(data2,columns=['','A_1','B_1'])
df2.set_index('',inplace=True)
– EngGu, Commented May 11, 2022 at 19:42
The question is then, can you vectorize the code? Or must you apply on each combination individually? If the latter, @Shubham's answer is quite good.
– mozway, Commented May 11, 2022 at 20:02
It has to be applied on each combination individually. @Shubham Sharma's code worked. Thanks!
– EngGu, Commented May 12, 2022 at 6:23

Shubham Sharma · Accepted Answer · 2022-05-11 18:53:21Z

2

Here is one approach:

for c in ('C', 'D'):
    P2 = tuple(df2.loc[c])
    df1[f'ans_{c}'] = [func(P1, P2) for P1 in zip(df1['A'], df1['B'])]
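
To try the loop above end to end, here is a minimal self-contained sketch. The frames are rebuilt from the tables in the question, and the body of func is a hypothetical stand-in (a plain sum), since the question does not show the real one. It also iterates over df2.index rather than a hard-coded ('C', 'D'), on the assumption that every label in df2 should get an ans_ column.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the question; NaN is a real missing value here.
df1 = pd.DataFrame(
    {"A": [40, 70, 200], "B": [50, 80, 220],
     "C": [np.nan, 80, np.nan], "D": [50, 10, 10]},
    index=pd.to_datetime(["2022-04-08 10:07:00",
                          "2022-04-08 10:07:01",
                          "2022-04-08 10:07:02"]).rename("timestamp"),
)
df2 = pd.DataFrame({"A_1": [10, 20], "B_1": [10, 10]}, index=["C", "D"])


def func(P1, P2):
    # Hypothetical stand-in: the question does not show the real body.
    return sum(P1) + sum(P2)


# One pass per df2 row label; P2 is fixed, P1 varies row by row in df1.
for c in df2.index:
    P2 = tuple(df2.loc[c])
    df1[f"ans_{c}"] = [func(P1, P2) for P1 in zip(df1["A"], df1["B"])]

print(df1)
```

Because func is called once per (row, label) pair in plain Python, this works for any func that accepts two tuples, at the cost of speed on large frames.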

BeRT2me · Answer · 2022-05-11 21:05:28Z

1

def func(P1, P2):
    return sum(P1) + sum(P2)

# For each row of df1, call func once per index label of df2 (C, D).
df1[['ans_C', 'ans_D']] = df1.apply(
    lambda x: [func((x.A, x.B), tuple(df2.loc[z])) for z in df2.index],
    axis=1, result_type='expand')
print(df1)

Output:

                       A    B     C   D  ans_C  ans_D
timestamp
2022-04-08 10:07:00   40   50   NaN  50  110.0  120.0
2022-04-08 10:07:01   70   80  80.0  10  170.0  180.0
2022-04-08 10:07:02  200  220   NaN  10  440.0  450.0
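
On the vectorization question raised in the comments: if func really were the simple sum used in this answer, the per-row calls could be replaced by column arithmetic. The sketch below assumes exactly that placeholder func and nothing more. The asker states that the real function has to be applied to each combination individually, so treat this only as an illustration of what vectorizing would look like.

```python
import pandas as pd

# Only the columns that feed P1 and P2 are needed for this illustration.
df1 = pd.DataFrame({"A": [40, 70, 200], "B": [50, 80, 220]})
df2 = pd.DataFrame({"A_1": [10, 20], "B_1": [10, 10]}, index=["C", "D"])

# Vectorized equivalent of func(P1, P2) = sum(P1) + sum(P2) only.
row_sums = df1["A"] + df1["B"]   # sum(P1) for every row of df1 at once
p2_sums = df2.sum(axis=1)        # sum(P2) for each index label of df2

for c in df2.index:
    # Adding a scalar to a Series broadcasts it over all rows.
    df1[f"ans_{c}"] = row_sums + p2_sums.loc[c]

print(df1)
```

The results here stay integers, whereas the apply version above yields floats because the NaN in column C turns each row into a float Series.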
