0

I have two dataframes as shown below:

df1 = 

                        A     B     C      D            
timestamp               
2022-04-08 10:07:00     40    50    NaN    50        
2022-04-08 10:07:01     70    80    80     10        
2022-04-08 10:07:02     200   220   NaN    10         


df2 = 

             A_1       B_1    
                
C            10        10
D            20        10   

The columns C and D of df1 are present as an index in df2. For every row in df1, values in column A and B are noted as a tuple. For example P1 = (A,B). From the df2, for every index, the values in columns A1 and B1 are noted as another tuple. For example P2 = (A1,B1). These tuples are passed to a function as shown below:

def func(P1,P2):
    .....
    ans = (using P1 and P2)
    
    return ans

The answer from the func is updated in df1 in a different column. The desired operation is explained using the dataframe below:

df1 = 

                        A     B     C      D       ans_C                         ans_D      
timestamp               
2022-04-08 10:07:00     40    50    NaN    50     func(P1=(40,50),P2=(10,10))   func(P1=(40,50),p2=(20,10))
2022-04-08 10:07:01     70    80    80     10     func(P1=(70,80),P2=(10,10))   func(P1=(70,80),P2=(20,10)) 
2022-04-08 10:07:02     200   220   NaN    10     func(P1=(200,220),P2=(10,10)) func(P1=(200,220),P2=(20,10))

Is there an easier way to do this?

Thanks in advance!

5
  • Can you provide the DataFrame constructors (on a phone, difficult to copy)? Commented May 11, 2022 at 18:43
  • data = [[40,50, 'NaN',50],[70,80, 80,10],[200,220, 'NaN',10]] df1 = pd.DataFrame(data,columns=['A','B','C','D']) df1.insert(0, 'TimeStamp', pd.to_datetime('now').replace(microsecond=0)) df1.set_index('TimeStamp')` Commented May 11, 2022 at 19:36
  • data2 = [['C',10,10],['D',20,10]] df2 = pd.DataFrame(data2,columns=['','A_1','B_1']) df2.set_index('',inplace=True) Commented May 11, 2022 at 19:42
  • 1
    The question is then, can you vectorize the code? Or must you apply on each combination individually? If the latter, @Shubham's answer is quite good. Commented May 11, 2022 at 20:02
  • It has to be applied on each combination individually. @Shubham Sharma's code worked. Thanks! Commented May 12, 2022 at 6:23

2 Answers 2

2

Here is one approach:

for c in ('C', 'D'):
    P2 = tuple(df2.loc[c])
    df1[f'ans_{c}'] = [func(P1, P2) for P1 in zip(df1['A'], df1['B'])]
Sign up to request clarification or add additional context in comments.

Comments

1
def func(P1, P2):
    return sum(P1) + sum(P2)

df1[['ans_C', 'ans_D']] = df1.apply(lambda x: [func((x.A,x.B),(df2.loc[z][y] for y in df2.columns)) for z in df2.index], axis=1, result_type='expand')
print(df1)

Output:

                       A    B     C   D  ans_C  ans_D
timestamp
2022-04-08 10:07:00   40   50   NaN  50  110.0  120.0
2022-04-08 10:07:01   70   80  80.0  10  170.0  180.0
2022-04-08 10:07:02  200  220   NaN  10  440.0  450.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.