5

I have two dateframe (df1 & df2), i'm trying to figure out how to use conditions from df2 to extract values from df1 and use the extracted values in df2.

df1 = values to exact from

df2 = conditions for exaction and df where the extracted values are used

conditions: df2.HJ = df1HJ & df2.JK = df1 P colum

example if df2(df2.HJ = 99 & df2.JK = P3); Ans = 67 (from df1)

df1

╔════╦════╦══════╦══════╦══════╦══════╗
║ HJ ║ P1 ║  P2  ║  P3  ║  P4  ║  P5  ║
╠════╬════╬══════╬══════╬══════╬══════╣
║  5 ║ 51 ║  33  ║  21  ║  31  ║  13  ║
║ 11 ║ 66 ║  45  ║  21  ║  49  ║  58  ║
║ 21 ║  7 ║  55  ║  56  ║  67  ║  73  ║
║ 99 ║  0 ║  76  ║  67  ║  98  ║  29  ║
║ 15 ║ 11 ║  42  ║  79  ║  27  ║  54  ║
╚════╩════╩══════╩══════╩══════╩══════╝

df2

╔════╦════╗
║ HJ ║ JK ║
╠════╬════╣
║ 99 ║ P1 ║
║ 11 ║ P5 ║
║  5 ║ P3 ║
║ 21 ║ P2 ║
║ 11 ║ P3 ║
╚════╩════╝

expected result for df2 after exaction from df1

╔════╦════╦═══════╗
║ HJ ║ JK ║  Ans  ║
╠════╬════╬═══════╣
║ 99 ║ P1 ║    0  ║
║ 11 ║ P5 ║   58  ║
║  5 ║ P3 ║   21  ║
║ 21 ║ P2 ║   55  ║
║ 11 ║ P3 ║   21  ║
╚════╩════╩═══════╝

code for df1

import pandas as pd
import numpy as np
data = {'HJ':[5,11,21,99,15],
'P1':[51,66,7,0,11]
,'P2':[ 33,45,55 ,76 ,42]
,'P3':[ 21 ,21 ,56 ,67 ,79]
,'P4':[ 31 ,49 ,67 ,98 ,27]
,'P5':[ 13 ,58 ,73 ,29 ,54]}
df1 = pd.DataFrame(data)

code for df2

data = {'HJ':[99,11,5,21,11],
'JK':['P1','P5','P3','P2','P3']}
df2 = pd.DataFrame(data)

Regards Thank you

===========

Update

@Scott Boston's solution works:

df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK']) 

However, a KeyError: 'One or more row labels was not found' appears when there is/are labels not found. Is there any way to overcome this problem?

4 Answers 4

7

Use pd.DataFrame.lookup after set_index:

df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK'])
print(df2)

Output:

   HJ  JK  ans
0  99  P1    0
1  11  P5   58
2   5  P3   21
3  21  P2   55
4  11  P3   21

Using lookup, you have to filter your inputs to lookup first:

df2m = df2[df2['HJ'].isin(df1['HJ']) & df2['JK'].isin(df1.columns)].copy()

df2m['ans'] = df1.set_index('HJ').lookup(df2m['HJ'],df2m['JK'])

df2.update(df2m)

df2m.combine_first(df2)
Sign up to request clarification or add additional context in comments.

2 Comments

thank you for the solution. It works out fine if both df has the duplicate labels, however, KeyError: 'One or more row labels was not found' appears when there is/are labels not found. Is there any way to overcome this problem? I've updated this in my original question
Using look up, you need to filter your inputs to lookup first
2

An alternate solution using pd.melt and pd.merge:

d1 = pd.melt(df1, id_vars=['HJ'], var_name='JK', value_name='Ans')
df = pd.merge(df2, d1, on=['HJ', 'JK'], how='left')

# print(df)

   HJ  JK  Ans
0  99  P1    0
1  11  P5   58
2   5  P3   21
3  21  P2   55
4  11  P3   21

4 Comments

Did you know that in pandas 1.2 they are getting rid of pd.Dataframe.lookup?
@ScottBoston No i didn't knew that, that's too sad :( that they are getting rid of lookup..Do you happen to know the reason for the same?
@ScottBoston Thanks for sharing :)..I will definitely look into it.
0

This one line is enough to solve your problem: (Trust me it works! I've tried it)

df2['Ans'] = [ df1[df2['JK'][i]][list(df1['HJ']).index(df2['HJ'][i])] for i in range(len(df2['JK']))]

Hope it helps :)

Comments

0

You can use pd.Index.get_indexer here.

df1.set_index('HJ',inplace=True)
row = df1.index.get_indexer(df2['HJ'])
col = df1.columns.get_indexer(df2['JK'])

df2['Ans'] = df1.to_numpy()[row,col] #df1.values[row,col]
df2
   HJ  JK  Ans
0  99  P1    0
1  11  P5   58
2   5  P3   21
3  21  P2   55
4  11  P3   21

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.