extracting values from dataframe1 using conditions set in dataframe2 (pandas, python)

Question

I have two dateframe (df1 & df2), i'm trying to figure out how to use conditions from df2 to extract values from df1 and use the extracted values in df2.

df1 = values to exact from

df2 = conditions for exaction and df where the extracted values are used

conditions: df2.HJ = df1HJ & df2.JK = df1 P colum

example if df2(df2.HJ = 99 & df2.JK = P3); Ans = 67 (from df1)

df1

╔════╦════╦══════╦══════╦══════╦══════╗
║ HJ ║ P1 ║  P2  ║  P3  ║  P4  ║  P5  ║
╠════╬════╬══════╬══════╬══════╬══════╣
║  5 ║ 51 ║  33  ║  21  ║  31  ║  13  ║
║ 11 ║ 66 ║  45  ║  21  ║  49  ║  58  ║
║ 21 ║  7 ║  55  ║  56  ║  67  ║  73  ║
║ 99 ║  0 ║  76  ║  67  ║  98  ║  29  ║
║ 15 ║ 11 ║  42  ║  79  ║  27  ║  54  ║
╚════╩════╩══════╩══════╩══════╩══════╝

df2

╔════╦════╗
║ HJ ║ JK ║
╠════╬════╣
║ 99 ║ P1 ║
║ 11 ║ P5 ║
║  5 ║ P3 ║
║ 21 ║ P2 ║
║ 11 ║ P3 ║
╚════╩════╝

expected result for df2 after exaction from df1

╔════╦════╦═══════╗
║ HJ ║ JK ║  Ans  ║
╠════╬════╬═══════╣
║ 99 ║ P1 ║    0  ║
║ 11 ║ P5 ║   58  ║
║  5 ║ P3 ║   21  ║
║ 21 ║ P2 ║   55  ║
║ 11 ║ P3 ║   21  ║
╚════╩════╩═══════╝

code for df1

import pandas as pd
import numpy as np
data = {'HJ':[5,11,21,99,15],
'P1':[51,66,7,0,11]
,'P2':[ 33,45,55 ,76 ,42]
,'P3':[ 21 ,21 ,56 ,67 ,79]
,'P4':[ 31 ,49 ,67 ,98 ,27]
,'P5':[ 13 ,58 ,73 ,29 ,54]}
df1 = pd.DataFrame(data)

code for df2

data = {'HJ':[99,11,5,21,11],
'JK':['P1','P5','P3','P2','P3']}
df2 = pd.DataFrame(data)

Regards Thank you

===========

Update

@Scott Boston's solution works:

df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK'])

However, a KeyError: 'One or more row labels was not found' appears when there is/are labels not found. Is there any way to overcome this problem?

Scott Boston · Accepted Answer · 2020-06-09 23:27:16Z

7

Use pd.DataFrame.lookup after set_index:

df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK'])
print(df2)

Output:

   HJ  JK  ans
0  99  P1    0
1  11  P5   58
2   5  P3   21
3  21  P2   55
4  11  P3   21

Using lookup, you have to filter your inputs to lookup first:

df2m = df2[df2['HJ'].isin(df1['HJ']) & df2['JK'].isin(df1.columns)].copy()

df2m['ans'] = df1.set_index('HJ').lookup(df2m['HJ'],df2m['JK'])

df2.update(df2m)

df2m.combine_first(df2)

edited Jun 9, 2020 at 23:27

answered Jun 9, 2020 at 16:36

Scott Boston

154k15 gold badges160 silver badges207 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

ManOnTheMoon Over a year ago

thank you for the solution. It works out fine if both df has the duplicate labels, however, KeyError: 'One or more row labels was not found' appears when there is/are labels not found. Is there any way to overcome this problem? I've updated this in my original question

Scott Boston Over a year ago

Using look up, you need to filter your inputs to lookup first

Shubham Sharma · Accepted Answer · 2020-06-09 17:02:26Z

2

An alternate solution using pd.melt and pd.merge:

d1 = pd.melt(df1, id_vars=['HJ'], var_name='JK', value_name='Ans')
df = pd.merge(df2, d1, on=['HJ', 'JK'], how='left')

# print(df)

   HJ  JK  Ans
0  99  P1    0
1  11  P5   58
2   5  P3   21
3  21  P2   55
4  11  P3   21

edited Jun 9, 2020 at 17:02

answered Jun 9, 2020 at 16:44

Shubham Sharma

71.8k6 gold badges26 silver badges58 bronze badges

4 Comments

Scott Boston Over a year ago

Did you know that in pandas 1.2 they are getting rid of pd.Dataframe.lookup?

Shubham Sharma Over a year ago

@ScottBoston No i didn't knew that, that's too sad :( that they are getting rid of lookup..Do you happen to know the reason for the same?

Scott Boston Over a year ago

Take a look at this github.com/pandas-dev/pandas/issues/39173 and github.com/pandas-dev/pandas/issues/…

Shubham Sharma Over a year ago

@ScottBoston Thanks for sharing :)..I will definitely look into it.

Sahith Kurapati · Accepted Answer · 2020-06-09 16:51:53Z

0

This one line is enough to solve your problem: (Trust me it works! I've tried it)

df2['Ans'] = [ df1[df2['JK'][i]][list(df1['HJ']).index(df2['HJ'][i])] for i in range(len(df2['JK']))]

Hope it helps :)

edited Jun 9, 2020 at 16:51

answered Jun 9, 2020 at 16:44

Sahith Kurapati

1,74513 silver badges15 bronze badges

Comments

Ch3steR · Accepted Answer · 2020-06-09 17:16:30Z

0

You can use pd.Index.get_indexer here.

df1.set_index('HJ',inplace=True)
row = df1.index.get_indexer(df2['HJ'])
col = df1.columns.get_indexer(df2['JK'])

df2['Ans'] = df1.to_numpy()[row,col] #df1.values[row,col]
df2
   HJ  JK  Ans
0  99  P1    0
1  11  P5   58
2   5  P3   21
3  21  P2   55
4  11  P3   21

answered Jun 9, 2020 at 17:16

Ch3steR

20.8k4 gold badges34 silver badges66 bronze badges

Collectives™ on Stack Overflow

extracting values from dataframe1 using conditions set in dataframe2 (pandas, python)

Update

4 Answers 4

2 Comments

4 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

Update

4 Answers 4

2 Comments

4 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related