Python: In Pandas extract data from several columns in a dataframe based on a condition and add to different dataframe matching on a column

Question

I have a large data set in the following format:

import pandas as pd
df1 = pd.DataFrame({'Date':['01/02/2020' , '01/03/2020', '01/04/2020','01/02/2020' , '01/03/2020', '01/04/2020','01/02/2020' , '01/03/2020', '01/04/2020'],
           'From': ['RU', 'RU', 'RU','USA', 'USA', 'USA','ME', 'ME', 'ME'],
           'To': ['JK', 'JK', 'JK','JK', 'JK', 'JK','JK', 'JK', 'JK'],
           'Distance':[ 40000, 40000, 40000,30000, 30000, 30000,20000, 20000, 20000],
           'Days': [8,8,8,6,6,6,4,4,4]})

df2 = pd.DataFrame({'Date':['01/02/2020' , '01/03/2020', '01/04/2020','01/02/2020' , '01/03/2020', '01/04/2020','01/02/2020' , '01/03/2020', '01/04/2020'],
           'Contract': ['OrangeTier', 'OrangeTier', 'OrangeTier','AppleTier', 'AppleTier', 'AppleTier','GrapeTier', 'GrapeTier', 'GrapeTier'],
           'Price':[ 10000, 15000, 20000,30000, 35000, 1000,45000, 20000, 21000]})

I would like to add a column to df1 that looks up the Contract 'OrangeTier', matches the dates in df1 with those in df2, and returns the price. The resulting dataframe should look something like this:

df1 = pd.DataFrame({'Date':['01/02/2020' , '01/03/2020', '01/04/2020','01/02/2020' , '01/03/2020', '01/04/2020','01/02/2020' , '01/03/2020', '01/04/2020'],
           'From': ['RU', 'RU', 'RU','USA', 'USA', 'USA','ME', 'ME', 'ME'],
           'To': ['JK', 'JK', 'JK','JK', 'JK', 'JK','JK', 'JK', 'JK'],
           'Distance':[ 40000, 40000, 40000,30000, 30000, 30000,20000, 20000, 20000],
           'OrangeTier':[10000, 15000, 20000,10000, 15000, 20000,10000, 15000, 20000],
           'Days': [8,8,8,6,6,6,4,4,4]})

I then want to multiply OrangeTier by Days and overwrite the OrangeTier column with the result.

I searched on here for two days and tried different approaches. I thought it would be better to see what someone suggested without posting my attempt. I'm not lazy, just still new to coding and in need of help.
– hey_arnold, Commented Jan 16, 2020 at 22:46

Scott Boston · Accepted Answer · 2020-01-16 19:08:08Z

Let's try:

mapper = df2.query('Contract == "OrangeTier"').set_index(['Date'])['Price']

df1['OrangeTier'] = df1['Date'].map(mapper)

df1 = df1.assign(OrangeTier=df1['OrangeTier'] * df1['Days'])

Output:

         Date From  To  Distance  Days  OrangeTier
0  01/02/2020   RU  JK     40000     8       80000
1  01/03/2020   RU  JK     40000     8      120000
2  01/04/2020   RU  JK     40000     8      160000
3  01/02/2020  USA  JK     30000     6       60000
4  01/03/2020  USA  JK     30000     6       90000
5  01/04/2020  USA  JK     30000     6      120000
6  01/02/2020   ME  JK     20000     4       40000
7  01/03/2020   ME  JK     20000     4       60000
8  01/04/2020   ME  JK     20000     4       80000
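
The `map` approach above relies on each date appearing only once among the OrangeTier rows of df2. As an alternative, here is a minimal self-contained sketch of the same lookup using `merge` instead of `map`. This is a swapped-in technique, not what the answer uses. The frames are a shortened version of the question's data, and the names `orange` and `result` are introduced here only for illustration.

```python
import pandas as pd

# Shortened versions of the question's frames (same column names, fewer rows).
df1 = pd.DataFrame({
    'Date': ['01/02/2020', '01/03/2020', '01/02/2020', '01/03/2020'],
    'From': ['RU', 'RU', 'USA', 'USA'],
    'Days': [8, 8, 6, 6],
})
df2 = pd.DataFrame({
    'Date': ['01/02/2020', '01/03/2020', '01/02/2020', '01/03/2020'],
    'Contract': ['OrangeTier', 'OrangeTier', 'AppleTier', 'AppleTier'],
    'Price': [10000, 15000, 30000, 35000],
})

# Keep only the OrangeTier rows and rename Price to the target column name.
orange = (
    df2.loc[df2['Contract'] == 'OrangeTier', ['Date', 'Price']]
       .rename(columns={'Price': 'OrangeTier'})
)

# A left merge keeps every df1 row; validate raises an error if a Date
# occurs more than once in `orange`, instead of silently duplicating rows.
result = df1.merge(orange, on='Date', how='left', validate='many_to_one')

# Overwrite OrangeTier with price * days, as the question asks.
result['OrangeTier'] = result['OrangeTier'] * result['Days']
print(result)
```

With `how='left'`, any df1 date that has no OrangeTier price ends up as NaN in the new column rather than being dropped, which makes missing prices easy to spot.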


1 Comment

hey_arnold Over a year ago

Thanks very much. This was very helpful.
