Python: add values from one dataframe to another (with multiple conditions)

Question

I have two dataframes, df1 and df2, that look roughly like this:

import pandas as pd

# Trades: one row per (partner, commodity) trade
x1 = [{'partner': "Afghanistan", 'trade_value': 100, 'commodity': 1},
      {'partner': "Zambia", 'trade_value': 110, 'commodity': 2},
      {'partner': "Germany", 'trade_value': 120, 'commodity': 2},
      {'partner': "Afghanistan", 'trade_value': 150, 'commodity': 2},
      {'partner': "USA", 'trade_value': 1120, 'commodity': 5}]

df1 = pd.DataFrame(x1)

# Tariffs: one row per (country, commodity) pair
x2 = [{'country': "Afghanistan", 'commodity': 5, 'tariff': 3.5},
      {'country': "Afghanistan", 'commodity': 3, 'tariff': 6.2},
      {'country': "Afghanistan", 'commodity': 1, 'tariff': 9.9},
      {'country': "Afghanistan", 'commodity': 2, 'tariff': 1.4},
      {'country': "USA", 'commodity': 5, 'tariff': 4.3},
      {'country': "Germany", 'commodity': 7, 'tariff': 6.5},
      {'country': "Germany", 'commodity': 2, 'tariff': 8.8}]

df2 = pd.DataFrame(x2)

I want to add a new column to df1 called 'tariff' and fill it, for each 'partner' and 'commodity' pair in df1, with the matching 'tariff' from df2.

Note: a 'partner' country is sometimes repeated in df1 because of multiple trades. Also, not all tariffs are available in df2, so I don't mind leaving a cell in df1 empty.

So far I am at this stage:

# Add the new column
df1['tariff'] = 0

for i, row in df1.iterrows():
    for j, row2 in df2.iterrows():
        if row['partner'] == row2['country']:
            if row['commodity'] == row2['commodity']:
                # Don't know what to put here
                pass

If I use df1['tariff'].replace(row['tariff'], row2['tariff'], inplace=True), every row of the tariff column ends up filled with 9.9.

The output of df1 should look like this:

|  partner   | trade_value | commodity | tariff |
|------------|-------------|-----------|--------|
| Afghanistan|     100     |     1     |   9.9  |
| Zambia     |     110     |     2     |   NaN  |
| Germany    |     120     |     2     |   8.8  |
| Afghanistan|     150     |     2     |   1.4  |
| USA        |     1120    |     5     |   4.3  |

@W-B I expect df1 to have an extra column called tariff. Its values should be looked up in df2 by country and commodity code, so there are two conditions for adding a tariff: both country and commodity must match.
– Hassan Dbouk, Commented Nov 23, 2018 at 16:05

yatu · Accepted Answer · 2018-11-23 16:15:22Z


`merge`

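You can use merge to join the two dataframes on the matching key columns (partner/country and commodity). With how='left', every row of df1 is kept and tariff is NaN where df2 has no match: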

pd.merge(left=df1, right=df2, how='left',
         left_on=['partner', 'commodity'],
         right_on=['country', 'commodity']).drop(['country'], axis=1)

     commodity      partner  trade_value  tariff
0          1  Afghanistan          100     9.9
1          2       Zambia          110     NaN
2          2      Germany          120     8.8
3          2  Afghanistan          150     1.4
4          5          USA         1120     4.3
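As a self-contained sketch of the same idea, the snippet below rebuilds the question's sample data and renames df2's country column to partner first, so both keys share a name and there is no leftover column to drop. The variable names tariffs and result are only illustrative, and validate='many_to_one' is an optional extra that assumes each (country, commodity) pair appears at most once in df2; merge raises an error if that assumption does not hold.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "partner": ["Afghanistan", "Zambia", "Germany", "Afghanistan", "USA"],
    "trade_value": [100, 110, 120, 150, 1120],
    "commodity": [1, 2, 2, 2, 5],
})
df2 = pd.DataFrame({
    "country": ["Afghanistan"] * 4 + ["USA", "Germany", "Germany"],
    "commodity": [5, 3, 1, 2, 5, 7, 2],
    "tariff": [3.5, 6.2, 9.9, 1.4, 4.3, 6.5, 8.8],
})

# Give both frames the same key names so `on=` can be used directly
tariffs = df2.rename(columns={"country": "partner"})

# Left merge keeps every trade in df1; unmatched rows get NaN in 'tariff'.
# validate="many_to_one" (assumption: tariff keys are unique) guards against
# duplicate rows silently multiplying trades.
result = df1.merge(tariffs, on=["partner", "commodity"],
                   how="left", validate="many_to_one")
print(result)
```

Because the merge is a left join against unique keys, result has exactly one row per row of df1, in the same order.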

edited Nov 23, 2018 at 16:15

answered Nov 23, 2018 at 16:08


2 Comments

Ricky Kim Over a year ago

You should add how='left' to keep countries that have no tariff.

yatu Over a year ago

Thanks for pointing that out @RickyKim, I did not notice there were more countries in df1.
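To illustrate the point raised in the comments, here is a small sketch on the question's sample data comparing the default inner join with how='left'. The names keys, inner, left and unmatched are illustrative only; indicator=True adds a _merge column that marks which rows found no partner in df2.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "partner": ["Afghanistan", "Zambia", "Germany", "Afghanistan", "USA"],
    "trade_value": [100, 110, 120, 150, 1120],
    "commodity": [1, 2, 2, 2, 5],
})
df2 = pd.DataFrame({
    "country": ["Afghanistan"] * 4 + ["USA", "Germany", "Germany"],
    "commodity": [5, 3, 1, 2, 5, 7, 2],
    "tariff": [3.5, 6.2, 9.9, 1.4, 4.3, 6.5, 8.8],
})

keys = dict(left_on=["partner", "commodity"],
            right_on=["country", "commodity"])

# Inner join: trades without a tariff entry are dropped
inner = pd.merge(df1, df2, how="inner", **keys)

# Left join: all trades are kept; '_merge' flags the unmatched ones
left = pd.merge(df1, df2, how="left", indicator=True, **keys)
unmatched = left.loc[left["_merge"] == "left_only", "partner"].tolist()

print(len(inner), len(left))  # 4 5
print(unmatched)              # ['Zambia']
```

The inner join loses the Zambia trade because df2 has no tariff for it, while the left join keeps it with NaN in tariff.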
