2

I have two dataframes df1 and df2 that kind of look like this:

Example:

x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1}, 
      {'partner':"Zambia",'trade_value':110, 'commodity': 2}, 
      {'partner': "Germany",'trade_value':120, 'commodity': 2},
      {'partner': "Afghanistan",'trade_value':150, 'commodity': 2},
      {'partner': "USA",'trade_value':1120, 'commodity': 5}];

df1 = pd.DataFrame(x1)

x2 = [{'country': "Afghanistan", 'commodity': 5, 'tariff': 3.5},
      {'country': "Afghanistan", 'commodity': 3, 'tariff': 6.2},
      {'country': "Afghanistan", 'commodity': 1, 'tariff': 9.9},
      {'country': "Afghanistan", 'commodity': 2, 'tariff': 1.4},
      {'country': "USA", 'commodity': 5, 'tariff': 4.3},
      {'country': "Germany", 'commodity': 7, 'tariff': 6.5},
      {'country': "Germany", 'commodity': 2, 'tariff': 8.8}];

df2 = pd.DataFrame(x2)

I want to add a new column to df1 called 'tariff' and assign each 'partner' and 'commodity' in df1 with its appropriate 'tariff' from df2.

Note: sometimes a 'partner' country in df1 is repeated due to multiple trades. Also not all Tariffs are available in df2 so I don't mind leaving a cell in df1 empty.

so far I am at this stage:

#Add new column
df1['tariff'] = 0;

for index, row in df1.iterrows():
    for index, row2 in df2.iterrows():
        if row['partner'] == row2['country']:
            if row['commodity'] == row2['commodity']
                #Dont know what to put here

If I use df1['tariff'].replace(row['tariff'],row2['tariff'],inplace=True); I am getting all the tariff columns filled with the tariff 9.9

The output of df1 should look like this:

|  partner   | trade_value | commodity | tariff |
|------------|-------------|-----------|--------|
| Afghanistan|     100     |     1     |   9.9  |
| Zambia     |     110     |     2     |   NaN  |
| Germany    |     120     |     2     |   8.8  |
| Afghanistan|     150     |     2     |   1.4  |
| USA        |     1120    |     5     |   4.3  |
2
  • What is your expected output Commented Nov 23, 2018 at 15:52
  • @W-B I am expected to have an extra column for df1 called tariff. The values under tariff should be according to the country, and the commodity code found in df2... so basically i have two conditions to add a tariff (country and commodity must match) Commented Nov 23, 2018 at 16:05

1 Answer 1

2

merge

You can simply use merge to join the two dataframes on the overlapping columns:

pd.merge(left=df1, right=df2, how='left', left_on=['partner', 'commodity'],
         right_on = ['country', 'commodity']).drop(['country'], axis = 1)

     commodity      partner  trade_value  tariff
0          1  Afghanistan          100     9.9
1          2       Zambia          110     NaN
2          2      Germany          120     8.8
3          2  Afghanistan          150     1.4
4          5          USA         1120     4.3
Sign up to request clarification or add additional context in comments.

2 Comments

you should add how='left' to keep country that has no tariff
Thanks for pointing out @RickyKim, did not notice there where more countries in df1

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.