
I have two DataFrames and I want to merge them on two keys, but in the second DataFrame one of those keys (the year) is spread across the column headers.

I have the following Dataframes:

DF:

    Sex Age    Height   country    Year   Grade
0   M   31.0    188.0   Bulgaria    2016    D+
1   F   28.0    166.0   China       1996    D+
2   M   30.0    NaN     Sweden      1960    D+
3   F   28.0    181.0   China       2004    D+
4   F   16.0    175.0   Hungary     1998    D+

GDP_data:

    Country Name    Country Code    2016       1996     1960     2004      1998
0   Bulgaria          BGR           1946      NaN       5377    5285       NaN
1   China             CHI           1186      3314      NaN     7314       3314
2   Sweden            SWE           1590      4694      2723    8532       4694
3   China             CHI           6580      NaN       NaN     5120       NaN
4   Hungary           HUN           2858      1223      NaN     2935       1223

The desired DataFrame after the merge is:

    Sex Age    Height   country    Year   Grade   GDP
0   M   31.0    188.0   Bulgaria    2016    D+    1946
1   F   28.0    166.0   China       1996    D+    3314
2   M   30.0    NaN     Sweden      1960    D+    2723
3   F   28.0    181.0   China       2004    D+    5120
4   F   16.0    175.0   Hungary     1998    D+    1223

The resulting DataFrame should contain each country's GDP for the year given in that row.

I need to match country in DF with Country Name in GDP_data, and also match on the Year column of DF. The problem is that in GDP_data the years are column headers rather than a column.

How do I merge these two?

This is just sample data. The real data is much larger, with around 20,000 rows and GDP values from 1960 to 2016, but the idea should be the same.
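To make the example reproducible, here is a sketch that rebuilds the sample tables above as pandas DataFrames. It assumes the year column labels in GDP_data are strings, as they would be after reading a CSV; the question does not say how the data was loaded, so treat that as an assumption.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (the real data has ~20,000 rows).
DF = pd.DataFrame({
    "Sex": ["M", "F", "M", "F", "F"],
    "Age": [31.0, 28.0, 30.0, 28.0, 16.0],
    "Height": [188.0, 166.0, np.nan, 181.0, 175.0],
    "country": ["Bulgaria", "China", "Sweden", "China", "Hungary"],
    "Year": [2016, 1996, 1960, 2004, 1998],
    "Grade": ["D+"] * 5,
})

# Wide GDP table: one column per year. String year labels are an
# assumption (typical after pd.read_csv).
GDP_data = pd.DataFrame({
    "Country Name": ["Bulgaria", "China", "Sweden", "China", "Hungary"],
    "Country Code": ["BGR", "CHI", "SWE", "CHI", "HUN"],
    "2016": [1946, 1186, 1590, 6580, 2858],
    "1996": [np.nan, 3314, 4694, np.nan, 1223],
    "1960": [5377, np.nan, 2723, np.nan, np.nan],
    "2004": [5285, 7314, 8532, 5120, 2935],
    "1998": [np.nan, 3314, 4694, np.nan, 1223],
})

print(DF.shape, GDP_data.shape)
```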

  • How did you end up with 2 same columns 1996? Commented Aug 21, 2021 at 18:31
  • That is a mistake I will edit it Commented Aug 21, 2021 at 18:31
  • Google for melt function Commented Aug 21, 2021 at 18:40
  • In GDP data dataframe, you have 2 rows of China as Country Name and Year as 2004, which one should be picked? there are a lot of grey areas in this question, please review and edit your question Commented Aug 21, 2021 at 18:45
  • Out of those 2 entries one is for GDP of china for the year 1996 and another is for year 2004 Commented Aug 21, 2021 at 18:48

1 Answer

I have named these DataFrames a and b, respectively. I also added an underscore in the names of columns where there is a space.
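The renaming step itself is not shown. A minimal sketch, assuming the GDP frame is called b and that its year labels may be strings or integers (the isinstance check leaves non-string labels untouched, which a plain `b.columns.str.replace` would not tolerate on a mixed index):

```python
import pandas as pd

b = pd.DataFrame({
    "Country Name": ["Bulgaria", "China"],
    "Country Code": ["BGR", "CHI"],
    "2016": [1946, 1186],
})

# Replace spaces with underscores in string column labels only.
b = b.rename(columns=lambda c: c.replace(" ", "_") if isinstance(c, str) else c)
print(list(b.columns))  # ['Country_Name', 'Country_Code', '2016']
```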

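b needs to be melted, i.e. reshaped from wide format (one column per year) to long format (one row per country and year).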

>>> melted = b.melt(id_vars=['Country_Name', 'Country_Code'], var_name='year', value_name='GDP')
   Country_Name Country_Code  year     GDP
0      Bulgaria          BGR  2016  1946.0
1         China          CHI  2016  1186.0
2        Sweden          SWE  2016  1590.0
3         China          CHI  2016  6580.0
...

The output continues for the remaining years. You can then merge the two DataFrames:

>>> pd.merge(a, melted, left_on=['country', 'Year'], right_on=['Country_Name', 'year'])
  Sex   Age  Height   country  Year Grade Country_Name Country_Code  year     GDP
0   M  31.0   188.0  Bulgaria  2016    D+     Bulgaria          BGR  2016  1946.0
1   F  28.0   166.0     China  1996    D+        China          CHI  1996  3314.0
2   F  28.0   166.0     China  1996    D+        China          CHI  1996     NaN
3   M  30.0     NaN    Sweden  1960    D+       Sweden          SWE  1960  2723.0
4   F  28.0   181.0     China  2004    D+        China          CHI  2004  7314.0
5   F  28.0   181.0     China  2004    D+        China          CHI  2004  5120.0
6   F  16.0   175.0   Hungary  1998    D+      Hungary          HUN  1998  1223.0

The redundant key columns (Country_Name, Country_Code and year) can then be dropped.
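Below is a self-contained sketch of the whole melt-and-merge pipeline. Two details are my assumptions rather than part of the session above: the melted year labels are cast to int (if GDP_data came from a CSV, its year columns are strings, and pandas refuses to merge an int64 key with an object key), and `how="left"` is used so that rows of a with no matching GDP are kept instead of being dropped by the default inner join.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    "Sex": ["M", "F", "M", "F", "F"],
    "Age": [31.0, 28.0, 30.0, 28.0, 16.0],
    "Height": [188.0, 166.0, np.nan, 181.0, 175.0],
    "country": ["Bulgaria", "China", "Sweden", "China", "Hungary"],
    "Year": [2016, 1996, 1960, 2004, 1998],
    "Grade": ["D+"] * 5,
})
b = pd.DataFrame({
    "Country_Name": ["Bulgaria", "China", "Sweden", "China", "Hungary"],
    "Country_Code": ["BGR", "CHI", "SWE", "CHI", "HUN"],
    "2016": [1946, 1186, 1590, 6580, 2858],
    "1996": [np.nan, 3314, 4694, np.nan, 1223],
    "1960": [5377, np.nan, 2723, np.nan, np.nan],
    "2004": [5285, 7314, 8532, 5120, 2935],
    "1998": [np.nan, 3314, 4694, np.nan, 1223],
})

# Wide -> long: one row per (country, year) pair.
melted = b.melt(id_vars=["Country_Name", "Country_Code"],
                var_name="year", value_name="GDP")
# Year labels came from column names (strings); match a["Year"] (int64).
melted["year"] = melted["year"].astype(int)

# Left join keeps every row of `a`, even when no GDP row matches.
merged = a.merge(melted, how="left",
                 left_on=["country", "Year"],
                 right_on=["Country_Name", "year"])
# Drop the key columns that came from `melted`.
merged = merged.drop(columns=["Country_Name", "Country_Code", "year"])
print(merged)
```

China still appears twice (for 1996 and 2004) because GDP_data contains two China rows, exactly as in the output above.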

Update: I see that there is sometimes more than one GDP value for a given country and year, so one option is to keep only the highest.

>>> new_melted = melted.sort_values('GDP', ascending=False).drop_duplicates(subset=['Country_Name', 'year'])
>>> new_melted
   Country_Name Country_Code  year     GDP
17       Sweden          SWE  2004  8532.0
16        China          CHI  2004  7314.0
3         China          CHI  2016  6580.0
10     Bulgaria          BGR  1960  5377.0
...
14      Hungary          HUN  1960     NaN
20     Bulgaria          BGR  1998     NaN

You can then perform the same merge with this one.
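As an alternative to sort_values plus drop_duplicates, `groupby(...).max()` produces the same one-GDP-per-country-and-year table; `max()` skips NaN, so a real value wins over a missing one. The sketch below assumes string year labels that need casting to int and uses a left join; both are assumptions, not part of the answer. Note that "keep the highest" yields 7314 for China in 2004, whereas the desired output in the question shows 5120, so which duplicate to keep is a judgment call that depends on the data.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    "Sex": ["M", "F", "M", "F", "F"],
    "Age": [31.0, 28.0, 30.0, 28.0, 16.0],
    "Height": [188.0, 166.0, np.nan, 181.0, 175.0],
    "country": ["Bulgaria", "China", "Sweden", "China", "Hungary"],
    "Year": [2016, 1996, 1960, 2004, 1998],
    "Grade": ["D+"] * 5,
})
b = pd.DataFrame({
    "Country_Name": ["Bulgaria", "China", "Sweden", "China", "Hungary"],
    "Country_Code": ["BGR", "CHI", "SWE", "CHI", "HUN"],
    "2016": [1946, 1186, 1590, 6580, 2858],
    "1996": [np.nan, 3314, 4694, np.nan, 1223],
    "1960": [5377, np.nan, 2723, np.nan, np.nan],
    "2004": [5285, 7314, 8532, 5120, 2935],
    "1998": [np.nan, 3314, 4694, np.nan, 1223],
})

melted = b.melt(id_vars=["Country_Name", "Country_Code"],
                var_name="year", value_name="GDP")
melted["year"] = melted["year"].astype(int)  # string labels -> int

# One GDP per (country, year): max() ignores NaN within each group.
gdp = melted.groupby(["Country_Name", "year"], as_index=False)["GDP"].max()

result = (
    a.merge(gdp, how="left",
            left_on=["country", "Year"],
            right_on=["Country_Name", "year"])
     .drop(columns=["Country_Name", "year"])
)
print(result)
```

Each row of a now matches at most one GDP value, so the result has the same number of rows as a.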


2 Comments

I was too slow, damn phone ;) +1
Wow this is what I was exactly looking for. Thanks a lot
