In R, how can I add some specific columns from a dataframe to another dataframe when some values are equal in both dataframes?

Question

I have two datasets which have both the same row combinations Country & Year and I would like to add some columns from one dataset to the other one in a way that the row combinations match.

Dataset 1:

+----------+------+---------+---------+-----+
| Country  | Year | exports | imports | ... |
+----------+------+---------+---------+-----+
| Germany  | 2000 | 0.70    | 0.40    | ... |
| Germany  | 2001 | 0.68    | 0.41    | ... |
| Germany  | 2002 | 0.71    | 0.48    | ... |
| Germany  | 2003 | ...     | ...     | ... |
| Spain    | 2000 | 0.51    | 0.56    | ... |
| Spain    | 2001 | 0.48    | 0.50    | ... |
| Spain    | 2002 | 0.50    | 0.53    | ... |
| Spain    | 2003 | ...     | ...     | ... |
| ...      | ...  | ...     | ...     | ... |
+----------+------+---------+---------+-----+

Dataset 2:

+----------+-----+------+--------------+-------+-----+
| Country  | CC  | Year | unemployment | Pop   | ... |
+----------+-----+------+--------------+-------+-----+
| Germany  | GER | 2000 | 0.03         | 79.50 | ... |
| Germany  | GER | 2001 | 0.05         | 79.53 | ... |
| Germany  | GER | 2002 | 0.04         | 79.80 | ... |
| Germany  | GER | 2003 | ...          | ...   | ... |
| Hungary  | HUN | 2000 | ...          | ...   | ... |
| Hungary  | HUN | 2001 | ...          | ...   | ... |
| Hungary  | HUN | 2002 | ...          | ...   | ... |
| Hungary  | HUN | 2003 | ...          | ...   | ... |
| Spain    | ESP | 2000 | 0.08         | 40.2  | ... |
| Spain    | ESP | 2001 | 0.11         | 40.5  | ... |
| Spain    | ESP | 2002 | 0.10         | 40.55 | ... |
| Spain    | ESP | 2003 | ...          | ...   | ... |
| ...      | ... | ...  | ...          | ...   | ... |
+----------+-----+------+--------------+-------+-----+

I want the merged data to look like this:


+----------+-----+------+---------+---------+--------------+-------+-----+
| Country  | CC  | Year | exports | imports | unemployment | Pop   | ... |
+----------+-----+------+---------+---------+--------------+-------+-----+
| Germany  | GER | 2000 | 0.70    | 0.40    | 0.03         | 79.50 | ... |
| Germany  | GER | 2001 | 0.68    | 0.41    | 0.05         | 79.53 | ... |
| Germany  | GER | 2002 | 0.71    | 0.48    | 0.04         | 79.80 | ... |
| Germany  | GER | 2003 | ...     | ...     | ...          | ...   | ... |
| Spain    | ESP | 2000 | 0.51    | 0.56    | 0.08         | 40.2  | ... |
| Spain    | ESP | 2001 | 0.48    | 0.50    | 0.11         | 40.5  | ... |
| Spain    | ESP | 2002 | 0.50    | 0.53    | 0.10         | 40.55 | ... |
| Spain    | ESP | 2003 | ...     | ...     | ...          | ...   | ... |
| ...      | ... | ...  | ...     | ...     | ...          | ...   | ... |
+----------+-----+------+---------+---------+--------------+-------+-----+

So, the countries which are not in dataset 1 (like Hungary in this case) are not in the merged dataset and the country code is also in the new dataset. Could someone tell me how I can achieve this? I have 28 years for about 100 countries each. So using a function in which I have to specify every combination would not be handy...

I tried to merge it with merge()but did not succeed since it just created hundreds of rows with the same country and year combination.

"I tried to merge it with merge() but did not succeed..." What did you try exactly? merge is for sure a sensible option for this task, provided that it is used correctly. Share your attempts and describe why the result is not what you intended. — nicola
– nicola, Commented May 6, 2019 at 13:15

MrNetherlands · Accepted Answer · 2019-05-06 13:35:12Z

1

You can do this with inner_join() from dplyr package

dplyr::inner_join(df1, df2, by=c("Country", "Year"))

answered May 6, 2019 at 13:35

MrNetherlands

9407 silver badges15 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

MatthewR · Accepted Answer · 2019-05-06 14:07:44Z

1

merge absolutely should work for this. You should specify that you are merging on two columns.

merge( df1 , df2 , by=c( "Country", "Year") )

Also confirm that the class of the merging vars is the same

sapply( df1[, c( "Country", "Year")] , class )
sapply( df2[, c( "Country", "Year")] , class )

confirm that the variables are spelled the same way in both data frames

intersect( names( df1 ) , names( df2 ))

Finally confirm that year and country are unique in both data.frames

sum( duplicated( df1[ ,c( "Country", "Year") ] ))
sum( duplicated( df2[ ,c( "Country", "Year") ] ))

answered May 6, 2019 at 14:07

MatthewR

2,7806 gold badges29 silver badges42 bronze badges

Comments

Lila · Accepted Answer · 2019-05-08 00:08:39Z

0

The answer with merge() worked! Now I am facing the problem that e.g. Spain does not have any unemployment data for the year 2000. However, I still want to add all years of Spain and would like to have a NA in the unemployment column for Spain in 2000 in the merged dataset. How can I achieve this?

I tried to use merge(df1, df2, all.x = TRUE) but sometimes it just creates NA's for some reason...

edited May 8, 2019 at 0:08

answered May 7, 2019 at 23:33

Lila

597 bronze badges

1 Comment

mnm Over a year ago

try merge(df1, df2, all=TRUE). Also see this post

Collectives™ on Stack Overflow

In R, how can I add some specific columns from a dataframe to another dataframe when some values are equal in both dataframes?

3 Answers 3

Comments

Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related