matching columns of two different data frames in R

Question

I have two data frames with longitude and latitude values, and I would like to extract values from data frame 2 (say df2$C, its third column) for the rows that match data frame 1. Data frame 1 has two columns (lon, lat), and data frame 2 has three columns (lon, lat, and some value "C"). I want to add a third column to data frame 1 holding the values of df2$C for rows where BOTH columns match exactly, something like df1$lon == df2$lon AND df1$lat == df2$lat. For (lat, lon) pairs that don't match, I would like an NA, so that the new column has length nrow(df1). I tried the merge function, but I'm having trouble matching both columns of df1 to those of df2.

merge(...) should work. You should show your code.

– jlhoward, Commented Dec 2, 2014 at 21:17

akrun · Accepted Answer · 2014-12-02 15:58:01Z

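You could try data.table:

library(data.table)
setDT(df1)                    # convert df1 to a data.table by reference
setkey(setDT(df2), lat, lon)  # convert df2 and key it on (lat, lon)
df2[df1]                      # look up each row of df1 in df2; unmatched rows get NA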
#   lat lon          C
#1:  58   1         NA
#2:  52  10         NA
#3:  54   7 -0.9094088
#4:  60   2         NA
#5:  50   3  1.4541841
#6:  56   9 -1.7771135
#7:  59   5         NA
#8:  55   8         NA
#9:  53   4         NA
#10: 57   6         NA

data

df1 <- structure(list(lat = c(58L, 52L, 54L, 60L, 50L, 56L, 59L, 55L, 
53L, 57L), lon = c(1L, 10L, 7L, 2L, 3L, 9L, 5L, 8L, 4L, 6L)), .Names = c("lat", 
"lon"), row.names = c(NA, -10L), class = "data.frame")

df2 <- structure(list(lat = c(51L, 55L, 50L, 58L, 56L, 57L, 60L, 54L, 
 52L, 54L), lon = c(13L, 10L, 3L, 6L, 9L, 8L, 9L, 16L, 4L, 7L), 
 C = c(1.48642005012902, 1.53314455225747, 1.45418413640182, 
-0.874122129771392, -1.77711353745745, 0.128866710402714, 
-2.41118134931725, -1.78305563078752, -0.0173287724390305, 
-0.909408846416724)), .Names = c("lat", "lon", "C"), row.names = c(NA, 
-10L), class = "data.frame")

jlhoward · Accepted Answer · 2014-12-02 21:25:16Z

Since these are geocodes, one thing to watch out for is that the fields have to match exactly. So for instance if one dataset has lon/lat to 6 significant figures, and the other has lon/lat to 8 significant figures, you will get no matches (or very few). I wonder if this is why merge(...) isn't working for you. As shown below, it should work.
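If a precision mismatch is the suspected cause, one thing to try is rounding both sets of coordinates to a common number of decimal places before merging. A minimal sketch, assuming made-up data frames `a` and `b` (not from the question) and an arbitrary choice of 5 decimal places:

```r
# Hypothetical data: the same two locations stored at different precision
a <- data.frame(lon = c(1.123456, 2.654321),
                lat = c(50.111111, 51.222222))
b <- data.frame(lon = c(1.12345612, 2.65432098),
                lat = c(50.11111109, 51.22222187),
                C   = c(10, 20))

# Exact matching finds nothing because the stored values differ
exact <- merge(a, b, by = c("lon", "lat"), all.x = TRUE)

# Round both sides to the same number of decimals (5 is an assumption)
digits <- 5
a_r <- transform(a, lon = round(lon, digits), lat = round(lat, digits))
b_r <- transform(b, lon = round(lon, digits), lat = round(lat, digits))
rounded <- merge(a_r, b_r, by = c("lon", "lat"), all.x = TRUE)
```

Here `exact$C` is all NA, while `rounded$C` picks up both values. Rounding can still separate two nearly identical values that fall on either side of a rounding boundary, so for noisy coordinates a distance-based match may be a better fit.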

merge(...) should work, especially if both data frames have the same column names. Using the datasets from @akrun's answer:

merge(df1,df2, by=c("lon","lat"),all.x=TRUE)
#    lon lat          C
# 1    1  58         NA
# 2    2  60         NA
# 3    3  50  1.4541841
# 4    4  53         NA
# 5    5  59         NA
# 6    6  57         NA
# 7    7  54 -0.9094088
# 8    8  55         NA
# 9    9  56 -1.7771135
# 10  10  52         NA

If you don't specify the by=... argument, merge(...) will use all common columns, so in this case you could just write:

merge(df1,df2,all.x=TRUE)
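When the key columns are named differently in the two data frames, merge(...) accepts by.x and by.y in place of by. A small sketch, assuming a hypothetical second data frame whose key columns are called longitude and latitude (these names, and the shortened data, are for illustration only):

```r
# Small hypothetical inputs (a few rounded values borrowed from this thread)
d1 <- data.frame(lat = c(58L, 54L, 50L), lon = c(1L, 7L, 3L))
d2 <- data.frame(latitude  = c(50L, 54L, 52L),
                 longitude = c(3L, 7L, 4L),
                 C         = c(1.45, -0.91, -0.02))

# by.x names the key columns in d1, by.y the corresponding columns in d2
res <- merge(d1, d2,
             by.x = c("lon", "lat"),
             by.y = c("longitude", "latitude"),
             all.x = TRUE)   # keep every row of d1, NA where there is no match
```

The result keeps the key names from the first data frame (lon, lat) and adds C.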

You could also use join(...) in the plyr package.

library(plyr)
join(df1,df2)

All of these options produce the same result, although the rows are in different order.
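If the original row order of df1 matters (for instance, to add the result back as a column of length nrow(df1)), one option is to tag the rows with an index before merging and sort on it afterwards. A minimal sketch using a few rows of the data above; the helper column name row_id is an assumption for illustration:

```r
d1 <- data.frame(lat = c(58L, 54L, 50L, 56L), lon = c(1L, 7L, 3L, 9L))
d2 <- data.frame(lat = c(50L, 56L, 54L), lon = c(3L, 9L, 7L),
                 C   = c(1.4541841, -1.7771135, -0.9094088))

d1$row_id <- seq_len(nrow(d1))               # remember the original order
m <- merge(d1, d2, by = c("lon", "lat"), all.x = TRUE)
m <- m[order(m$row_id), ]                    # restore that order
d1$C <- m$C                                  # one value (or NA) per row of d1
d1$row_id <- NULL                            # drop the helper column
```

This assumes each (lon, lat) pair occurs at most once in the second data frame; with duplicates, merge(...) returns more rows than d1 has and the assignment would fail.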

The data.table approach will be fastest, although without a really large dataset (>1e5 rows) you might not notice the difference.

OliE · Accepted Answer · 2014-12-02 16:32:20Z

You can use ifelse for this. For example, with the data:

df1 <- structure(list(lat = c(58L, 52L, 54L, 60L, 50L, 56L, 59L, 55L, 
53L, 57L), lon = c(1L, 10L, 7L, 2L, 3L, 9L, 5L, 8L, 4L, 6L)), .Names = c("lat", 
"lon"), row.names = c(NA, -10L), class = "data.frame")

df2 <- structure(list(lat = c(51L, 55L, 50L, 58L, 56L, 57L, 60L, 54L, 
 52L, 54L), lon = c(13L, 10L, 3L, 6L, 9L, 8L, 9L, 16L, 4L, 7L), 
 C = c(1.48642005012902, 1.53314455225747, 1.45418413640182, 
-0.874122129771392, -1.77711353745745, 0.128866710402714, 
-2.41118134931725, -1.78305563078752, -0.0173287724390305, 
-0.909408846416724)), .Names = c("lat", "lon", "C"), row.names = c(NA, 
-10L), class = "data.frame")

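You can create column C for df1 by looking up each (lat, lon) pair of df1 in df2 and returning NA where there is no match. The pair has to be matched as a unit: testing lat and lon separately with %in% would also accept rows whose values each occur somewhere in df2 but not together (on this data, row 2 of df1, lat 52 and lon 10, would pass that test even though df2 contains no such pair).

idx <- match(paste(df1$lat, df1$lon), paste(df2$lat, df2$lon))  # row of df2 matching each row of df1, NA if none
df1$C <- ifelse(is.na(idx), NA, df2$C[idx])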
