0

I have two data frames containing longitude and latitude values. Data frame 1 (df1) has two columns (lon, lat); data frame 2 (df2) has three columns (lon, lat, and some value C). I want to add a third column to df1 that holds the value of df2$C from the row where BOTH coordinates match exactly, i.e. df1$lon == df2$lon AND df1$lat == df2$lat. Where a (lon, lat) pair in df1 has no match in df2, I would like an NA, so that the new column has length nrow(df1). I tried the merge function, but I'm having trouble matching both columns of df1 to those of df2.

1
  • merge(...) should work. You should show your code. Commented Dec 2, 2014 at 21:17

3 Answers

1

You could try data.table

library(data.table)
setDT(df1)
setkey(setDT(df2), lat, lon)
df2[df1]
#   lat lon          C
#1:  58   1         NA
#2:  52  10         NA
#3:  54   7 -0.9094088
#4:  60   2         NA
#5:  50   3  1.4541841
#6:  56   9 -1.7771135
#7:  59   5         NA
#8:  55   8         NA
#9:  53   4         NA
#10: 57   6         NA

data

df1 <- structure(list(lat = c(58L, 52L, 54L, 60L, 50L, 56L, 59L, 55L, 
53L, 57L), lon = c(1L, 10L, 7L, 2L, 3L, 9L, 5L, 8L, 4L, 6L)), .Names = c("lat", 
"lon"), row.names = c(NA, -10L), class = "data.frame")

df2 <- structure(list(lat = c(51L, 55L, 50L, 58L, 56L, 57L, 60L, 54L, 
 52L, 54L), lon = c(13L, 10L, 3L, 6L, 9L, 8L, 9L, 16L, 4L, 7L), 
 C = c(1.48642005012902, 1.53314455225747, 1.45418413640182, 
-0.874122129771392, -1.77711353745745, 0.128866710402714, 
-2.41118134931725, -1.78305563078752, -0.0173287724390305, 
-0.909408846416724)), .Names = c("lat", "lon", "C"), row.names = c(NA, 
-10L), class = "data.frame")

1

Since these are geocodes, one thing to watch out for is that the fields have to match exactly. So for instance if one dataset has lon/lat to 6 significant figures, and the other has lon/lat to 8 significant figures, you will get no matches (or very few). I wonder if this is why merge(...) isn't working for you. As shown below, it should work.
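To illustrate the precision issue, here is a minimal sketch in base R. The data frames `a` and `b` and the choice of 4 decimal places are assumptions made up for this example, not part of the question's data. Rounding both sides to a shared precision before merging lets the pairs line up:

```r
# Hypothetical coordinates stored at different precisions (illustration only)
a <- data.frame(lon = c(10.123456, 20.5), lat = c(50.654321, 60.25))
b <- data.frame(lon = 10.12345678, lat = 50.65432109, C = 1.5)

# Exact matching finds nothing because the trailing digits differ
exact <- merge(a, b, by = c("lon", "lat"), all.x = TRUE)

# Round both sides to a shared precision (4 decimals is an arbitrary choice)
a_r <- a
b_r <- b
a_r[c("lon", "lat")] <- round(a[c("lon", "lat")], 4)
b_r[c("lon", "lat")] <- round(b[c("lon", "lat")], 4)

# Left join on the rounded coordinates; unmatched rows of a_r get NA in C
rounded <- merge(a_r, b_r, by = c("lon", "lat"), all.x = TRUE)
```

Note that rounding can also make distinct nearby points collide, so the precision should be chosen with the spacing of the data in mind.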

merge(...) should work, especially if both data frames have the same column names. Using the datasets from @akrun's answer:

merge(df1,df2, by=c("lon","lat"),all.x=TRUE)
#    lon lat          C
# 1    1  58         NA
# 2    2  60         NA
# 3    3  50  1.4541841
# 4    4  53         NA
# 5    5  59         NA
# 6    6  57         NA
# 7    7  54 -0.9094088
# 8    8  55         NA
# 9    9  56 -1.7771135
# 10  10  52         NA

If you don't specify the by=... argument, merge(...) will use all common columns, so in this case you could just write:

merge(df1,df2,all.x=TRUE)

You could also use join(...) in the plyr package.

library(plyr)
join(df1,df2)

All of these options produce the same matches, although the rows come back in a different order: merge(...) sorts the result by the key columns.
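Because merge(...) reorders the rows, its C column cannot be assigned straight back to df1. A minimal base R sketch of one way around this, assuming each (lon, lat) pair occurs at most once in df2: tag the rows with a temporary id, merge, then restore the original order. The small data frames and the column name `row_id` are made up for illustration.

```r
# Small illustrative data (not the data from the answers)
df1 <- data.frame(lon = c(1, 10, 7, 3), lat = c(58, 52, 54, 50))
df2 <- data.frame(lon = c(3, 7, 9), lat = c(50, 54, 56),
                  C = c(1.45, -0.91, -1.78))

df1$row_id <- seq_len(nrow(df1))      # temporary id; the name is arbitrary
out <- merge(df1, df2, by = c("lon", "lat"), all.x = TRUE)
out <- out[order(out$row_id), ]       # restore df1's original row order
df1$C <- out$C                        # NA where no (lon, lat) pair matched
df1$row_id <- NULL                    # drop the helper column
```

If df2 contained duplicate (lon, lat) pairs, the merged result would have more rows than df1 and the assignment would fail, so duplicates would need to be resolved first.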

The data.table approach will be fastest, although without a really large dataset (>1e5 rows) you might not notice the difference.


0

You can use ifelse for this. For example, with the data:

df1 <- structure(list(lat = c(58L, 52L, 54L, 60L, 50L, 56L, 59L, 55L, 
53L, 57L), lon = c(1L, 10L, 7L, 2L, 3L, 9L, 5L, 8L, 4L, 6L)), .Names = c("lat", 
"lon"), row.names = c(NA, -10L), class = "data.frame")

df2 <- structure(list(lat = c(51L, 55L, 50L, 58L, 56L, 57L, 60L, 54L, 
 52L, 54L), lon = c(13L, 10L, 3L, 6L, 9L, 8L, 9L, 16L, 4L, 7L), 
 C = c(1.48642005012902, 1.53314455225747, 1.45418413640182, 
-0.874122129771392, -1.77711353745745, 0.128866710402714, 
-2.41118134931725, -1.78305563078752, -0.0173287724390305, 
-0.909408846416724)), .Names = c("lat", "lon", "C"), row.names = c(NA, 
-10L), class = "data.frame")

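You can create column C for df1 by matching on the (lat, lon) pair as a whole. Testing each column separately with %in% would also accept rows whose lat and lon each occur somewhere in df2 but not in the same row, and df2$C would then be picked by position rather than by match:

idx <- match(paste(df1$lat, df1$lon), paste(df2$lat, df2$lon))  # row of df2 for each row of df1, NA if none
df1$C <- ifelse(is.na(idx), NA, df2$C[idx])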
