The 'merge'-solution fails in some cases, e.g. if df1$v1 and df2$v2 do not match everywhere. The rows are then reordered and the unmatched row gets NA instead of keeping its original value:
df1 <- data.frame( v1 = c(1,2,3),
v2 = c("a1","a2","a3") )
df2 <- data.frame( v1 = c("c1","d1","e1"),
v2 = c(1,5,3),
v3 = c("c3","d3","e3"),
v4 = c("c4","d4","e4") )
out <- merge(df2, df1, by.x='v2', by.y='v1', all.x=TRUE)
out <- out[,-1]
> out
v1 v3 v4 v2
1 c1 c3 c4 a1
2 e1 e3 e4 a3
3 d1 d3 d4 <NA>
Another example, where df1$v1 and df2$v2 do match everywhere but contain duplicated values, so 'merge' returns five rows instead of three:
df1 <- data.frame( v1 = c(1,2,1),
v2 = c("a1","a2","a3") )
df2 <- data.frame( v1 = c("c1","d1","e1"),
v2 = c(1,2,1),
v3 = c("c3","d3","e3"),
v4 = c("c4","d4","e4") )
out <- merge(df2, df1, by.x='v2', by.y='v1', all.x=TRUE)
out <- out[,-1]
> out
v1 v3 v4 v2
1 c1 c3 c4 a1
2 c1 c3 c4 a3
3 e1 e3 e4 a1
4 e1 e3 e4 a3
5 d1 d3 d4 a2
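The extra rows come from the duplicated key: 'merge' pairs every row of df2 with every row of df1 that has the same key value. A quick way to check for this before merging, as a sketch using the data of the second example:

```r
df1 <- data.frame(v1 = c(1, 2, 1),
                  v2 = c("a1", "a2", "a3"))
df2 <- data.frame(v1 = c("c1", "d1", "e1"),
                  v2 = c(1, 2, 1),
                  v3 = c("c3", "d3", "e3"),
                  v4 = c("c4", "d4", "e4"))

# Index of the first duplicated key in df1 (0 would mean all keys are unique)
anyDuplicated(df1$v1)

# How often each key value occurs on either side
table(df1$v1)
table(df2$v2)

# Key 1 occurs twice in both data frames, so it alone yields 2 * 2 = 4 rows
out <- merge(df2, df1, by.x = "v2", by.y = "v1", all.x = TRUE)
nrow(out)
```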
The following solution is not very elegant, but it works in these examples:
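f <- function( dF1, match1, data1,
               dF2, match2, data2 )
{
  # If the source column is a factor, turn the target column into a factor
  # that also contains the source levels, so the assignment below is valid.
  if ( is.factor(dF1[,data1]) )
  {
    dF2[,data2] <- as.factor(dF2[,data2])
    levels(dF2[,data2]) <- c(levels(dF2[,data2]),levels(dF1[,data1]))
  }
  # Row positions at which the two key columns agree (compared row by row).
  n <- which(dF1[,match1] == dF2[,match2])
  # Overwrite the target column in those rows with the values from dF1.
  dF2[n,data2] <- dF1[n,data1]
  return( dF2 )
}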
out <- f( df1, "v1", "v2", df2, "v2", "v2" )
Example 1:
> out
v1 v2 v3 v4
1 c1 a1 c3 c4
2 d1 5 d3 d4
3 e1 a3 e3 e4
Example 2:
> out
v1 v2 v3 v4
1 c1 a1 c3 c4
2 d1 a2 d3 d4
3 e1 a3 e3 e4
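Note that the comparison dF1[,match1] == dF2[,match2] is done element by element, i.e. by row position, so the function relies on both data frames having the same row order. A small sketch of that assumption (the reordered df1r and the vector df2_v2 are made up here for illustration):

```r
df1 <- data.frame(v1 = c(1, 2, 3),
                  v2 = c("a1", "a2", "a3"))
df2_v2 <- c(1, 5, 3)   # the key column of df2 from Example 1

# Same row order as in Example 1: the keys agree in rows 1 and 3
same_order <- which(df1$v1 == df2_v2)
same_order

# Same keys in df1, but in a different row order: no position agrees any more
df1r <- df1[c(3, 2, 1), ]
reordered <- which(df1r$v1 == df2_v2)
reordered
```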
If the rows where df1$v1 and df2$v2 do not match are not wanted in the output, they can be removed by the following modification:
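f <- function( dF1, match1, data1,
               dF2, match2, data2 )
{
  # If the source column is a factor, turn the target column into a factor
  # that also contains the source levels, so the assignment below is valid.
  if ( is.factor(dF1[,data1]) )
  {
    dF2[,data2] <- as.factor(dF2[,data2])
    levels(dF2[,data2]) <- c(levels(dF2[,data2]),levels(dF1[,data1]))
  }
  # Row positions at which the two key columns agree (compared row by row).
  n <- which(dF1[,match1] == dF2[,match2])
  # Overwrite the target column in those rows with the values from dF1.
  dF2[n,data2] <- dF1[n,data1]
  # Return only the rows where the keys agree.
  return( dF2[n,] )
}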
out <- f( df1, "v1", "v2", df2, "v2", "v2" )
Example 1:
> out
v1 v2 v3 v4
1 c1 a1 c3 c4
3 e1 a3 e3 e4
In the 'merge'-solution this can be achieved with 'all.x=FALSE', but Example 2 still does not work, because the duplicated keys still produce extra rows.
merge(df2,df1,by.x="v2",by.y="v1",all.x=TRUE)

To match variables, use the match function, which you could have found using ?match, e.g.: df2$v2 <- df1$v2[match(df2$v2,df1$v1)]. merge is basically an extension of that logic...
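For comparison, here is the match approach applied to both examples. This is only a sketch: lookup is a hypothetical helper name, and the column names are hard-coded to those used above. Two things to be aware of: match returns only the first matching position, so with duplicated keys in df1 the later values (here "a3") are never used, and keys without a match give NA rather than keeping the old value.

```r
# Hypothetical helper: replace dF2$v2 by the dF1$v2 value whose key dF1$v1 matches
lookup <- function(dF1, dF2) {
  dF2$v2 <- dF1$v2[match(dF2$v2, dF1$v1)]
  dF2
}

# Example 1: key 5 has no match in df1$v1
df1 <- data.frame(v1 = c(1, 2, 3),
                  v2 = c("a1", "a2", "a3"))
df2 <- data.frame(v1 = c("c1", "d1", "e1"),
                  v2 = c(1, 5, 3),
                  v3 = c("c3", "d3", "e3"),
                  v4 = c("c4", "d4", "e4"))
out1 <- lookup(df1, df2)
as.character(out1$v2)   # "a1" NA "a3"

# Example 2: key 1 is duplicated in df1$v1, only its first position is used
df1 <- data.frame(v1 = c(1, 2, 1),
                  v2 = c("a1", "a2", "a3"))
df2$v2 <- c(1, 2, 1)
out2 <- lookup(df1, df2)
as.character(out2$v2)   # "a1" "a2" "a1"
```

The number and order of rows in df2 are preserved in both cases, which is the main difference from the 'merge' output.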