
I have two data frames. I want to match the contents of df1$v1 against df2$v2 and, where they match, replace the corresponding df2$v2 value with the df1$v2 value.

df1
v1 v2
1  a1
2  a2
3  a3

df2
v1 v2 v3 v4
c1 1  c3 c4
d1 2  d3 d4
e1 3  e3 e4

I'm looking for this final output:

df2
v1 v2 v3 v4
c1 a1 c3 c4
d1 a2 d3 d4
e1 a3 e3 e4
  • Try: merge(df2,df1,by.x="v2",by.y="v1",all.x=TRUE) Commented Jul 9, 2015 at 0:55
  • Surprisingly, if you want to match variables, use the match function, which you could have found using ?match, e.g.: df2$v2 <- df1$v2[match(df2$v2,df1$v1)] . merge is basically an extension of that logic... Commented Jul 9, 2015 at 1:04
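A minimal sketch of the match() idea from the comment above, assuming (beyond what the question states) that df2$v2 values with no counterpart in df1$v1 should be left unchanged instead of becoming NA. The data is typed in with stringsAsFactors = FALSE so it behaves the same in every R version; idx and hit are illustrative names.

```r
df1 <- data.frame(v1 = c(1, 2, 3), v2 = c("a1", "a2", "a3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("c1", "d1", "e1"), v2 = c(1, 5, 3),
                  v3 = c("c3", "d3", "e3"), v4 = c("c4", "d4", "e4"),
                  stringsAsFactors = FALSE)

# Position in df1$v1 of each df2$v2 value (NA where there is no match)
idx <- match(df2$v2, df1$v1)
hit <- !is.na(idx)

# Convert to character first so the numeric column can hold "a1", "a2", ...
df2$v2 <- as.character(df2$v2)
df2$v2[hit] <- df1$v2[idx[hit]]   # unmatched rows keep their old value
df2
```

If df1$v1 contains repeated values, match() uses only the first occurrence.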

2 Answers

out <- merge(df2, df1, by.x = 'v2', by.y = 'v1', all.x = TRUE)
out <- out[, -1]   # drop the key column, which merge() puts first

You'll get a warning that the column name 'v2' is duplicated in the result, so you could wrap the call in suppressWarnings() if you wanted, or rename the 'v2' column of df1 to something not already in df2.

merge puts the merge column first (the first 'v2' column, holding the numeric 1 2 3), hence the out[, -1] to remove it. The remaining columns come out in the order v1 v3 v4 v2.
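One way to avoid both the duplicated-name warning and the changed column order is to rename df1's columns before merging and remember the original row order. This is a sketch, not the answerer's code; lookup, key, new_v2 and the helper column .row are hypothetical names introduced here.

```r
df1 <- data.frame(v1 = c(1, 2, 3), v2 = c("a1", "a2", "a3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("c1", "d1", "e1"), v2 = c(1, 2, 3),
                  v3 = c("c3", "d3", "e3"), v4 = c("c4", "d4", "e4"),
                  stringsAsFactors = FALSE)

# Give df1's columns names that don't clash with df2, so merge() won't warn
lookup <- setNames(df1, c("key", "new_v2"))

# Remember the original row order; merge() sorts by the key column
df2$.row <- seq_len(nrow(df2))

out <- merge(df2, lookup, by.x = "v2", by.y = "key", all.x = TRUE)
out <- out[order(out$.row), ]           # restore df2's row order
out$v2 <- out$new_v2                    # replace the key with the looked-up value
out <- out[, c("v1", "v2", "v3", "v4")] # original column order, helpers dropped
rownames(out) <- NULL
out
```

Rows of df2 with no match in df1 would still end up with NA in v2, because of all.x = TRUE.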


2 Comments

Let's say names(df2) = "p1" "p2" "p3" "p4"
So the function solution is actually more elegant, and it does the job if the data frames are the same size. If the data frame sizes differ, you get an error: Error in Ops.data.frame(dF1[, match1], dF2[, match2]) : ‘==’ only defined for equally-sized data frames

The merge solution does not give the desired result in some cases, e.g. if df1$v1 and df2$v2 do not match in every row:

df1 <- data.frame( v1 = c(1,2,3),
                   v2 = c("a1","a2","a3") )

df2 <- data.frame( v1 = c("c1","d1","e1"),
                   v2 = c(1,5,3),
                   v3 = c("c3","d3","e3"),
                   v4 = c("c4","d4","e4") )

out <- merge(df2, df1, by.x = 'v2', by.y = 'v1', all.x = TRUE)
out <- out[, -1]

> out
  v1 v3 v4   v2
1 c1 c3 c4   a1
2 e1 e3 e4   a3
3 d1 d3 d4 <NA>

Another example, where df1$v1 and df2$v2 do match in every row, but the key values are repeated:

df1 <- data.frame( v1 = c(1,2,1),
                   v2 = c("a1","a2","a3") )

df2 <- data.frame( v1 = c("c1","d1","e1"),
                   v2 = c(1,2,1),
                   v3 = c("c3","d3","e3"),
                   v4 = c("c4","d4","e4") )

out <- merge(df2, df1, by.x = 'v2', by.y = 'v1', all.x = TRUE)
out <- out[, -1]

> out
  v1 v3 v4 v2
1 c1 c3 c4 a1
2 c1 c3 c4 a3
3 e1 e3 e4 a1
4 e1 e3 e4 a3
5 d1 d3 d4 a2
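The extra rows come from the repeated key 1 in df1$v1: merge() returns one row for every matching pair of rows. A quick base-R check before merging (dup and n_out are illustrative names) is anyDuplicated(), plus a count of how many rows a left join should produce:

```r
df1 <- data.frame(v1 = c(1, 2, 1), v2 = c("a1", "a2", "a3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("c1", "d1", "e1"), v2 = c(1, 2, 1),
                  v3 = c("c3", "d3", "e3"), v4 = c("c4", "d4", "e4"),
                  stringsAsFactors = FALSE)

# Index of the first repeated key in df1$v1 (0 if every key is unique)
dup <- anyDuplicated(df1$v1)

# Rows a left join (all.x = TRUE) should return: each df2 row appears once per
# matching df1 row, or once (with NA) if there is no match
n_out <- sum(vapply(df2$v2, function(k) max(1L, sum(df1$v1 == k)), integer(1)))

dup    # 3: the third key repeats the first
n_out  # 5, the same number of rows as the merge() output above
```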

The following solution is not very elegant, but it works in these examples:

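f <- function( dF1, match1, data1,
               dF2, match2, data2  )
{
  # If the replacement column is a factor, make the target column a factor
  # too and add the new levels, so the assignment below doesn't produce NA
  if ( is.factor(dF1[,data1]) )
  {
    dF2[,data2] <- as.factor(dF2[,data2])
    levels(dF2[,data2]) <- union(levels(dF2[,data2]), levels(dF1[,data1]))
  }
  # Rows where row i of dF1 matches row i of dF2
  # (only meaningful when both data frames have the same number of rows)
  n <- which(dF1[,match1] == dF2[,match2])
  dF2[n,data2] <- dF1[n,data1]
  return( dF2 )
}

out <- f( df1, "v1", "v2", df2, "v2", "v2" )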

Example 1:

> out
  v1 v2 v3 v4
1 c1 a1 c3 c4
2 d1  5 d3 d4
3 e1 a3 e3 e4

Example 2:

> out
  v1 v2 v3 v4
1 c1 a1 c3 c4
2 d1 a2 d3 d4
3 e1 a3 e3 e4
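The core of f() is a row-by-row comparison (row i of dF1 against row i of dF2), which is also why it fails when the data frames have different sizes. For plain character/numeric columns the same row-wise logic can be sketched with base ifelse(); this assumes equal row counts and no factor columns, and same is an illustrative name.

```r
df1 <- data.frame(v1 = c(1, 2, 3), v2 = c("a1", "a2", "a3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("c1", "d1", "e1"), v2 = c(1, 5, 3),
                  v3 = c("c3", "d3", "e3"), v4 = c("c4", "d4", "e4"),
                  stringsAsFactors = FALSE)

# A row-wise comparison only makes sense if both frames have the same length
stopifnot(nrow(df1) == nrow(df2))

same <- df1$v1 == df2$v2                 # TRUE where row i matches row i
df2$v2 <- ifelse(same, df1$v2, df2$v2)   # non-matching rows keep their value (as character)
df2
```

Like f(), this compares rows by position, not by key.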

If the rows where df1$v1 and df2$v2 do not match should not appear in the output, they can be removed with the following modification:

f <- function( dF1, match1, data1,
               dF2, match2, data2  )
{
  # Make the target column a factor with the new levels if needed
  if ( is.factor(dF1[,data1]) )
  {
    dF2[,data2] <- as.factor(dF2[,data2])
    levels(dF2[,data2]) <- union(levels(dF2[,data2]), levels(dF1[,data1]))
  }
  # Row-wise comparison, as before
  n <- which(dF1[,match1] == dF2[,match2])
  dF2[n,data2] <- dF1[n,data1]
  # Return only the rows that matched
  return( dF2[n,] )
}

out <- f( df1, "v1", "v2", df2, "v2", "v2" )

Example 1:

> out
  v1 v2 v3 v4
1 c1 a1 c3 c4
3 e1 a3 e3 e4

In the merge solution this can be achieved with all.x = FALSE, but Example 2 still does not work.
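For the filtered variant, a key-based alternative (not the answerer's approach: it swaps the row-wise comparison for a lookup) keeps the rows of df2 whose v2 value appears anywhere in df1$v1, then looks up the replacement with match(). A sketch for Example 1:

```r
df1 <- data.frame(v1 = c(1, 2, 3), v2 = c("a1", "a2", "a3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("c1", "d1", "e1"), v2 = c(1, 5, 3),
                  v3 = c("c3", "d3", "e3"), v4 = c("c4", "d4", "e4"),
                  stringsAsFactors = FALSE)

# Keep only the rows of df2 whose v2 value occurs somewhere in df1$v1
out <- df2[df2$v2 %in% df1$v1, ]

# Look up the replacement value for each remaining row
out$v2 <- df1$v2[match(out$v2, df1$v1)]
out
```

Because this is a lookup rather than a row-wise comparison, it also works when the data frames have different numbers of rows. With repeated keys in df1$v1 it takes the first match, so for Example 2 the third row would get a1 rather than a3.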
