Value matching between two dataframes

Question

I am getting more familiar with R and have run into a problem I haven't been able to solve. Reading and searching online hasn't gotten me any closer to a solution.

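Goal: for each row of the source table (s_col), check every cell of the target table (t_col) against that row's values. For each source row, create a new df shaped like t_col, where 1 means the cell's value appears in the source row and 0 means there is no match or an NA is involved (NA should never count as a match).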
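To make the NA rule concrete, here is a small illustration of base R's `%in%` applied to one source row and one target column (values copied from the example data; `src_row`, `tgt_col`, `plain` and `na_safe` are illustrative names, not part of the question). It shows that `%in%` on its own counts an NA in the target as matching an NA in the source, so NA has to be dropped from the lookup values to get the 0 the goal asks for:

```r
# Values of the 2nd source row and of t_col$col1, copied from the example data
src_row <- c("aunt", "Cathy", NA)
tgt_col <- c("Bob", NA, "likes", "tea", "Jack")

# %in% is TRUE for each element of tgt_col found in src_row; unary + turns it into 0/1.
# match() treats NA as a matchable value, so the NA in tgt_col matches the NA in src_row.
plain <- +(tgt_col %in% src_row)
plain
# [1] 0 1 0 0 0

# Removing NA from the lookup values first gives the "NA never matches" rule
na_safe <- +(tgt_col %in% src_row[!is.na(src_row)])
na_safe
# [1] 0 0 0 0 0
```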

Data:

    > s_col<-data.frame(col1=c("Bob", "aunt"), 
    col2= c("likes", "Cathy"), col3 = c("tea", NA))

    > s_col
      col1  col2  col3
    1  Bob likes   tea
    2 aunt Cathy  <NA>
    3  Tom  wins twice

    > t_col<-data.frame(col1=c("Bob", NA, "likes", "tea", "Jack"), 
    col2=c("Cathy", "aunt", "Jason", "Bob", "likes"))

    > t_col
       col1  col2
    1   Bob Cathy
    2  <NA>  aunt
    3 likes Jason
    4   tea   Bob
    5  Jack likes

Desired results:

    #output for first row in s_col (Bob, likes, tea)

       col1  col2
    1   1    0
    2   0    0
    3   1    0
    4   1    1
    5   0    1 

    #output for 2nd row in s_col (aunt, Cathy, NA)

       col1  col2
    1   0    1
    2   0    1
    3   0    0
    4   0    0
    5   0    0

    #output for 3rd row in s_col (Tom, wins, twice)

       col1  col2
    1   0    0
    2   0    0
    3   0    0
    4   0    0
    5   0    0

This is the progress I have made so far, but the code below is far from producing the desired results:

    out <- NULL
    output <- NULL
    for (i in 1:ncol(s_col)) {
      x <- i
      for (j in 1:nrow(s_col)) {
        y <- j
        temp <- s_col[y, x]
        for (a in 1:ncol(t_col)) {
          w <- a
          for (b in 1:nrow(t_col)) {
            v <- b
            temp2 <- t_col[v, w]  # overwritten on every pass, so only the last t_col cell survives
          }
        }
        put <- ifelse(temp %in% temp2, 1, 0)
        out <- c(out, put)
      }
    }
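For comparison, here is a minimal sketch of how the nested-loop idea could be made to work. It assumes the 3-row s_col shown in the printed output and character (not factor) columns, builds one 0/1 matrix shaped like t_col per source row, and never counts NA as a match. The names `src` and `res` are illustrative:

```r
# Example data matching the printed s_col / t_col in the question
s_col <- data.frame(col1 = c("Bob", "aunt", "Tom"),
                    col2 = c("likes", "Cathy", "wins"),
                    col3 = c("tea", NA, "twice"),
                    stringsAsFactors = FALSE)
t_col <- data.frame(col1 = c("Bob", NA, "likes", "tea", "Jack"),
                    col2 = c("Cathy", "aunt", "Jason", "Bob", "likes"),
                    stringsAsFactors = FALSE)

out <- vector("list", nrow(s_col))  # one 0/1 matrix per source row
for (i in seq_len(nrow(s_col))) {
  # all non-NA values in source row i
  src <- unlist(s_col[i, ], use.names = FALSE)
  src <- src[!is.na(src)]
  # start from an all-zero matrix with the same shape and column names as t_col
  res <- matrix(0L, nrow = nrow(t_col), ncol = ncol(t_col),
                dimnames = list(NULL, names(t_col)))
  for (a in seq_len(ncol(t_col))) {
    for (b in seq_len(nrow(t_col))) {
      val <- t_col[b, a]
      # a cell scores 1 only if it is non-NA and appears in the source row
      if (!is.na(val) && val %in% src) res[b, a] <- 1L
    }
  }
  out[[i]] <- res
}
out[[2]]
```

Each element of `out` is a matrix; wrapping it in `as.data.frame()` gives a data frame if that is preferred.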

I think your s_col data.frame call has only 2 rows compared to the one you showed. – akrun, commented Oct 29, 2017 at 4:07

akrun · Accepted Answer · 2017-10-29 04:04:27Z

2

We can loop through the rows of s_col and use %in% to compare each row's values with the columns of 't_col'. This creates a list of logical matrices, and the unary + converts each one to 0/1:

    lapply(seq_len(nrow(s_col)), function(i) +sapply(t_col, `%in%`, unlist(s_col[i,])))
    #[[1]]
    #     col1 col2
    #[1,]    1    0
    #[2,]    0    0
    #[3,]    1    0
    #[4,]    1    1
    #[5,]    0    1

    #[[2]]
    #     col1 col2
    #[1,]    0    1
    #[2,]    1    1
    #[3,]    0    0
    #[4,]    0    0
    #[5,]    0    0

    #[[3]]
    #     col1 col2
    #[1,]    0    0
    #[2,]    0    0
    #[3,]    0    0
    #[4,]    0    0
    #[5,]    0    0

Data:

    s_col <- data.frame(col1 = c("Bob", "aunt", "Tom"),
                        col2 = c("likes", "Cathy", "wins"),
                        col3 = c("tea", NA, "twice"), stringsAsFactors = FALSE)
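Note that the second matrix above differs from the desired output in one cell: `[2,]` of `col1` is 1 rather than 0. That happens because `%in%` (which uses `match()`) treats the NA in t_col as matching the NA in the second row of s_col. A minimal variant, assuming the same data, drops NA from each source row before the lookup and should reproduce all three desired results (`vals` and `res` are illustrative names):

```r
s_col <- data.frame(col1 = c("Bob", "aunt", "Tom"),
                    col2 = c("likes", "Cathy", "wins"),
                    col3 = c("tea", NA, "twice"), stringsAsFactors = FALSE)
t_col <- data.frame(col1 = c("Bob", NA, "likes", "tea", "Jack"),
                    col2 = c("Cathy", "aunt", "Jason", "Bob", "likes"),
                    stringsAsFactors = FALSE)

res <- lapply(seq_len(nrow(s_col)), function(i) {
  # lookup values for source row i, with NA removed so NA never matches NA
  vals <- unlist(s_col[i, ], use.names = FALSE)
  vals <- vals[!is.na(vals)]
  # test each t_col column against vals; unary + turns TRUE/FALSE into 1/0
  +sapply(t_col, `%in%`, vals)
})
res[[2]]
```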

answered Oct 29, 2017 at 4:04

akrun



1 Comment

sfyn Over a year ago

I came close to it with ifelse(apply(t_col[], c(1,2), function(r) any(r %in% temp)) == TRUE, 1, 0) but yours is more succinct. Thank you.
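The apply() approach from this comment can also be written out in full. This sketch assumes `temp` holds the values of one source row (the name follows the comment; the values here are the first row of s_col) and uses `apply()` with `MARGIN = c(1, 2)` to test every cell of t_col:

```r
t_col <- data.frame(col1 = c("Bob", NA, "likes", "tea", "Jack"),
                    col2 = c("Cathy", "aunt", "Jason", "Bob", "likes"),
                    stringsAsFactors = FALSE)
temp <- c("Bob", "likes", "tea")  # values of the first source row

# apply(..., c(1, 2), f) coerces t_col to a matrix and calls f on every cell
hits <- apply(t_col, c(1, 2), function(r) any(r %in% temp))
# hits is already logical, so comparing it to TRUE is not needed before ifelse()
m <- ifelse(hits, 1, 0)
m
```

Because `temp` contains no NA here, the NA cell in t_col simply scores 0.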
