
I am getting more familiar with R and have run into something I haven't been able to figure out. Reading and searching online hasn't brought me any closer to a solution.

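Goal: take each row of the source table (`s_col`) and check every cell of the target table (`t_col`) against the values in that row. For each source row, create a new df the same shape as `t_col`, where 1 means the cell's value appears in the source row and 0 means there is no match or the value is NA (an NA in `t_col` should not be matched by an NA in the source row).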

Data:

    > s_col<-data.frame(col1=c("Bob", "aunt", "Tom"), 
    col2=c("likes", "Cathy", "wins"), col3 = c("tea", NA, "twice"), stringsAsFactors=FALSE)

    > s_col
      col1  col2  col3
    1  Bob likes   tea
    2 aunt Cathy  <NA>
    3  Tom  wins twice

    > t_col<-data.frame(col1=c("Bob", NA, "likes", "tea", "Jack"), 
    col2=c("Cathy", "aunt", "Jason", "Bob", "likes"))

    > t_col
       col1  col2
    1   Bob Cathy
    2  <NA>  aunt
    3 likes Jason
    4   tea   Bob
    5  Jack likes

Desired results:

    #output for first row in s_col (Bob, likes, tea)

       col1  col2
    1   1    0
    2   0    0
    3   1    0
    4   1    1
    5   0    1 

    #output for 2nd row in s_col (aunt, Cathy, NA)

       col1  col2
    1   0    1
    2   0    1
    3   0    0
    4   0    0
    5   0    0

    #output for 3rd row in s_col (Tom, wins, twice)

       col1  col2
    1   0    0
    2   0    0
    3   0    0
    4   0    0
    5   0    0

Here is the progress I have made so far, but the code below is far from producing the desired results:

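    out <- NULL
    for (i in 1:ncol(s_col)) {          # each source column
      for (j in 1:nrow(s_col)) {        # each source row
        temp <- s_col[j, i]             # one source cell
        for (a in 1:ncol(t_col)) {
          for (b in 1:nrow(t_col)) {
            temp2 <- t_col[b, a]        # overwritten on every iteration
          }
        }
        # after the loops, temp2 only holds the last target cell, t_col[nrow(t_col), ncol(t_col)]
        put <- ifelse(temp %in% temp2, 1, 0)
        out <- c(out, put)
      }
    }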
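The nested-loop idea can work if the comparison happens inside the innermost loop and each result is written into a matrix shaped like `t_col`, instead of being appended to a vector after the loops finish. A minimal sketch, assuming character (not factor) columns as in the printed tables; `loop_match` is a hypothetical helper name, and NA values are dropped from each source row so they never count as a match:

```r
# Data as shown in the printed tables (character columns, not factors)
s_col <- data.frame(col1 = c("Bob", "aunt", "Tom"),
                    col2 = c("likes", "Cathy", "wins"),
                    col3 = c("tea", NA, "twice"),
                    stringsAsFactors = FALSE)
t_col <- data.frame(col1 = c("Bob", NA, "likes", "tea", "Jack"),
                    col2 = c("Cathy", "aunt", "Jason", "Bob", "likes"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: returns one 0/1 matrix (same shape as t_col) per row of s_col
loop_match <- function(s_col, t_col) {
  out <- vector("list", nrow(s_col))
  for (j in seq_len(nrow(s_col))) {
    # Values of source row j, with NA removed so NA never counts as a match
    src <- unlist(s_col[j, ])
    src <- src[!is.na(src)]
    res <- matrix(0L, nrow = nrow(t_col), ncol = ncol(t_col),
                  dimnames = list(NULL, names(t_col)))
    for (a in seq_len(ncol(t_col))) {
      for (b in seq_len(nrow(t_col))) {
        # Compare each target cell while still inside the loop
        if (t_col[b, a] %in% src) res[b, a] <- 1L
      }
    }
    out[[j]] <- res
  }
  out
}

res <- loop_match(s_col, t_col)
print(res[[1]])
```

Vectorized versions are usually shorter, but this keeps the structure of the original attempt so the fix (compare inside the loop, store into a preallocated matrix) is easy to see.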
  • I think your s_col data.frame call has only 2 rows, compared to the one you printed. Commented Oct 29, 2017 at 4:07

1 Answer


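We can loop over the rows of `s_col` and, for each row, use `%in%` to test every column of `t_col` against that row's values. `sapply` returns a logical matrix, and the unary `+` converts it to 1/0, giving a list of integer matrices, one per row of `s_col`: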

    lapply(seq_len(nrow(s_col)), function(i) +sapply(t_col, `%in%`, unlist(s_col[i,])))
    #[[1]]
    #     col1 col2
    #[1,]    1    0
    #[2,]    0    0
    #[3,]    1    0
    #[4,]    1    1
    #[5,]    0    1

    #[[2]]
    #     col1 col2
    #[1,]    0    1
    #[2,]    1    1
    #[3,]    0    0
    #[4,]    0    0
    #[5,]    0    0

    #[[3]]
    #     col1 col2
    #[1,]    0    0
    #[2,]    0    0
    #[3,]    0    0
    #[4,]    0    0
    #[5,]    0    0
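To see what the one-liner does for a single source row, the pieces can be run one at a time. A sketch for the first row of `s_col`; the data frames are recreated here (matching the data in this answer) so the block runs on its own:

```r
s_col <- data.frame(col1 = c("Bob", "aunt", "Tom"),
                    col2 = c("likes", "Cathy", "wins"),
                    col3 = c("tea", NA, "twice"),
                    stringsAsFactors = FALSE)
t_col <- data.frame(col1 = c("Bob", NA, "likes", "tea", "Jack"),
                    col2 = c("Cathy", "aunt", "Jason", "Bob", "likes"),
                    stringsAsFactors = FALSE)

# Step 1: flatten source row 1 into a named character vector
src <- unlist(s_col[1, ])        # values: "Bob" "likes" "tea"

# Step 2: test one target column against those values
hits <- t_col$col1 %in% src      # TRUE FALSE TRUE TRUE FALSE

# Step 3: sapply() repeats step 2 for each column of t_col and binds the
# logical vectors into a 5 x 2 matrix; unary + turns TRUE/FALSE into 1/0
res1 <- +sapply(t_col, `%in%`, src)
print(res1)
```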

Data (`s_col` rebuilt with all three rows shown in the question):

    s_col<-data.frame(col1=c("Bob", "aunt", "Tom"), 
       col2= c("likes", "Cathy", "wins"), col3 = c("tea", NA, "twice"), stringsAsFactors=FALSE)
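The answer's output differs from the desired result in one cell: for the second row of `s_col`, row 2 of `col1` is 1 rather than 0. That is because `%in%` treats `NA` as a value that can match another `NA`, and both the source row and `t_col` contain one. A minimal variant, assuming NA source values should simply be ignored as the question asks, drops them before matching:

```r
s_col <- data.frame(col1 = c("Bob", "aunt", "Tom"),
                    col2 = c("likes", "Cathy", "wins"),
                    col3 = c("tea", NA, "twice"),
                    stringsAsFactors = FALSE)
t_col <- data.frame(col1 = c("Bob", NA, "likes", "tea", "Jack"),
                    col2 = c("Cathy", "aunt", "Jason", "Bob", "likes"),
                    stringsAsFactors = FALSE)

# %in% lets NA match NA, which is why [2, col1] is 1 in the answer's output
NA %in% c("aunt", "Cathy", NA)   # TRUE

# Same lapply/sapply approach, but NA is removed from each source row first,
# so an NA cell in t_col can never be counted as a match
res <- lapply(seq_len(nrow(s_col)), function(i) {
  src <- unlist(s_col[i, ])
  src <- src[!is.na(src)]
  +sapply(t_col, `%in%`, src)
})
print(res[[2]])
```

With this change, `res[[2]]` has 0 in row 2 of `col1`, matching the second desired table in the question; the first and third results are unchanged.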

1 Comment

I came close to it with `ifelse(apply(t_col[], c(1,2), function(r) any(r %in% temp)) == TRUE, 1, 0)`, but yours is more succinct. Thank you.
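The apply-based expression from the comment can also be run for every source row. A sketch, assuming `temp` is the vector of values from one row of `s_col` (the comment does not show how it was built); since `any(r %in% temp)` is already TRUE or FALSE, the `== TRUE` comparison is dropped here. Like the answer, this counts an NA in `t_col` as a match when the source row also contains NA:

```r
s_col <- data.frame(col1 = c("Bob", "aunt", "Tom"),
                    col2 = c("likes", "Cathy", "wins"),
                    col3 = c("tea", NA, "twice"),
                    stringsAsFactors = FALSE)
t_col <- data.frame(col1 = c("Bob", NA, "likes", "tea", "Jack"),
                    col2 = c("Cathy", "aunt", "Jason", "Bob", "likes"),
                    stringsAsFactors = FALSE)

# The comment's expression, run once per source row; temp holds that row's values
res_apply <- lapply(seq_len(nrow(s_col)), function(i) {
  temp <- unlist(s_col[i, ])
  # apply() converts t_col to a character matrix; MARGIN = c(1, 2) visits every cell
  ifelse(apply(t_col, c(1, 2), function(r) any(r %in% temp)), 1, 0)
})
print(res_apply[[1]])
```

`ifelse()` keeps the dimensions of its test, so each result is a 5 x 2 matrix of doubles rather than the integers produced by unary `+`.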
