2

I am trying to impute missing values in my dataset by matching against values in another dataset.

This is my data:

df1 %>% head()
           
   <V1>       <V2>     
1  apple       NA 
2  cheese      NA        
3  butter      NA               
 
df2 %>% head()
           
   <V1>      <V2>     
1  apple     jacks           
2  cheese    whiz      
3  butter    scotch
4  apple     turnover           
5  cheese    sliders      
6  butter    chicken
7  apple     sauce           
8  cheese    doodles      
9  butter    milk

This is what I want df1 to look like:

   <V1>       <V2>     
1  apple      jacks, turnover, sauce
2  cheese     whiz, sliders, doodles        
3  butter     scotch, chicken, milk 

This is my code:

df1$V2[is.na(df1$V2)] <- df2$V2[match(df1$V1,df2$V1)][which(is.na(df1$V2))]

This code works fine, however it only pulls the first missing value and ignores the rest.

2 Answers 2

1

Another solution just using base R

aggregate(DF2$V2, list(DF2$V1), c, simplify=F)
  Group.1                      x
1   apple jacks, turnover, sauce
2  butter  scotch, chicken, milk
3  cheese whiz, sliders, doodles
Sign up to request clarification or add additional context in comments.

Comments

1

I don't think you even need to import the df1 in this case can do it all based on df2

df1 <- df2 %>% group_by(`<V1>`) %>% summarise(`<V2>`=paste0(`<V2>`, collapse = ", "))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.