Imputing missing values in an R data frame

Question

I am trying to impute missing values in my dataset by matching against values in another dataset.

This is my data:

df1 %>% head()

   <V1>      <V2>
1  apple     NA
2  cheese    NA
3  butter    NA

df2 %>% head()

   <V1>      <V2>
1  apple     jacks
2  cheese    whiz
3  butter    scotch
4  apple     turnover
5  cheese    sliders
6  butter    chicken
7  apple     sauce
8  cheese    doodles
9  butter    milk

This is what I want df1 to look like:

   <V1>       <V2>     
1  apple      jacks, turnover, sauce
2  cheese     whiz, sliders, doodles        
3  butter     scotch, chicken, milk

This is my code:

df1$V2[is.na(df1$V2)] <- df2$V2[match(df1$V1,df2$V1)][which(is.na(df1$V2))]

This code runs without error, but it only pulls the first matching value from df2 for each V1 and ignores the rest.

G5W · Accepted Answer · 2022-06-15 22:44:40Z

1

Another solution just using base R

aggregate(df2$V2, list(df2$V1), c, simplify = FALSE)
  Group.1                      x
1   apple jacks, turnover, sauce
2  butter  scotch, chicken, milk
3  cheese whiz, sliders, doodles
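
To write the result back into df1 as plain comma-separated strings rather than a list column, one option is to collapse each group with paste() and then look up each key with match(). This is a minimal sketch, not part of the original answer: it assumes the columns are named V1 and V2 (as in the question's own code) and rebuilds df1 and df2 from the printed data. The name `collapsed` is only illustrative.

```r
# Sample data rebuilt from the question (column names V1/V2 are assumed).
df1 <- data.frame(V1 = c("apple", "cheese", "butter"),
                  V2 = NA_character_,
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  V1 = rep(c("apple", "cheese", "butter"), times = 3),
  V2 = c("jacks", "whiz", "scotch",
         "turnover", "sliders", "chicken",
         "sauce", "doodles", "milk"),
  stringsAsFactors = FALSE
)

# One row per key, with all of its V2 values pasted into a single string.
collapsed <- aggregate(V2 ~ V1, data = df2, FUN = paste, collapse = ", ")

# Fill only the missing entries of df1, keeping df1's own row order.
missing <- is.na(df1$V2)
df1$V2[missing] <- collapsed$V2[match(df1$V1[missing], collapsed$V1)]

print(df1)
```

Because `collapsed` has exactly one row per key, match() finding only the first hit is no longer a problem here.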


BEVAN · Accepted Answer · 2022-06-15 22:38:51Z

1

I don't think you even need df1 in this case; you can do it all from df2:

df1 <- df2 %>% group_by(`<V1>`) %>% summarise(`<V2>`=paste0(`<V2>`, collapse = ", "))
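
If df1 does need to be kept (for example because it has its own row order or some values already filled in), the same idea can be combined with a join. The following is a hedged sketch rather than the answerer's code: it assumes the dplyr package is installed, that the columns are named V1 and V2 as in the question's code, and it uses a hypothetical helper column `V2_all`.

```r
library(dplyr)

# Sample data rebuilt from the question (column names V1/V2 are assumed).
df1 <- tibble(V1 = c("apple", "cheese", "butter"),
              V2 = NA_character_)
df2 <- tibble(
  V1 = rep(c("apple", "cheese", "butter"), times = 3),
  V2 = c("jacks", "whiz", "scotch",
         "turnover", "sliders", "chicken",
         "sauce", "doodles", "milk")
)

# Collapse df2 to one row per key; V2_all is a hypothetical helper column.
collapsed <- df2 %>%
  group_by(V1) %>%
  summarise(V2_all = paste(V2, collapse = ", "))

# Join onto df1 and fill V2 only where it is missing.
df1 <- df1 %>%
  left_join(collapsed, by = "V1") %>%
  mutate(V2 = coalesce(V2, V2_all)) %>%
  select(-V2_all)

print(df1)
```

left_join() keeps the rows of df1 in their original order, and coalesce() leaves any non-missing V2 values untouched.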
