Replace multiple NA values with the correct values from a different set of columns

Question

I have a dataframe with missing data for some rows. The correct data can be found in another set of columns. I want to replace the NAs with the correct data.

My data looks like this:

df <- data.frame(M_1=c(1,NA,3,NA,6),
           M_2=c(5,NA,3,NA,1),
           M_3=c(6,NA,2,NA,4),
           M_C_1=c(NA,2,NA,6,NA),
           M_C_2=c(NA,1,NA,4,NA),
           M_C_3=c(NA,7,NA,3,NA))
df
#   M_1 M_2 M_3 M_C_1 M_C_2 M_C_3
#1   1   5   6    NA    NA    NA
#2  NA  NA  NA     2     1     7
#3   3   3   2    NA    NA    NA
#4  NA  NA  NA     6     4     3
#5   6   1   4    NA    NA    NA

Every row has a complete set of values either for M_1, M_2, and M_3 or for M_C_1, M_C_2, and M_C_3.

For each row that has NAs in the first set of variables (M_1:M_3), I would like to replace them with the values from the second set (M_C_1:M_C_3). I don't need to retain the second set. So my desired data frame would look like:

df
#   M_1 M_2 M_3
#1   1   5   6
#2   2   1   7
#3   3   3   2
#4   6   4   3
#5   6   1   4

My real dataset contains many columns named this way, so I need a general solution (i.e., I don't want to refer to each column individually).

I would like to do this with dplyr if possible.

Darren Tsai · Accepted Answer · 2022-07-28 12:22:32Z

3

You could use map2 + coalesce:

library(dplyr)
library(purrr)

map2_dfc(select(df, 1:3), select(df, 4:6), coalesce)

# # A tibble: 5 × 3
#     M_1   M_2   M_3
#   <dbl> <dbl> <dbl>
# 1     1     5     6
# 2     2     1     7
# 3     3     3     2
# 4     6     4     3
# 5     6     1     4


2 Comments

mob Over a year ago

This is brilliant and works perfectly. I struggle to understand what the map2_dfc function does though. Can you explain what is happening?

Darren Tsai Over a year ago

@mob map2 applies a function pairwise to two list or vector arguments. Take a look at map2(1:3, 4:6, paste). map2 always returns a list; the map2_xxx variants return other output types. For example, map2_chr(1:3, 4:6, paste) returns a character vector instead of a list.
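
To make the comment above concrete, here is a minimal sketch of what map2 does before map2_dfc binds the results into a tibble. The data frame df_small is a hypothetical cut-down version of the question's data (two column pairs only), introduced just for illustration.

```r
library(dplyr)  # coalesce()
library(purrr)  # map2() and its typed variants

# map2() walks two inputs in parallel and always returns a list
map2(1:3, 4:6, paste)

# map2_chr() does the same but returns a character vector
pasted <- map2_chr(1:3, 4:6, paste)
pasted
# [1] "1 4" "2 5" "3 6"

# A data frame is a list of columns, so map2() pairs columns by position:
# M_1 with M_C_1, M_2 with M_C_2
# (df_small is a hypothetical, cut-down version of the question's data)
df_small <- data.frame(M_1   = c(1, NA, 3),
                       M_2   = c(5, NA, 3),
                       M_C_1 = c(NA, 2, NA),
                       M_C_2 = c(NA, 1, NA))

# coalesce() takes, element by element, the first non-missing value
filled <- map2(df_small[1:2], df_small[3:4], coalesce)
filled$M_1
# [1] 1 2 3
```

The result keeps the names of the first input (M_1, M_2), which is why the answer's output needs no renaming. map2_dfc performs the same pairing and then binds the resulting list column-wise into a tibble.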

akrun · Accepted Answer · 2022-07-28 17:02:55Z

2

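Here is another option with dplyover:

library(dplyr)     # %>%, transmute() and coalesce()
library(dplyover)  # across2()
library(stringr)   # str_remove()

# across2() names each result "{xcol}_{ycol}" by default (e.g. M_1_M_C_1);
# .names_fn strips everything after the first digit, leaving M_1, M_2, M_3
df %>%
   transmute(across2(1:3, 4:6, coalesce, 
         .names_fn = ~ str_remove(.x, "(?<=\\d)_.*") ))
#   M_1 M_2 M_3
# 1   1   5   6
# 2   2   1   7
# 3   3   3   2
# 4   6   4   3
# 5   6   1   4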


Andrew Chisholm · Accepted Answer · 2022-07-28 11:54:26Z

Here's a generic example that works if the columns have names that allow them to be identified and they are in the correct order.

library(dplyr)

df <- data.frame(M_1=c(1,NA,3,NA,6),
                 M_2=c(5,NA,3,NA,1),
                 M_3=c(6,NA,2,NA,4),
                 M_C_1=c(NA,2,NA,6,NA),
                 M_C_2=c(NA,1,NA,4,NA),
                 M_C_3=c(NA,7,NA,3,NA))

# make a temporary id so we can retain the order later
df <- df %>% mutate(temporary_id = 1:n())

# find the columns corresponding to the final data
# they are assumed to be of the form M_number
df_records <- 
    df %>% 
    select(matches('temporary_id|M_[0-9]+')) %>% 
    na.omit()

# find the extra columns with data to replace in the final data
# they are assumed to be of the form M_C_number
df_extra <- 
    df %>% 
    select(matches('temporary_id|M_C_[0-9]+')) %>% 
    na.omit()

# change the names of the extra columns to match the final data
# this only works if the columns are in the correct order in the original data frame
names(df_extra) <- names(df_records)

# bind the rows of the final and extra data, sort and remove the temporary id
final_df <- 
    df_records %>% 
    bind_rows(df_extra) %>% 
    arrange(temporary_id) %>% 
    select(-temporary_id)

final_df
#  M_1 M_2 M_3
#1   1   5   6
#2   2   1   7
#3   3   3   2
#4   6   4   3
#5   6   1   4

If they are not in the required order, some sorting could be done but I'll leave that for now.
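
One possible way to handle columns that are not in the required order is to pair them by name instead of by position. The sketch below is not part of the answer above; it swaps in map2_dfc and coalesce, and it assumes that the fallback for every column M_<n> is named M_C_<n>. The object names df_shuffled, primary, fallback, and result are made up for illustration.

```r
library(dplyr)  # coalesce()
library(purrr)  # map2_dfc()

# The question's data with the columns deliberately shuffled
df_shuffled <- data.frame(M_C_2 = c(NA, 1, NA, 4, NA),
                          M_1   = c(1, NA, 3, NA, 6),
                          M_3   = c(6, NA, 2, NA, 4),
                          M_C_1 = c(NA, 2, NA, 6, NA),
                          M_2   = c(5, NA, 3, NA, 1),
                          M_C_3 = c(NA, 7, NA, 3, NA))

# Primary columns: names of the form M_<number>, sorted by that number
primary <- grep("^M_[0-9]+$", names(df_shuffled), value = TRUE)
primary <- primary[order(as.integer(sub("^M_", "", primary)))]

# Assumed naming rule: the fallback for M_<n> is M_C_<n>
fallback <- sub("^M_", "M_C_", primary)
stopifnot(all(fallback %in% names(df_shuffled)))

# Pair each primary column with its fallback by name and fill the NAs
result <- map2_dfc(df_shuffled[primary], df_shuffled[fallback], coalesce)
result
```

Because the pairs are built from the names, the column order in the data frame no longer matters, and the result should match the desired output from the question.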
