I have a dataframe with missing data for some rows. The correct data can be found in another set of columns. I want to replace the NAs with the correct data.

My data looks like this:

df <- data.frame(M_1=c(1,NA,3,NA,6),
           M_2=c(5,NA,3,NA,1),
           M_3=c(6,NA,2,NA,4),
           M_C_1=c(NA,2,NA,6,NA),
           M_C_2=c(NA,1,NA,4,NA),
           M_C_3=c(NA,7,NA,3,NA))
df
#   M_1 M_2 M_3 M_C_1 M_C_2 M_C_3
#1   1   5   6    NA    NA    NA
#2  NA  NA  NA     2     1     7
#3   3   3   2    NA    NA    NA
#4  NA  NA  NA     6     4     3
#5   6   1   4    NA    NA    NA

For every row, I have either a complete set of values for M_1, M_2, and M_3 or a complete set for M_C_1, M_C_2, and M_C_3.

For each row that has NAs in the first set of variables (M_1:M_3), I would like to replace them with the values from the second set (M_C_1:M_C_3). I don't need to retain the second set of values. So my desired data frame would look like:

df
#   M_1 M_2 M_3
#1   1   5   6    
#2   2   1   7
#3   3   3   2
#4   6   4   3     
#5   6   1   4

My real dataset contains many columns in this notation, so I need a general solution (i.e., I don't want to refer to each column individually).

I would like to do this with dplyr if possible.

3 Answers

You could use map2 + coalesce:

library(dplyr)
library(purrr)

map2_dfc(select(df, 1:3), select(df, 4:6), coalesce)

# # A tibble: 5 × 3
#     M_1   M_2   M_3
#   <dbl> <dbl> <dbl>
# 1     1     5     6
# 2     2     1     7
# 3     3     3     2
# 4     6     4     3
# 5     6     1     4
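
If the real data has many such pairs, the positions 1:3 and 4:6 could be derived from the column names instead. This is only a minimal sketch, assuming every main column is named M_<number> and its fallback is named M_C_<number>, as in the question. The names main_cols, fallback_cols and result are illustrative and not part of the answer above.

```r
library(dplyr)
library(purrr)

df <- data.frame(M_1=c(1,NA,3,NA,6),
                 M_2=c(5,NA,3,NA,1),
                 M_3=c(6,NA,2,NA,4),
                 M_C_1=c(NA,2,NA,6,NA),
                 M_C_2=c(NA,1,NA,4,NA),
                 M_C_3=c(NA,7,NA,3,NA))

# Main columns: names of the form M_<number>
main_cols <- grep("^M_[0-9]+$", names(df), value = TRUE)

# Fallback columns: same number with the M_C_ prefix (assumed naming scheme)
fallback_cols <- sub("^M_", "M_C_", main_cols)

# Stop early if a main column has no matching fallback column
stopifnot(all(fallback_cols %in% names(df)))

# Pair the columns by name and take the first non-missing value in each pair
result <- map2_dfc(df[main_cols], df[fallback_cols], coalesce)
result
```

Because the fallback names are built from the main names, the pairing no longer depends on where the columns sit in the data frame.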

2 Comments

This is brilliant and works perfectly. I struggle to understand what the map2_dfc function does though. Can you explain what is happening?
@mob map2 applies a function pairwise to two list or vector arguments. You can take a look at map2(1:3, 4:6, paste). The output of map2 is always a list. You can use the map2_xxx variants to change the output type, e.g. map2_chr(1:3, 4:6, paste) reduces the list to a character vector.
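
To make the comment above concrete, here is a small sketch of the pieces in isolation. The variable names pairs_list, pairs_chr and filled are only for illustration.

```r
library(dplyr)
library(purrr)

# map2 walks over both inputs in parallel and always returns a list
pairs_list <- map2(1:3, 4:6, paste)

# map2_chr does the same but simplifies the result to a character vector
pairs_chr <- map2_chr(1:3, 4:6, paste)
pairs_chr
# [1] "1 4" "2 5" "3 6"

# coalesce takes, position by position, the first non-missing value
filled <- coalesce(c(1, NA, 3), c(NA, 2, NA))
filled
# [1] 1 2 3
```

In the answer, map2_dfc applies coalesce to each pair of columns in the same way and then binds the resulting columns into one data frame.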

Here is another option with dplyover

library(dplyr)     # provides %>%, transmute() and coalesce()
library(dplyover)
library(stringr)
df %>%
   transmute(across2(1:3, 4:6, coalesce, 
         .names_fn = ~ str_remove(.x, "(?<=\\d)_.*") ))
  M_1 M_2 M_3
1   1   5   6
2   2   1   7
3   3   3   2
4   6   4   3
5   6   1   4

Here's a generic example that works if the columns have names that allow them to be identified and they are in the correct order.

library(dplyr)

df <- data.frame(M_1=c(1,NA,3,NA,6),
                 M_2=c(5,NA,3,NA,1),
                 M_3=c(6,NA,2,NA,4),
                 M_C_1=c(NA,2,NA,6,NA),
                 M_C_2=c(NA,1,NA,4,NA),
                 M_C_3=c(NA,7,NA,3,NA))

# make a temporary id so we can retain the order later
df <- df %>% mutate(temporary_id = 1:n())

# find the columns corresponding to the final data
# they are assumed to be of the form M_number
df_records <- 
    df %>% 
    select(matches('temporary_id|M_[0-9]+')) %>% 
    na.omit()

# find the extra columns with data to replace in the final data
# they are assumed to be of the form M_C_number
df_extra <- 
    df %>% 
    select(matches('temporary_id|M_C_[0-9]+')) %>% 
    na.omit()

# change the names of the extra columns to match the final data
# this only works if the columns are in the correct order in the original data frame
names(df_extra) <- names(df_records)

# bind the rows of the final and extra data, sort and remove the temporary id
final_df <- 
    df_records %>% 
    bind_rows(df_extra) %>% 
    arrange(temporary_id) %>% 
    select(-temporary_id)

final_df
#  M_1 M_2 M_3
#1   1   5   6
#2   2   1   7
#3   3   3   2
#4   6   4   3
#5   6   1   4

If they are not in the required order, some sorting could be done but I'll leave that for now.
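
One possible way to drop the ordering requirement is to rename the extra columns by pattern rather than by position, since bind_rows matches columns by name. This is a sketch under the same naming assumption (M_<number> and M_C_<number>), using deliberately shuffled columns. It is a variation on the code above, not a tested general solution.

```r
library(dplyr)

# Same data as above, with the columns deliberately out of order
df <- data.frame(M_C_2=c(NA,1,NA,4,NA),
                 M_1=c(1,NA,3,NA,6),
                 M_C_3=c(NA,7,NA,3,NA),
                 M_3=c(6,NA,2,NA,4),
                 M_C_1=c(NA,2,NA,6,NA),
                 M_2=c(5,NA,3,NA,1)) %>%
    mutate(temporary_id = row_number())

# rows that are complete in the M_<number> columns
df_records <-
    df %>%
    select(temporary_id, matches('^M_[0-9]+$')) %>%
    na.omit()

# rows that are complete in the M_C_<number> columns,
# renamed by pattern so that M_C_2 becomes M_2 wherever it sits
df_extra <-
    df %>%
    select(temporary_id, matches('^M_C_[0-9]+$')) %>%
    na.omit() %>%
    rename_with(~ sub('^M_C_', 'M_', .x))

# bind_rows aligns columns by name, so the column order no longer matters
final_df <-
    bind_rows(df_records, df_extra) %>%
    arrange(temporary_id) %>%
    select(-temporary_id)

final_df
```

The result keeps the column order of df_records; a final select() could reorder the columns if that matters.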
