I would like to replace the values in a dataset with corrected values wherever the corrected value is not NA.

library(dplyr)  # also provides tibble() and %>%

df <- tibble(
  id = c(1, 2, 3),
  name = c("peter", "piper", "paul"),
  alex.value = c("apple", "banana", "apple"),
  alex.corrected = c("orange", NA, "banana"),
  bob.value = c("monkey", "lion", "tiger"),
  bob.corrected = c("lion", "tiger", NA)
)

Desired output

df %>%
  mutate(
    alex = if_else(!is.na(alex.corrected), alex.corrected, alex.value),
    bob = if_else(!is.na(bob.corrected), bob.corrected, bob.value)
  )

I need to do this for many columns, so a solution that scales would be great. I'm thinking it will involve regex and maybe purrr, something like

df %>%
map_df( str_detect(unique(*\\.)

but that is just a wild guess

2 Answers

We can use pivot_longer to split the column names at the delimiter ., then transmute by coalescing 'corrected' with 'value', reshape back to 'wide' format, and bind the result to the original dataset:

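library(dplyr)
library(tidyr)
library(data.table)  # for rowid()

df %>%
  # keep only the columns that take part in the correction
  select(matches("value|corrected")) %>%
  # "alex.value" becomes grp = "alex" plus a 'value' column (same for 'corrected')
  pivot_longer(cols = everything(), names_sep = "\\.",
               names_to = c("grp", ".value")) %>%
  # prefer the corrected entry, fall back to the original value
  transmute(grp, value = coalesce(corrected, value)) %>%
  # per-group row counter so pivot_wider() can line the rows up again
  mutate(rn = rowid(grp)) %>%
  pivot_wider(names_from = grp, values_from = value) %>%
  select(-rn) %>%
  bind_cols(df, .)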
# A tibble: 3 x 8
#     id name  alex.value alex.corrected bob.value bob.corrected alex   bob  
#  <dbl> <chr> <chr>      <chr>          <chr>     <chr>         <chr>  <chr>
#1     1 peter apple      orange         monkey    lion          orange lion 
#2     2 piper banana     <NA>           lion      tiger         banana tiger
#3     3 paul  apple      banana         tiger     <NA>          banana tiger
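
As a quick sanity check on the key step, here is a minimal sketch showing that coalesce(corrected, value) follows the same rule as the if_else() call in the question. The two vectors are just the alex columns from the example data; the names via_coalesce and via_if_else are only for illustration.

```r
library(dplyr)

value     <- c("apple", "banana", "apple")
corrected <- c("orange", NA, "banana")

# coalesce() returns, element by element, the first non-missing value
via_coalesce <- coalesce(corrected, value)

# the same rule spelled out with if_else(), as in the question
via_if_else <- if_else(!is.na(corrected), corrected, value)

print(via_coalesce)
print(identical(via_coalesce, via_if_else))
```

Argument order matters: the column you want to win goes first.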

Or in base R with split.default, which splits the columns into groups by their prefix:

nm1 <- grep('value|corrected', names(df), value = TRUE)
cbind(df, lapply(split.default(df[nm1], sub("\\..*", "", nm1)), 
          function(x)  ifelse(is.na(x[[2]]), x[[1]], x[[2]])))
#  id  name alex.value alex.corrected bob.value bob.corrected   alex   bob
#1  1 peter      apple         orange    monkey          lion orange  lion
#2  2 piper     banana           <NA>      lion         tiger banana tiger
#3  3  paul      apple         banana     tiger          <NA> banana tiger
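
Note that x[[1]] and x[[2]] above rely on each value column coming before its corrected column. If that ordering is not guaranteed in your data, a variation along these lines picks the two columns by suffix instead. This is only a sketch: pick() is a hypothetical helper, and the data frame is a base data.frame copy of the example so the block runs without any packages.

```r
# base data.frame stand-in for the tibble in the question
df <- data.frame(
  id = c(1, 2, 3),
  name = c("peter", "piper", "paul"),
  alex.value = c("apple", "banana", "apple"),
  alex.corrected = c("orange", NA, "banana"),
  bob.value = c("monkey", "lion", "tiger"),
  bob.corrected = c("lion", "tiger", NA),
  stringsAsFactors = FALSE
)

nm1 <- grep("\\.(value|corrected)$", names(df), value = TRUE)

# one small data frame per prefix ("alex", "bob")
groups <- split.default(df[nm1], sub("\\..*", "", nm1))

# hypothetical helper: pull the column whose name ends in the given suffix
pick <- function(x, suffix) x[[grep(paste0("\\.", suffix, "$"), names(x))]]

merged <- lapply(groups, function(x) {
  corrected <- pick(x, "corrected")
  value <- pick(x, "value")
  ifelse(is.na(corrected), value, corrected)
})

result <- cbind(df, as.data.frame(merged, stringsAsFactors = FALSE))
print(result)
```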

2 Comments

Thank you! I have one additional part to my question, which I'll edit in now. There are some values that are not meant to be changed (e.g., id, name, etc.).
@Nick In that case, just subset the data with only the columns that need to be changed i.e. df %>% select(matches("value|corrected")) %>% pivot_longer(cols = everything(), ...

You can pick out the two sets of columns by name and apply the same logic as in your attempt, pairwise, with Map:

value_cols <- grep('value', names(df), value = TRUE)
corrected_cols <- grep('corrected', names(df), value = TRUE)
new_cols <- sub('\\..*', '', value_cols)

df[new_cols] <- Map(function(x, y) ifelse(!is.na(x), x, y), 
                    df[corrected_cols], df[value_cols])
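
One caveat: Map pairs the two column sets by position, so the two grep results have to line up. A possible safeguard, sketched here under the assumption that every prefix.value column has a matching prefix.corrected column, is to derive the corrected names from the value names and check that they exist. The data frame dat is a base data.frame copy of the example so the block is self-contained.

```r
# base data.frame stand-in for the tibble in the question
dat <- data.frame(
  id = c(1, 2, 3),
  name = c("peter", "piper", "paul"),
  alex.value = c("apple", "banana", "apple"),
  alex.corrected = c("orange", NA, "banana"),
  bob.value = c("monkey", "lion", "tiger"),
  bob.corrected = c("lion", "tiger", NA),
  stringsAsFactors = FALSE
)

value_cols <- grep("\\.value$", names(dat), value = TRUE)
new_cols <- sub("\\.value$", "", value_cols)

# build the partner names from the prefixes instead of a second grep,
# so the i-th corrected column always belongs to the i-th value column
corrected_cols <- paste0(new_cols, ".corrected")
stopifnot(all(corrected_cols %in% names(dat)))

dat[new_cols] <- Map(function(x, y) ifelse(!is.na(x), x, y),
                     dat[corrected_cols], dat[value_cols])
print(dat)
```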

If you prefer a tidyverse solution:

library(dplyr)
library(purrr)
df %>%
  bind_cols(
    map2_df(df[corrected_cols], df[value_cols], coalesce) %>%
      rename_with(~ new_cols)
      # in old dplyr versions without rename_with(), use rename_all(~ new_cols)
  )


# A tibble: 3 x 8
#     id name  alex.value alex.corrected bob.value bob.corrected alex   bob  
#  <dbl> <chr> <chr>      <chr>          <chr>     <chr>         <chr>  <chr>
#1     1 peter apple      orange         monkey    lion          orange lion 
#2     2 piper banana     NA             lion      tiger         banana tiger
#3     3 paul  apple      banana         tiger     NA            banana tiger

3 Comments

I'm interested in this solution, because I'd like to perform the function only on subset of the dataframe while maintaining the entire dataframe. E.g. df %>% filter(id %in% c("1","3")) %>%... How could I do this?
@Nick newdf <- df %>% filter(id %in% c("1","3")) and then use the above solution with newdf ?
OK this is helpful. I split the dataframe into two then did bind_rows to join them again
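
For the row-subset case in these comments, an alternative to splitting and re-binding is to fold the row condition into the replacement test itself. This is only a sketch, not what the answer above does: apply_to, prefixes and out are illustrative names, and the prefixes are listed by hand for the example data.

```r
library(dplyr)  # also provides tibble()

df <- tibble(
  id = c(1, 2, 3),
  name = c("peter", "piper", "paul"),
  alex.value = c("apple", "banana", "apple"),
  alex.corrected = c("orange", NA, "banana"),
  bob.value = c("monkey", "lion", "tiger"),
  bob.corrected = c("lion", "tiger", NA)
)

prefixes <- c("alex", "bob")

# rows where corrections may be applied; all other rows keep the original value
apply_to <- df$id %in% c(1, 3)

out <- df
for (p in prefixes) {
  corrected <- df[[paste0(p, ".corrected")]]
  value <- df[[paste0(p, ".value")]]
  # use the correction only in selected rows, and only where one exists
  out[[p]] <- if_else(apply_to & !is.na(corrected), corrected, value)
}
print(out)
```

All rows stay in place and in their original order, so no bind_rows step is needed afterwards.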
