Replace values based on matching REGEX with another column r tidy

Question

I would like to replace values in a dataset with their corrected values wherever the corrected value is not NA.

library(dplyr)  # re-exports tibble() and provides %>%

df <- tibble(
  id = c(1,2,3),
  name = c("peter", "piper", "paul"),
  alex.value = c("apple","banana","apple"),
  alex.corrected = c("orange",NA,"banana"),
  bob.value = c("monkey","lion","tiger"),
  bob.corrected = c("lion","tiger", NA)
)

Desired output

df %>%
  mutate(
    alex = if_else(!is.na(alex.corrected), alex.corrected, alex.value),
    bob = if_else(!is.na(bob.corrected), bob.corrected, bob.value)
  )

I need to do this for many columns, so it would be great to have a solution that scales. I'm thinking it will involve REGEX and maybe purrr, something like

df %>%
map_df( str_detect(unique(*\\.)

but that is just a wild guess

akrun · Accepted Answer · 2020-07-01 22:35:11Z

2

We can use pivot_longer to split the column names at the delimiter ".", transmute by coalescing 'corrected' with 'value', reshape back to 'wide' format, and bind the result to the original dataset.

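library(dplyr)
library(tidyr)
library(data.table)  # for rowid()

df %>%
  # keep only the value/corrected columns
  select(matches("value|corrected")) %>%
  # split names at "." into a group (alex, bob) and a value/corrected column
  pivot_longer(cols = everything(), names_sep = "\\.",
               names_to = c("grp", ".value")) %>%
  # take 'corrected' where available, otherwise 'value'
  transmute(grp, value = coalesce(corrected, value)) %>%
  # row index within each group, so pivot_wider can identify rows
  mutate(rn = rowid(grp)) %>%
  pivot_wider(names_from = grp, values_from = value) %>%
  select(-rn) %>%
  bind_cols(df, .)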
# A tibble: 3 x 8
#     id name  alex.value alex.corrected bob.value bob.corrected alex   bob  
#  <dbl> <chr> <chr>      <chr>          <chr>     <chr>         <chr>  <chr>
#1     1 peter apple      orange         monkey    lion          orange lion 
#2     2 piper banana     <NA>           lion      tiger         banana tiger
#3     3 paul  apple      banana         tiger     <NA>          banana tiger

Or in base R with split.default

nm1 <- grep('value|corrected', names(df), value = TRUE)
cbind(df, lapply(split.default(df[nm1], sub("\\..*", "", nm1)), 
          function(x)  ifelse(is.na(x[[2]]), x[[1]], x[[2]])))
#  id  name alex.value alex.corrected bob.value bob.corrected   alex   bob
#1  1 peter      apple         orange    monkey          lion orange  lion
#2  2 piper     banana           <NA>      lion         tiger banana tiger
#3  3  paul      apple         banana     tiger          <NA> banana tiger
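
The positional indexing x[[1]] and x[[2]] in the base R version depends on how split.default groups the columns. The sketch below only inspects that intermediate result. It assumes the question's columns (rebuilt here as a plain data.frame), and the names grp and parts are made up for illustration.

```r
df <- data.frame(
  alex.value = c("apple", "banana", "apple"),
  alex.corrected = c("orange", NA, "banana"),
  bob.value = c("monkey", "lion", "tiger"),
  bob.corrected = c("lion", "tiger", NA),
  stringsAsFactors = FALSE
)

nm1 <- grep("value|corrected", names(df), value = TRUE)

# Grouping key: everything before the first dot ("alex", "bob").
grp <- sub("\\..*", "", nm1)

# split.default treats the data frame as a list of columns and groups
# those columns by the key, giving one small data frame per prefix.
parts <- split.default(df[nm1], grp)

names(parts)       # the group names
names(parts$alex)  # columns inside one group, in their original order
```

Within each group the columns keep the order they had in df, so x[[1]] is the value column and x[[2]] the corrected column only as long as every *.value column comes before its *.corrected column.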

edited Jul 1, 2020 at 22:35

answered Jul 1, 2020 at 22:19

akrun

2 Comments

Nick Over a year ago

Thank you! I have one additional part to my question, which I'll edit in now. There are some values that are not meant to be changed (e.g., id, name, etc.).

akrun Over a year ago

@Nick In that case, just subset the data with only the columns that need to be changed i.e. df %>% select(matches("value|corrected")) %>% pivot_longer(cols = everything(), ...

Ronak Shah · Accepted Answer · 2020-07-02 00:58:22Z

1

You can split the columns by name and apply the same logic as in your attempt, using Map:

value_cols <- grep('value', names(df), value = TRUE)
corrected_cols <- grep('corrected', names(df), value = TRUE)
new_cols <- sub('\\..*', '', value_cols)

df[new_cols] <- Map(function(x, y) ifelse(!is.na(x), x, y), 
                    df[corrected_cols], df[value_cols])

If you prefer a tidyverse solution:

library(dplyr)
library(purrr)

df %>%
  bind_cols(
    map2_df(df[corrected_cols], df[value_cols], coalesce) %>%
      rename_with(~new_cols)
  )
# In old dplyr, use rename_all(~new_cols) instead of rename_with(~new_cols)


# A tibble: 3 x 8
#     id name  alex.value alex.corrected bob.value bob.corrected alex   bob  
#  <dbl> <chr> <chr>      <chr>          <chr>     <chr>         <chr>  <chr>
#1     1 peter apple      orange         monkey    lion          orange lion 
#2     2 piper banana     NA             lion      tiger         banana tiger
#3     3 paul  apple      banana         tiger     NA            banana tiger
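
One caveat: pairing columns from two separate grep() calls relies on the value and corrected columns appearing in the same order in the data. The following is a minimal base R sketch of a name-based pairing, assuming every column follows the <prefix>.value / <prefix>.corrected naming used in the question. The shuffled column order and the name prefixes are illustrative only, not part of the original answer.

```r
df <- data.frame(
  id = c(1, 2, 3),
  alex.value = c("apple", "banana", "apple"),
  bob.corrected = c("lion", "tiger", NA),  # deliberately out of order
  alex.corrected = c("orange", NA, "banana"),
  bob.value = c("monkey", "lion", "tiger"),
  stringsAsFactors = FALSE
)

# Derive the prefixes from the *.value columns only.
prefixes <- sub("\\.value$", "", grep("\\.value$", names(df), value = TRUE))

# Fail early if any prefix lacks its corrected counterpart.
stopifnot(all(paste0(prefixes, ".corrected") %in% names(df)))

# Look both columns up by name, so column order no longer matters.
for (p in prefixes) {
  corrected <- df[[paste0(p, ".corrected")]]
  original <- df[[paste0(p, ".value")]]
  df[[p]] <- ifelse(is.na(corrected), original, corrected)
}
```

Because both names are built from the same prefix, a value column is never combined with another group's corrected column.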

answered Jul 2, 2020 at 0:58

Ronak Shah

3 Comments

Nick Over a year ago

I'm interested in this solution because I'd like to apply the function to only a subset of the dataframe while keeping the entire dataframe. E.g. df %>% filter(id %in% c("1","3")) %>%... How could I do this?

Ronak Shah Over a year ago

@Nick newdf <- df %>% filter(id %in% c("1","3")) and then use the above solution with newdf ?

Nick Over a year ago

OK this is helpful. I split the dataframe into two then did bind_rows to join them again
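
The split-and-rebind approach described in the last comment might look roughly like the sketch below. It assumes the example df from the question and that corrections should apply only to ids 1 and 3. The names to_fix, rest and result are hypothetical, and rows left out of the subset simply carry their original values into the new columns.

```r
library(dplyr)

df <- tibble(
  id = c(1, 2, 3),
  name = c("peter", "piper", "paul"),
  alex.value = c("apple", "banana", "apple"),
  alex.corrected = c("orange", NA, "banana"),
  bob.value = c("monkey", "lion", "tiger"),
  bob.corrected = c("lion", "tiger", NA)
)

value_cols <- grep("\\.value$", names(df), value = TRUE)
corrected_cols <- sub("\\.value$", ".corrected", value_cols)
new_cols <- sub("\\.value$", "", value_cols)

# Split the rows: only ids 1 and 3 get corrections applied (illustrative choice).
to_fix <- df %>% filter(id %in% c(1, 3))
rest <- df %>% filter(!id %in% c(1, 3))

for (i in seq_along(new_cols)) {
  # Subset rows: prefer the corrected value, fall back to the original.
  to_fix[[new_cols[i]]] <- coalesce(to_fix[[corrected_cols[i]]],
                                    to_fix[[value_cols[i]]])
  # Remaining rows: copy the original value unchanged.
  rest[[new_cols[i]]] <- rest[[value_cols[i]]]
}

# Re-join the two pieces and restore the original row order.
result <- bind_rows(to_fix, rest) %>% arrange(id)
```

Both pieces need the same new columns before bind_rows, otherwise the rows that were not corrected would end up with NA in alex and bob.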
