replace one column with another using regex matching in R

Question

I am working with some survey data, and I would like to fill the missing values in one survey item/column with the values from another survey item, while keeping the original cell contents wherever they are present. For example, replace Q2_1.x with Q2_1.y wherever Q2_1.x is missing.
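The per-cell rule described here ("keep the .x value unless it is missing, otherwise use the .y value") can be sketched on two plain vectors. This minimal sketch uses made-up vectors x and y standing in for one .x/.y column pair, and shows that base R's ifelse() and dplyr's coalesce() give the same result:

```r
library(dplyr)

# Hypothetical stand-ins for one .x / .y column pair
x <- c("Yes", NA, NA)
y <- c(NA, "No", NA)

# Base R: take y where x is NA, otherwise keep x
filled_base <- ifelse(is.na(x), y, x)

# dplyr: coalesce() returns the first non-missing value at each position
filled_dplyr <- coalesce(x, y)

print(filled_base)
print(filled_dplyr)
```

A cell stays NA only when both columns are missing, as in the third position.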

Here is an example of my data:

library(dplyr)
library(readr)

org_dat <- read_table('ID   Q2_1.x  Q2_2.x  Q2_1.y  Q2_2.y  Q14_1.x Q14_1.y Q15
1   Yes NA  NA  NA  Sometimes   NA  NA
2   -99 NA  No  NA  NA  Always  Yes
3   NA  NA  NA  NA  NA  NA  NA
4   NA  NA  NA  No  NA  NA  No
5   NA  NA  NA  NA  NA  Always  NA
6   NA  NA  NA  No  NA  NA  NA') %>%
  mutate(across(everything(), as.character))   # store every column as character

Here is my desired output:

dat_out <- read_table('ID   Q2_1    Q2_2    Q14_1   Q15
1   Yes NA  Sometimes   NA
2   No  NA  Always  Yes
3   NA  NA  NA  NA
4   NA  No  NA  No
5   NA  NA  Always  NA
6   NA  No  NA  NA')

Current solution: I know that I can replace each of these columns individually, but I have a lot of columns to deal with, and I would like a smarter dplyr/grepl way of solving this. Any ideas? It is always the case that I am filling a Q*.x column from the matching Q*.y column.

library(dplyr)

org_dat %>%
  mutate(Q2_1.x  = case_when(is.na(Q2_1.x)  ~ Q2_1.y,  TRUE ~ Q2_1.x),
         Q2_2.x  = case_when(is.na(Q2_2.x)  ~ Q2_2.y,  TRUE ~ Q2_2.x),
         Q14_1.x = case_when(is.na(Q14_1.x) ~ Q14_1.y, TRUE ~ Q14_1.x)) %>%
  rename(Q2_1  = Q2_1.x,
         Q2_2  = Q2_2.x,
         Q14_1 = Q14_1.x) %>%
  select(-ends_with(".y"))   # drop only the helper .y columns, not any name containing x or y
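Since every Q*.x column is always filled from the matching Q*.y column, the loop can also be driven by the column names with grepl() and sub() in base R. This is a sketch of one possible approach, not the accepted answer's; fill_from_y() is a hypothetical helper, and the data is the question's example with the -99 in row 2 assumed to be recoded to NA already (which is what the desired output implies). A .x column without a .y partner is only renamed.

```r
# Question's data; assumption: the -99 in row 2 has already been recoded to NA
org_dat <- data.frame(
  ID      = as.character(1:6),
  Q2_1.x  = c("Yes", NA, NA, NA, NA, NA),
  Q2_2.x  = NA_character_,
  Q2_1.y  = c(NA, "No", NA, NA, NA, NA),
  Q2_2.y  = c(NA, NA, NA, "No", NA, "No"),
  Q14_1.x = c("Sometimes", NA, NA, NA, NA, NA),
  Q14_1.y = c(NA, "Always", NA, NA, "Always", NA),
  Q15     = c(NA, "Yes", NA, "No", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every column ending in ".x", fill its NAs from the
# matching ".y" column, drop the ".y" column, and strip the ".x" suffix
fill_from_y <- function(dat) {
  x_cols <- names(dat)[grepl("[.]x$", names(dat))]
  for (x_col in x_cols) {
    y_col    <- sub("[.]x$", ".y", x_col)   # e.g. "Q2_1.x" -> "Q2_1.y"
    new_name <- sub("[.]x$", "", x_col)     # e.g. "Q2_1.x" -> "Q2_1"
    if (y_col %in% names(dat)) {
      dat[[x_col]] <- ifelse(is.na(dat[[x_col]]), dat[[y_col]], dat[[x_col]])
      dat[[y_col]] <- NULL
    }
    # Rename in place so the column keeps its position
    names(dat)[names(dat) == x_col] <- new_name
  }
  dat
}

dat_out <- fill_from_y(org_dat)
print(dat_out)
```

Renaming in place keeps ID first and Q15 last, as in the desired output.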

akrun · Accepted Answer · 2022-01-26 18:42:33Z


Here is an option with across and coalesce: loop across the columns that end with 'x' (ends_with), change the x suffix of the current column name (cur_column()) to y with str_replace, get the values of that column, and coalesce them with the looped column. Finally, remove the .x suffix from the column name in .names.

library(dplyr)
library(stringr)

org_dat %>%
  mutate(across(ends_with(".x"),
                # "[.]x$" matches only a literal ".x" at the end of the name
                ~ coalesce(., get(str_replace(cur_column(), "[.]x$", ".y"))),
                .names = "{str_remove(.col, '[.]x$')}"),
         .keep = "unused", .before = 2)

-output

# A tibble: 6 × 5
  ID    Q2_1  Q2_2  Q14_1     Q15  
  <chr> <chr> <chr> <chr>     <chr>
1 1     Yes   <NA>  Sometimes <NA> 
2 2     No    <NA>  Always    Yes  
3 3     <NA>  <NA>  <NA>      <NA> 
4 4     <NA>  No    <NA>      No   
5 5     <NA>  <NA>  Always    <NA> 
6 6     <NA>  No    <NA>      <NA>
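The renaming inside the loop works on column-name strings, so it can be checked on its own. The patterns "x" and '.x' work for the question's names, but "x" replaces the first x anywhere in a name, and the . in '.x' is a regex wildcard. This sketch uses two hypothetical names, Qx_1.x and Q_max.x, to compare them with the anchored pattern "[.]x$", which matches only a literal .x at the end of the name:

```r
library(stringr)

# The mapping used inside the loop, on one of the question's column names
partner  <- str_replace("Q2_1.x", "[.]x$", ".y")   # column fetched with get()
new_name <- str_remove("Q2_1.x", "[.]x$")          # name produced by .names

# Hypothetical names where the unanchored patterns go wrong
unanchored_y <- str_replace("Qx_1.x", "x", "y")       # first "x" anywhere is replaced
anchored_y   <- str_replace("Qx_1.x", "[.]x$", ".y")  # only the ".x" suffix
wildcard_rm  <- str_remove("Q_max.x", ".x")           # "." matches any character
anchored_rm  <- str_remove("Q_max.x", "[.]x$")

print(c(partner, new_name))
print(c(unanchored_y, anchored_y, wildcard_rm, anchored_rm))
```

Here the unanchored calls give "Qy_1.x" and "Q_m.x", while the anchored ones give the intended "Qx_1.y" and "Q_max".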

edited Jan 26, 2022 at 18:42

answered Jan 26, 2022 at 18:20

akrun


15 Comments

NewBee Over a year ago

wow-- this is so elegant. How does coalesce know to only look at those matching columns (Q2_1.x and Q2_1.y)?

akrun Over a year ago

@NewBee The loop only iterates over the ".x" columns. Inside the loop, the ".x" in the current column name (cur_column()) is changed to ".y", and get() returns the values of that corresponding column, which coalesce() then uses to fill the NAs in the current ".x" column.
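The pairing described in this comment can be inspected before running the loop. In this sketch, Q99_1.x is a hypothetical extra column with no .y partner; get() in the answer's loop would stop with an error on such a column, so one option is to compute the paired .x columns up front and pass them to across() with all_of() instead of ends_with(".x").

```r
# Column names from the question plus a hypothetical unpaired column, Q99_1.x
col_names <- c("ID", "Q2_1.x", "Q2_2.x", "Q2_1.y", "Q2_2.y",
               "Q14_1.x", "Q14_1.y", "Q15", "Q99_1.x")

x_cols <- col_names[grepl("[.]x$", col_names)]  # columns the loop visits
y_cols <- sub("[.]x$", ".y", x_cols)            # partner each one looks up
paired <- y_cols %in% col_names                 # does the partner exist?

print(data.frame(x = x_cols, y = y_cols, paired = paired))

# Columns that are safe to loop over, e.g. across(all_of(paired_x), ...)
paired_x <- x_cols[paired]
```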

NewBee Over a year ago

TY! Also, is there a way to return all the columns that are not touched by the loop (like Q15) without explicitly naming them? I have a lot of columns like this that should be returned as they are.

akrun Over a year ago

@NewBee updated the post

akrun Over a year ago

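@NewBee I think the .keep = "unused" and .before = 2 arguments in the updated mutate step fix that issue: columns that are not used in the loop, such as Q15, are returned unchanged, and the new columns are placed right after ID.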
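The effect of .keep = "unused" and .before can be seen on a small made-up tibble (the columns a.x, a.y and other are hypothetical). With .keep = "unused", mutate() drops only the columns that were used to compute the new ones and keeps everything else, such as other here (Q15 in the question); .before = 2 puts the new column right after ID.

```r
library(dplyr)

df <- tibble(
  ID    = 1:2,
  a.x   = c(NA, "p"),
  a.y   = c("q", NA),
  other = c("keep", "me")   # never referenced, so .keep = "unused" keeps it
)

# a.x and a.y are used to build a, so both are dropped afterwards
out <- df %>%
  mutate(a = coalesce(a.x, a.y), .keep = "unused", .before = 2)

print(out)
```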
