I am working with some survey data and would like to fill in one survey item/column from another survey item while keeping the original cell contents wherever they exist. For example: replace Q2_1.x with Q2_1.y only where Q2_1.x is missing.

Here is an example of my data:

library(readr)
library(dplyr)
org_dat <- read_table('ID   Q2_1.x  Q2_2.x  Q2_1.y  Q2_2.y  Q14_1.x Q14_1.y Q15
1   Yes NA  NA  NA  Sometimes   NA  NA
2   -99 NA  No  NA  NA  Always  Yes
3   NA  NA  NA  NA  NA  NA  NA
4   NA  NA  NA  No  NA  NA  No
5   NA  NA  NA  NA  NA  Always  NA
6   NA  NA  NA  No  NA  NA  NA') %>% mutate_all(as.character)

Here is my desired output:

dat_out <- read_table('ID   Q2_1    Q2_2    Q14_1   Q15
1   Yes NA  Sometimes   NA
2   No  NA  Always  Yes
3   NA  NA  NA  NA
4   NA  No  NA  No
5   NA  NA  Always  NA
6   NA  No  NA  NA')

Current solution: I know that I can replace each of these columns individually, but I have a lot of columns to deal with, and I would like a smarter dplyr/grepl way of solving this. Any ideas? In every case, missing values in Q*.x are filled from the matching Q*.y.

org_dat %>%
  mutate(Q2_1.x = case_when(is.na(Q2_1.x) ~ Q2_1.y,
                            TRUE ~ Q2_1.x)) %>%
  mutate(Q2_2.x = case_when(is.na(Q2_2.x) ~ Q2_2.y,
                            TRUE ~ Q2_2.x)) %>%
  mutate(Q14_1.x = case_when(is.na(Q14_1.x) ~ Q14_1.y,
                             TRUE ~ Q14_1.x)) %>%
  rename(Q2_1 = Q2_1.x,
         Q2_2 = Q2_2.x,
         Q14_1 = Q14_1.x) %>%
  # drop only the leftover .y columns; matches("x|y") would also drop
  # any other column whose name merely contains an x or a y
  select(-ends_with(".y"))

1 Answer

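Here is an option with across and coalesce. Loop across the columns whose names end with the x suffix. For each one, build the name of its partner column by replacing the x suffix in the current column name (cur_column()) with y (str_replace), fetch that column's values with get, and coalesce them with the looped column, so that missing values in the .x column are filled from the .y column. Finally, .names strips the suffix from the name of each resulting column.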

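library(dplyr)
library(stringr)
org_dat %>% 
    mutate(across(ends_with(".x"),
     # fill NAs in the current .x column from its .y partner
     ~ coalesce(., get(str_replace(cur_column(), "[.]x$", ".y"))),
     # name each result after its question, dropping the ".x" suffix
     .names = "{str_remove(.col, '[.]x$')}"),
     # drop the columns that were used and place the new ones right after ID
     .keep = "unused", .before = 2)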

Output:

# A tibble: 6 × 5
  ID    Q2_1  Q2_2  Q14_1     Q15  
  <chr> <chr> <chr> <chr>     <chr>
1 1     Yes   <NA>  Sometimes <NA> 
2 2     No    <NA>  Always    Yes  
3 3     <NA>  <NA>  <NA>      <NA> 
4 4     <NA>  No    <NA>      No   
5 5     <NA>  <NA>  Always    <NA> 
6 6     <NA>  No    <NA>      <NA> 
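
One detail worth checking: in the sample data, ID 2 has -99 in Q2_1.x, while the desired output shows No for that cell. coalesce only fills NA, so a literal "-99" would be kept as is. If -99 is a missing-value code in this survey (that is an assumption on my part, and the toy tibble below is a hypothetical stand-in for one .x/.y pair), a rough sketch of recoding it before the coalesce step could look like this:

```r
library(dplyr)

# Hypothetical stand-in for one .x/.y pair from the survey data.
toy <- tibble(
  ID     = c("1", "2", "3"),
  Q2_1.x = c("Yes", "-99", NA),
  Q2_1.y = c(NA, "No", NA)
)

# coalesce() on its own keeps "-99", because "-99" is not NA.
kept <- coalesce(toy$Q2_1.x, toy$Q2_1.y)

# Assumption: "-99" means "missing". Recode it to NA in every column first.
cleaned <- toy %>%
  mutate(across(everything(), ~ na_if(.x, "-99")))

# Now the .y value can fill the gap.
filled <- coalesce(cleaned$Q2_1.x, cleaned$Q2_1.y)

print(kept)
print(filled)
```

The same na_if step could be placed in front of the across/coalesce pipeline above. If -99 is a real answer rather than a missing code, skip it.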

15 Comments

Wow, this is so elegant. How does coalesce know to only look at the matching columns (Q2_1.x and Q2_1.y)?
@NewBee The loop only iterates over the ".x" columns. Inside the loop, the ".x" in the current column name (cur_column()) is changed to ".y", and get returns the values of that corresponding column.
Thank you! Also, is there a way to return all the columns that are not touched by the loop (such as Q15) without explicitly naming them? I have a lot of columns like this that should be returned as they are.
@NewBee updated the post
@NewBee I think the mutate step with .keep and .before should fix that issue as in the update
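
To make the pairing discussed in these comments concrete, here is a small sketch that spells out the same logic as an explicit loop. It is only an illustration, not the code from the answer: the toy tibble and the names x_cols, partners, and out are made up for the example. It shows which .y column each .x column is matched with, and that columns outside the pairs (ID and Q15 here) pass through without being named.

```r
library(dplyr)
library(stringr)

# Hypothetical miniature of the survey data.
toy <- tibble(
  ID     = c("1", "2"),
  Q2_1.x = c("Yes", NA),
  Q2_1.y = c(NA, "No"),
  Q15    = c(NA, "Yes")
)

# The columns the loop visits, and the partner name derived for each one.
x_cols   <- names(select(toy, ends_with(".x")))
partners <- str_replace(x_cols, "[.]x$", ".y")

# Same pairing as the across() call, written as an explicit loop.
out <- toy
for (i in seq_along(x_cols)) {
  new_name <- str_remove(x_cols[i], "[.]x$")
  out[[new_name]] <- coalesce(toy[[x_cols[i]]], toy[[partners[i]]])
}

# Drop only the paired source columns; every other column is kept as is.
out <- select(out, -all_of(c(x_cols, partners)))

print(partners)
print(out)
```

In this version the combined columns are appended at the end rather than placed after ID; the .before = 2 argument in the answer is what controls their position there.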