Using R dplyr::mutate() with a for loop and dynamic variables

Question

Disclaimer: I suspect there is a much more efficient solution (perhaps an anonymous function with a list, or *apply functions?), which is why I have come to you much more experienced people for help!

The data

Let's say I have a df with participant responses to 3 question As and 3 question Bs, e.g.:

qa1, qa2, qa3, qb1, qb2, qb3   
1, 3, 1, 2, 4, 4  
1, 3, 2, 2, 1, 4  
2, 3, 1, 2, 1, 4  
1, 3, 2, 1, 1, 3

EDIT: df also contains other columns with other, irrelevant data!

I have a vector with the correct answers to qa1-3 and qb1-3, in the same order as the columns.

correct_answer <- c(1,3,2,2,1,4)

(i.e. for qa1,qa2,qa3,qb1,qb2,qb3)
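For anyone wanting to try the approaches below, here is a sketch of the data above as an R data frame. `participant` is a hypothetical column, not part of the original data; it only stands in for the "other irrelevant data" mentioned in the EDIT.

```r
# The responses from the question, plus one hypothetical extra column
# standing in for the "other irrelevant data"
df <- data.frame(
  participant = c("p1", "p2", "p3", "p4"),  # hypothetical, not in the original
  qa1 = c(1, 1, 2, 1),
  qa2 = c(3, 3, 3, 3),
  qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1),
  qb2 = c(4, 1, 1, 1),
  qb3 = c(4, 4, 4, 3)
)

# Correct answers in the same order as qa1, qa2, qa3, qb1, qb2, qb3
correct_answer <- c(1, 3, 2, 2, 1, 4)
```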

Desired manipulation

I want to create a new column per question (e.g. qa1_correct) coding whether the participant responded correctly (1) or incorrectly (0), by matching each response in df with the corresponding answer in correct_answer. Ideally I would end up with:

qa1, qa2, qa3, qb1, qb2, qb3, qa1_correct, qa2_correct, qa3_correct ...     
1, 3, 1, 2, 4, 4, 1, 1, 0, ...   
1, 3, 2, 2, 1, 4, 1, 1, 1, ...   
2, 3, 1, 2, 1, 4, 0, 1, 0, ...   
1, 3, 2, 1, 1, 3, 1, 1, 1, ...

Failed Attempt

This is my attempt for the question As only (I would repeat it for the Bs), but it doesn't work (maybe paste0() is the wrong function?):

index <- c(1:3)

for (i in index) {
  df <- df %>% mutate(paste0("qa",i,"_correct") =
                        case_when(paste0("qa"i) == correct_answer[i] ~ 1,
                                  paste0("qa"i) != correct_answer[i] ~ 0))
}

Many thanks for any guidance!
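Apart from the syntax errors (a call such as paste0() cannot sit on the left of = inside mutate(), and paste0("qa"i) is missing a comma), the attempt has a deeper problem: paste0() returns a character string, not a column. Even with the syntax fixed, the comparison would test the text "qa1" against a number, which is never TRUE, so every row would be coded 0. A minimal base R illustration (the one-column df here is a cut-down example):

```r
i <- 1
correct_answer <- c(1, 3, 2, 2, 1, 4)

col_name <- paste0("qa", i)          # the character string "qa1", not the column qa1
col_name == correct_answer[i]        # compares "qa1" with "1", so FALSE

# The string has to be used to look the column up in the data frame
df <- data.frame(qa1 = c(1, 1, 2, 1))
df[[col_name]] == correct_answer[i]  # TRUE TRUE FALSE TRUE
```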

Is a solution without mutate() an option?

– KacZdr, Commented Jul 23, 2021 at 14:49

tamtam · Accepted Answer · 2021-07-23 15:28:10Z

2

You can combine mutate() and across().

Code 1: correct_answer as a vector

library(dplyr)

# compare each column with the answer at the same position in correct_answer
df %>%
  mutate(across(everything(),
                ~as.numeric(.x == correct_answer[names(df) == cur_column()]),
                .names = "{.col}_correct"))

Code 2: correct_answer as a data.frame (df_correct)

correct_answer <- c(1,3,2,2,1,4)
# one-row data frame whose column names match those of df
df_correct <- data.frame(
  matrix(correct_answer, ncol = length(correct_answer))
)
colnames(df_correct) <- names(df)

# look up the correct answer for the current column by name
df %>%
  mutate(across(everything(),
                .fns = ~as.numeric(.x == df_correct[,cur_column()]),
                .names = "{.col}_correct"))

Output

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_correct qa2_correct qa3_correct qb1_correct qb2_correct qb3_correct
1   1   3   1   2   4   4           1           1           0           1           0           1
2   1   3   2   2   1   4           1           1           1           1           1           1
3   2   3   1   2   1   4           0           1           0           1           1           1
4   1   3   2   1   1   3           1           1           1           0           1           0

answered Jul 23, 2021 at 15:28

tamtam


2 Comments

Coco Newton Over a year ago

Thanks! If I had other columns with totally different named variables, could I replace everything() with e.g. select(starts_with("q"))?

tamtam Over a year ago

You won't need select(); just replace everything() with starts_with("q"):

df %>%
  mutate(across(starts_with("q"),
                ~as.numeric(.x == correct_answer[names(df) == cur_column()]),
                .names = "{.col}_correct"))
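Looking the answer up with names(df) == cur_column() is positional, so it only lines up while the question columns are the first columns of df, in the same order as correct_answer. A sketch that avoids this, assuming correct_answer is given names matching the question columns (the names and the `participant` column below are illustrative), looks each answer up by column name instead:

```r
library(dplyr)

df <- data.frame(
  participant = c("p1", "p2", "p3", "p4"),  # hypothetical unrelated column
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)

# Name each correct answer after its column so lookups do not depend on position
correct_answer <- c(qa1 = 1, qa2 = 3, qa3 = 2, qb1 = 2, qb2 = 1, qb3 = 4)

result <- df %>%
  mutate(across(all_of(names(correct_answer)),
                ~ as.numeric(.x == correct_answer[[cur_column()]]),
                .names = "{.col}_correct"))
```

Because only the named columns are selected, unrelated columns can sit anywhere in df and are left untouched.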

AnilGoyal · Accepted Answer · 2021-07-23 16:06:03Z

This may also be an alternative (it requires R 4.1.0 or later, which introduced the \(x) shorthand for anonymous functions used below; that release also gave apply() a new argument, simplify, with default TRUE):

df <- read.table(header = T, text = 'qa1, qa2, qa3, qb1, qb2, qb3   
1, 3, 1, 2, 4, 4  
1, 3, 2, 2, 1, 4  
2, 3, 1, 2, 1, 4  
1, 3, 2, 1, 1, 3', sep = ',')

df
#>   qa1 qa2 qa3 qb1 qb2 qb3
#> 1   1   3   1   2   4   4
#> 2   1   3   2   2   1   4
#> 3   2   3   1   2   1   4
#> 4   1   3   2   1   1   3

correct_answer <- c(1,3,2,2,1,4)

cbind(df, 
      setNames(as.data.frame(t(apply(df, 1, 
                                     \(x) +(x == correct_answer)))), 
               paste0(names(df), '_correct')))
#>   qa1 qa2 qa3 qb1 qb2 qb3 qa1_correct qa2_correct qa3_correct qb1_correct
#> 1   1   3   1   2   4   4           1           1           0           1
#> 2   1   3   2   2   1   4           1           1           1           1
#> 3   2   3   1   2   1   4           0           1           0           1
#> 4   1   3   2   1   1   3           1           1           1           0
#>   qb2_correct qb3_correct
#> 1           0           1
#> 2           1           1
#> 3           1           1
#> 4           1           0

Created on 2021-07-23 by the reprex package (v2.0.0)
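The row-wise apply() is not strictly needed: each column only has to be compared with one answer, which base R can do in a single vectorised step. A sketch using sweep(), assuming the question columns are listed explicitly in q_cols (an illustrative name, not from the answer):

```r
df <- data.frame(
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)
correct_answer <- c(1, 3, 2, 2, 1, 4)

q_cols <- c("qa1", "qa2", "qa3", "qb1", "qb2", "qb3")

# sweep() applies `==` between column j of the matrix and correct_answer[j];
# the unary + turns the resulting logical matrix into 0/1 integers
correct <- +sweep(as.matrix(df[q_cols]), 2, correct_answer, FUN = "==")
colnames(correct) <- paste0(q_cols, "_correct")

df_new <- cbind(df, correct)
```

Selecting df[q_cols] first also keeps any other columns out of the comparison, and as.matrix() then only ever sees the numeric question columns.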

Anoushiravan R · Accepted Answer · 2021-07-23 16:41:12Z

2

You can also use the following solution in base R:

cbind(df, 
      do.call(cbind, mapply(function(x, y) as.data.frame({+(x == y)}), 
                            df, correct_answer, SIMPLIFY = FALSE)) |>
        setNames(paste0(names(df), "_corr")))

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_corr qa2_corr qa3_corr qb1_corr qb2_corr qb3_corr
1   1   3   1   2   4   4        1        1        0        1        0        1
2   1   3   2   2   1   4        1        1        1        1        1        1
3   2   3   1   2   1   4        0        1        0        1        1        1
4   1   3   2   1   1   3        1        1        1        0        1        0

Or a potential tidyverse solution could be:

library(dplyr)
library(tidyr)
library(purrr)

df %>%
  mutate(output = pmap(df, ~ setNames(+(c(...) == correct_answer),
                                      paste0(names(df), "_corr")))) %>%
  unnest_wider(output)

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_corr qa2_corr qa3_corr qb1_corr qb2_corr qb3_corr
1   1   3   1   2   4   4        1        1        0        1        0        1
2   1   3   2   2   1   4        1        1        1        1        1        1
3   2   3   1   2   1   4        0        1        0        1        1        1
4   1   3   2   1   1   3        1        1        1        0        1        0

edited Jul 23, 2021 at 16:41

answered Jul 23, 2021 at 15:34

Anoushiravan R


1 Comment

Coco Newton Over a year ago

Thank you very much! How could I adapt this for when df contains other column variables aside from the qa/qb ones?
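One way to handle extra columns, sketched in base R under the assumption that every question column is named q, then a or b, then a number, and that they appear in the same order as correct_answer: select them with grep() first and run the mapply() comparison only on those. The `participant` column is hypothetical.

```r
df <- data.frame(
  participant = c("p1", "p2", "p3", "p4"),  # hypothetical unrelated column
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)
correct_answer <- c(1, 3, 2, 2, 1, 4)

# Keep only the question columns; stop early if they don't match the answers
q_cols <- grep("^q[ab][0-9]+$", names(df), value = TRUE)
stopifnot(length(q_cols) == length(correct_answer))

# Pair each question column with its answer and score it 0/1
scores <- mapply(function(x, y) as.integer(x == y),
                 df[q_cols], correct_answer, SIMPLIFY = FALSE)
names(scores) <- paste0(q_cols, "_corr")

df_new <- cbind(df, as.data.frame(scores))
```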

Mohanasundaram · Accepted Answer · 2021-07-23 13:53:32Z

0

Try this:

df_new <- cbind(df, t(apply(df, 1, function(x) as.numeric(x == correct_answer))))

answered Jul 23, 2021 at 13:53

Mohanasundaram


2 Comments

Coco Newton Over a year ago

No, that didn't work; it just generated columns filled with 0's.

AnilGoyal Over a year ago

@CocoNewton, which R version are you using?

Coco Newton · Accepted Answer · 2021-07-23 15:00:47Z

0

EDIT: this works with the addition of sym().
Found a related solution here: Paste variable name in mutate (dplyr). On its own it only pasted 0's; wrapping the column name in sym() fixes that:

library(dplyr)

for (i in index) {
  # !! unquotes the new name; sym() turns "qa1" etc. into a column reference
  df <- df %>% mutate(!!paste0("qa", i, "_correct") :=
                        case_when(!!sym(paste0("qa", i)) == correct_answer[i] ~ 1,
                                  !!sym(paste0("qa", i)) != correct_answer[i] ~ 0))
}
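The same loop can also be written without sym() and !!: the .data pronoun looks a column up from a string, and dplyr accepts glue-style names such as "{col}_correct" on the left of :=. A sketch assuming dplyr 1.0 or later; q_cols is an illustrative helper that covers both the As and the Bs in one pass.

```r
library(dplyr)

df <- data.frame(
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)
correct_answer <- c(1, 3, 2, 2, 1, 4)
q_cols <- c("qa1", "qa2", "qa3", "qb1", "qb2", "qb3")

for (i in seq_along(q_cols)) {
  col <- q_cols[i]
  # .data[[col]] is the column whose name is stored in `col`;
  # "{col}_correct" builds the new column name, e.g. "qa1_correct"
  df <- df %>%
    mutate("{col}_correct" := as.numeric(.data[[col]] == correct_answer[i]))
}
```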

edited Jul 23, 2021 at 15:00

answered Jul 23, 2021 at 13:48

Coco Newton

