
Disclaimer: I think there is a much more efficient solution (perhaps an anonymous function with a list, or one of the *apply functions?), which is why I have come to you much more experienced people for help!

The data

Let's say I have a df with participant responses to three A questions and three B questions, e.g.:

qa1, qa2, qa3, qb1, qb2, qb3   
1, 3, 1, 2, 4, 4  
1, 3, 2, 2, 1, 4  
2, 3, 1, 2, 1, 4  
1, 3, 2, 1, 1, 3 

EDIT: df also contains other columns with irrelevant data!

I have a vector with the correct answer to each of qa1-3 and qb1-3, in the same order as the columns.

correct_answer <- c(1,3,2,2,1,4) 

(i.e. for qa1,qa2,qa3,qb1,qb2,qb3)

Desired manipulation

I want to create a new column per question (e.g. qa1_correct) coding whether the participant responded correctly (1) or incorrectly (0), by matching each response in df against the corresponding answer in correct_answer. Ideally I would end up with:

qa1, qa2, qa3, qb1, qb2, qb3, qa1_correct, qa2_correct, qa3_correct ...     
1, 3, 1, 2, 4, 4, 1, 1, 0, ...   
1, 3, 2, 2, 1, 4, 1, 1, 1, ...   
2, 3, 1, 2, 1, 4, 0, 1, 0, ...   
1, 3, 2, 1, 1, 3, 1, 1, 1, ... 

Failed Attempt

This is my attempt for the A questions only (I would repeat it for the Bs), but it doesn't work (maybe paste0() is the wrong function?):

index <- c(1:3)

for (i in index) {
  df <- df %>% mutate(paste0("qa",i,"_correct") = 
                        case_when(paste0("qa"i) == correct_answer[i] ~ 1, 
                                  paste0("qa"i) != correct_answer[i] ~ 0))
}
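Why the loop above fails: mutate(paste0(...) = ...) is not valid R, because an argument name cannot be a function call (that is what := with !! is for), and paste0("qa"i) is missing a comma. Even with those fixed, paste0("qa", i) is only the text "qa1", so comparing it with a number is always FALSE. A minimal base R sketch of that second point, followed by a hypothetical loop that looks each column up with [[ ]] instead (a swapped-in base R technique rather than dplyr; it assumes df holds only the six question columns):

```r
# Question data (values copied from the example above)
df <- data.frame(
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)
correct_answer <- c(1, 3, 2, 2, 1, 4)

# paste0() only builds a character string; it never refers to the column
paste0("qa", 1)                        # "qa1"
paste0("qa", 1) == correct_answer[1]   # "qa1" == "1", i.e. FALSE

# Looking the column up by name with [[ ]] returns the actual responses
df[[paste0("qa", 1)]] == correct_answer[1]   # TRUE TRUE FALSE TRUE

# Hypothetical base R loop covering the As and Bs together;
# q_cols is captured before the loop adds the new *_correct columns
q_cols <- names(df)
for (i in seq_along(q_cols)) {
  df[[paste0(q_cols[i], "_correct")]] <- as.numeric(df[[q_cols[i]]] == correct_answer[i])
}
df
```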

Many thanks for any guidance!

  • Is a solution without mutate() an option? Commented Jul 23, 2021 at 14:49

5 Answers


You can combine mutate and across.

Code 1: correct_answer as a vector

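library(dplyr)

# Assumes df holds only the six question columns, in the same order as correct_answer
df  %>%
  mutate(across(everything(),
                ~as.numeric(.x == correct_answer[names(df) == cur_column()]),
                .names = "{.col}_correct"))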

Code 2: correct_answer as a data frame (df_correct)

correct_answer <- c(1,3,2,2,1,4) 
df_correct <- data.frame(
  matrix(correct_answer, ncol = length(correct_answer))
)
colnames(df_correct) <- names(df)

df  %>%
  mutate(across(everything(),
                .fn = ~as.numeric(.x == df_correct[,cur_column()]),
                .names = "{.col}_correct"))

Output

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_correct qa2_correct qa3_correct qb1_correct qb2_correct qb3_correct
1   1   3   1   2   4   4           1           1           0           1           0           1
2   1   3   2   2   1   4           1           1           1           1           1           1
3   2   3   1   2   1   4           0           1           0           1           1           1
4   1   3   2   1   1   3           1           1           1           0           1           0

2 Comments

Thanks! If I had other columns with completely different names, could I replace everything() with e.g. select(starts_with("q"))?
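You won't need select(); just replace everything() with starts_with("q"): df %>% mutate(across(starts_with("q"), ~as.numeric(.x == correct_answer[names(df) == cur_column()]), .names = "{.col}_correct")). Note that names(df) == cur_column() picks the answer by the column's position in df, so this assumes the question columns come first and in the same order as correct_answer.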
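A variant that does not depend on column positions is to name each element of correct_answer after its question column and look the answer up with cur_column(). A minimal sketch; the named vector and the extra id column are assumptions, not part of the original answer:

```r
library(dplyr)

# Question data plus a hypothetical extra column that should be left alone
df <- data.frame(
  id  = c("p1", "p2", "p3", "p4"),
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)

# Name each correct answer after the column it belongs to
correct_answer <- c(qa1 = 1, qa2 = 3, qa3 = 2, qb1 = 2, qb2 = 1, qb3 = 4)

# across() only touches the named question columns; cur_column() gives the
# current column's name, which indexes the matching answer directly
df <- df %>%
  mutate(across(all_of(names(correct_answer)),
                ~ as.numeric(.x == correct_answer[[cur_column()]]),
                .names = "{.col}_correct"))
df
```

Because the lookup is by name, the question columns can sit anywhere in df and in any order.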

This may also be an alternative (it needs R 4.1.0 or later, which introduced the \(x) shorthand for function(x); that release also gave apply() a new simplify argument, defaulting to TRUE):

df <- read.table(header = TRUE, text = 'qa1, qa2, qa3, qb1, qb2, qb3
1, 3, 1, 2, 4, 4  
1, 3, 2, 2, 1, 4  
2, 3, 1, 2, 1, 4  
1, 3, 2, 1, 1, 3', sep = ',')

df
#>   qa1 qa2 qa3 qb1 qb2 qb3
#> 1   1   3   1   2   4   4
#> 2   1   3   2   2   1   4
#> 3   2   3   1   2   1   4
#> 4   1   3   2   1   1   3

correct_answer <- c(1,3,2,2,1,4)

cbind(df, 
      setNames(as.data.frame(t(apply(df, 1, 
                                     \(x) +(x == correct_answer)))), 
               paste0(names(df), '_correct')))
#>   qa1 qa2 qa3 qb1 qb2 qb3 qa1_correct qa2_correct qa3_correct qb1_correct
#> 1   1   3   1   2   4   4           1           1           0           1
#> 2   1   3   2   2   1   4           1           1           1           1
#> 3   2   3   1   2   1   4           0           1           0           1
#> 4   1   3   2   1   1   3           1           1           1           0
#>   qb2_correct qb3_correct
#> 1           0           1
#> 2           1           1
#> 3           1           1
#> 4           1           0

Created on 2021-07-23 by the reprex package (v2.0.0)
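apply(df, 1, ...) works row by row, which is why the result has to be transposed. Since each correct answer belongs to a column, the comparison can also be done column-wise in one call with base R's sweep(), a swapped-in technique not used in the answer above. A minimal sketch, assuming the question columns are listed by hand in q_cols and that the notes column is a hypothetical extra column:

```r
df <- data.frame(
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3),
  notes = c("a", "b", "c", "d")   # hypothetical extra column
)
correct_answer <- c(1, 3, 2, 2, 1, 4)
q_cols <- c("qa1", "qa2", "qa3", "qb1", "qb2", "qb3")  # same order as correct_answer

# MARGIN = 2 lines correct_answer up with the columns of the response matrix;
# FUN = "==" compares every response with its column's answer at once,
# and unary + turns the logical matrix into 0/1 integers
scored <- +sweep(as.matrix(df[q_cols]), 2, correct_answer, "==")
colnames(scored) <- paste0(q_cols, "_correct")
df <- cbind(df, scored)
df
```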


You can also use the following solution in base R (the native pipe |> requires R 4.1.0 or later):

cbind(df, 
      do.call(cbind, mapply(function(x, y) as.data.frame({+(x == y)}), 
                            df, correct_answer, SIMPLIFY = FALSE)) |>
        setNames(paste0(names(df), "_corr")))

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_corr qa2_corr qa3_corr qb1_corr qb2_corr qb3_corr
1   1   3   1   2   4   4        1        1        0        1        0        1
2   1   3   2   2   1   4        1        1        1        1        1        1
3   2   3   1   2   1   4        0        1        0        1        1        1
4   1   3   2   1   1   3        1        1        1        0        1        0

Or a potential tidyverse solution could be:

library(dplyr)
library(tidyr)
library(purrr)

df %>%
  mutate(output = pmap(df, ~ setNames(+(c(...) == correct_answer), 
                                      paste0(names(df), "_corr")))) %>%
  unnest_wider(output)

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_corr qa2_corr qa3_corr qb1_corr qb2_corr qb3_corr
1   1   3   1   2   4   4        1        1        0        1        0        1
2   1   3   2   2   1   4        1        1        1        1        1        1
3   2   3   1   2   1   4        0        1        0        1        1        1
4   1   3   2   1   1   3        1        1        1        0        1        0

1 Comment

Thank you very much! How could I adapt this for when df contains other columns besides the qa/qb ones?
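One way to adapt the base R mapply() version is to pass only the question columns, so each column is paired with the right element of correct_answer. A minimal sketch; the id and age columns are hypothetical, and the grep() pattern assumes every question column is named like qa1 or qb3 and appears in df in the same order as correct_answer:

```r
df <- data.frame(
  id  = c("p1", "p2", "p3", "p4"),   # hypothetical extra column
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3),
  age = c(21, 34, 28, 45)            # hypothetical extra column
)
correct_answer <- c(1, 3, 2, 2, 1, 4)

# Keep only columns named q + a/b + digits, in their df order
q_cols <- grep("^q[ab][0-9]+$", names(df), value = TRUE)

# mapply() walks the selected columns and the answers in parallel;
# with the default SIMPLIFY = TRUE the 0/1 vectors come back as a matrix
scored <- mapply(function(x, y) +(x == y), df[q_cols], correct_answer)
colnames(scored) <- paste0(q_cols, "_corr")
df <- cbind(df, scored)
df
```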

Try this:

df_new <- cbind(df, t(apply(df, 1, function(x) as.numeric(x == correct_answer))))

2 Comments

No, that didn't work; it just generated columns of 0s.
@CocoNewton, which R version are you using?
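One plausible cause, assuming the extra columns were still in df: apply(df, 1, ...) passes whole rows, so each x is longer than correct_answer and == recycles the six answers against the wrong columns. A minimal sketch of the same apply() call restricted to the question columns; q_cols and the id column are assumptions, and the setNames-style renaming replaces the unnamed columns that cbind() would otherwise produce:

```r
df <- data.frame(
  id  = c("p1", "p2", "p3", "p4"),   # hypothetical extra column
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)
correct_answer <- c(1, 3, 2, 2, 1, 4)
q_cols <- c("qa1", "qa2", "qa3", "qb1", "qb2", "qb3")  # same order as correct_answer

# apply() now only sees the six question columns, so every row
# lines up element by element with correct_answer
scored <- t(apply(df[q_cols], 1, function(x) as.numeric(x == correct_answer)))
colnames(scored) <- paste0(q_cols, "_correct")
df_new <- cbind(df, scored)
df_new
```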

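EDIT: this works with the addition of sym().
I found a related solution here, Paste variable name in mutate (dplyr), but without sym() it only pasted 0's, because paste0("qa", i) on its own is just the string "qa1", which never equals the numeric answer.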

library(dplyr)

index <- 1:3  # A questions only, as in the original attempt
for (i in index) {
  df <- df %>% mutate(!!paste0("qa", i, "_correct") :=
                        case_when(!!sym(paste0("qa", i)) == correct_answer[i] ~ 1,
                                  !!sym(paste0("qa", i)) != correct_answer[i] ~ 0))
}
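The same loop can also be written with the .data pronoun, which looks a column up by a name stored in a string, instead of !!sym(). A minimal sketch covering the As and Bs together; q_cols is a hypothetical helper vector listed in the same order as correct_answer, and as.numeric() on the comparison stands in for the two-branch case_when():

```r
library(dplyr)

df <- data.frame(
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)
correct_answer <- c(1, 3, 2, 2, 1, 4)
q_cols <- c("qa1", "qa2", "qa3", "qb1", "qb2", "qb3")

for (i in seq_along(q_cols)) {
  new_col <- paste0(q_cols[i], "_correct")
  # .data[[ ]] reads the column whose name is stored in q_cols[i];
  # !!new_col := uses the string as the new column's name
  df <- df %>%
    mutate(!!new_col := as.numeric(.data[[q_cols[i]]] == correct_answer[i]))
}
df
```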
