I would like to find a tidy way to carry out a data cleaning step that I have to do for multiple pairs of columns.
df <- data.frame(apple = c("Yes", NA, NA, "Yes", NA),
                 apple_NO = c(NA, "No_1", "No_1", NA, "No_2"),
                 berry = c("Yes", "Yes", NA, NA, "Yes"),
                 berry_NO = c(NA, NA, "No_1", "No_1", NA),
                 coconut = c(NA, "Yes", "Yes", "Yes", NA),
                 coconut_NO = c("No_2", NA, NA, NA, "No_2"),
                 dinosaur = c("Yes", NA, NA, NA, "Yes"),
                 dinosaur_NO = c(NA, "No_2", "No_1", "No_2", NA))
> df
  apple apple_NO berry berry_NO coconut coconut_NO dinosaur dinosaur_NO
1   Yes     <NA>   Yes     <NA>    <NA>       No_2      Yes        <NA>
2  <NA>     No_1   Yes     <NA>     Yes       <NA>     <NA>        No_2
3  <NA>     No_1  <NA>     No_1     Yes       <NA>     <NA>        No_1
4   Yes     <NA>  <NA>     No_1     Yes       <NA>     <NA>        No_2
5  <NA>     No_2   Yes     <NA>    <NA>       No_2      Yes        <NA>
cols <- c("apple", "berry", "coconut", "dinosaur")
cols_NO <- c("apple_NO", "berry_NO", "coconut_NO", "dinosaur_NO")
I would like to clean the values in the columns listed in cols_NO and use them to assign new values to the matching columns in cols.
For example, if I just had one column pair to clean, I would do something like:
library(dplyr)

df <- df %>%
  mutate(apple = case_when(apple_NO == "No_1" ~ "None left",
                           apple_NO == "No_2" ~ "Finished",
                           TRUE ~ apple))
I would also like to do this for berry and berry_NO, coconut and coconut_NO, and so on.
The output I want would look something like this:
      apple apple_NO     berry berry_NO  coconut coconut_NO  dinosaur dinosaur_NO
1       Yes     <NA>       Yes     <NA> Finished       No_2       Yes        <NA>
2 None left     No_1       Yes     <NA>      Yes       <NA>  Finished        No_2
3 None left     No_1 None left     No_1      Yes       <NA> None left        No_1
4       Yes     <NA> None left     No_1      Yes       <NA>  Finished        No_2
5  Finished     No_2       Yes     <NA> Finished       No_2       Yes        <NA>
I think the solution involves map, map2 or mapply over parallel lists of columns, but I haven't used those before. I also can't find a similar example that has a list of columns on both the left- and right-hand side of the = in mutate.
Thanks!
EDIT:
This gets me close, but I would still need to write the result back into my main data frame (with replace or mutate_at, for example). My real data benefits from grepl, so I've left that in.
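library(dplyr)
library(purrr)

# Recode one column from its matching *_NO column
fun.casewhen <- function(cols, cols_NO){
  case_when(grepl("No_1", cols_NO) ~ "None left",
            grepl("No_2", cols_NO) ~ "Finished",
            TRUE ~ cols)
}
# Walk the two sets of columns in parallel
dftest <- map2(df %>% select(all_of(cols)), df %>% select(all_of(cols_NO)), ~ fun.casewhen(.x, .y))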
The resulting dftest is a list with one element per column in cols, each holding the correct values, but it is not a data frame.
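To show what I mean by writing the result back, here is a minimal sketch in base R only. It uses Map and nested ifelse as stand-ins for map2 and case_when so that it runs without extra packages, and recode_pair is a hypothetical helper name, not something from my real code. I am not sure this is the tidy way to do it.

```r
# Toy data copied from the question
df <- data.frame(apple = c("Yes", NA, NA, "Yes", NA),
                 apple_NO = c(NA, "No_1", "No_1", NA, "No_2"),
                 berry = c("Yes", "Yes", NA, NA, "Yes"),
                 berry_NO = c(NA, NA, "No_1", "No_1", NA),
                 coconut = c(NA, "Yes", "Yes", "Yes", NA),
                 coconut_NO = c("No_2", NA, NA, NA, "No_2"),
                 dinosaur = c("Yes", NA, NA, NA, "Yes"),
                 dinosaur_NO = c(NA, "No_2", "No_1", "No_2", NA),
                 stringsAsFactors = FALSE)

cols <- c("apple", "berry", "coconut", "dinosaur")
cols_NO <- paste0(cols, "_NO")

# Hypothetical base-R stand-in for fun.casewhen. grepl() returns FALSE for NA,
# so rows with a missing *_NO value keep their original value.
recode_pair <- function(x, x_no) {
  ifelse(grepl("No_1", x_no), "None left",
         ifelse(grepl("No_2", x_no), "Finished", x))
}

# Map() walks the two sets of columns in parallel and returns a list named
# after cols; assigning that list to df[cols] overwrites those columns.
df[cols] <- Map(recode_pair, df[cols], df[cols_NO])

print(df)
```

Because the list is named after the columns in cols, it can be assigned straight to df[cols]. I would expect the same assignment to accept the list that map2 returns, but I have not confirmed that on my real data.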
