
I'm trying to use str_detect and case_when to recode strings based on multiple patterns, and paste each occurrence of the recoded value(s) into a new column. The Correct column is the output I'm trying to achieve.

This is similar to this question and this question. If it can't be done with case_when (which I think is limited to one pattern per row), is there a better way to achieve this while still using the tidyverse?

library(dplyr)
library(stringr)

Fruit=c("Apples","apples, maybe bananas","Oranges","grapes w apples","pears")
Num=c(1,2,3,4,5)
data=data.frame(Num,Fruit)

df= data %>% mutate(Incorrect=
paste(case_when(
  str_detect(Fruit, regex("apples", ignore_case=TRUE)) ~ "good",
  str_detect(Fruit, regex("bananas", ignore_case=TRUE)) ~ "gross",
  str_detect(Fruit, regex("grapes | oranges", ignore_case=TRUE)) ~ "ok",
  str_detect(Fruit, regex("lemon", ignore_case=TRUE)) ~ "sour",
  TRUE ~ "other"
),sep=","))

  Num                 Fruit Incorrect
  1                Apples      good
  2 apples, maybe bananas      good
  3               Oranges      other
  4       grapes w apples      good
  5                pears       other

 Num                 Fruit    Correct
   1                Apples       good
   2 apples, maybe bananas good,gross
   3               Oranges         ok
   4       grapes w apples    ok,good
   5                pears       other

1 Answer


In case_when, once a condition is satisfied for a row, evaluation stops there and no further conditions are checked for that row. So in such cases it is usually better to put every entry in a separate row, which makes it easier to assign a value to each one, and then summarise them back together. However, in this case the Fruit column does not have a clear separator: some fruits are separated by a comma (,), some by whitespace, and there are additional words between them. To handle all such cases, we assign NA to the words which do not match and then remove them when summarising.

library(dplyr)
library(stringr)

data %>%
  # one row per word: split on commas or runs of whitespace
  tidyr::separate_rows(Fruit, sep = ",|\\s+") %>%
  mutate(Correct = case_when(
    str_detect(Fruit, regex("apples", ignore_case = TRUE)) ~ "good",
    str_detect(Fruit, regex("bananas", ignore_case = TRUE)) ~ "gross",
    str_detect(Fruit, regex("grapes|oranges", ignore_case = TRUE)) ~ "ok",
    str_detect(Fruit, regex("lemon", ignore_case = TRUE)) ~ "sour",
    TRUE ~ NA_character_)) %>%
  group_by(Num) %>%
  # drop the NA (non-matching) words and paste the remaining labels together
  summarise(Correct = toString(na.omit(Correct))) %>%
  # bring back the original, unsplit Fruit column
  left_join(data, by = "Num")

#   Num Correct     Fruit                
#  <dbl> <chr>       <fct>                
#1     1 good        Apples               
#2     2 good, gross apples, maybe bananas
#3     3 ok          Oranges              
#4     4 ok, good    grapes w apples      
#5     5 sour        Lemons               
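
To see the first-match behaviour described above in isolation, here is a minimal sketch on a small made-up vector (x is only an illustration, not the question's data frame). It also shows why the question's pattern "grapes | oranges" missed "Oranges": the spaces around | are part of the pattern.

```r
library(dplyr)
library(stringr)

x <- c("Apples", "apples, maybe bananas", "Oranges")

# case_when() returns the value for the FIRST condition that is TRUE,
# so the second element is labelled "good" and never reaches "gross".
first_match <- case_when(
  str_detect(x, regex("apples", ignore_case = TRUE))  ~ "good",
  str_detect(x, regex("bananas", ignore_case = TRUE)) ~ "gross",
  TRUE                                                ~ "other"
)
first_match
# "good" "good" "other"

# Spaces inside a regex are literal: this looks for "grapes " or " oranges".
with_spaces    <- str_detect("Oranges", regex("grapes | oranges", ignore_case = TRUE))
without_spaces <- str_detect("Oranges", regex("grapes|oranges", ignore_case = TRUE))
c(with_spaces, without_spaces)
# FALSE TRUE
```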

For the updated data, where non-matching fruits should be coded as "other", we can first remove the extra words that occur (maybe, w) and then do:

data %>%
  # strip the filler words; \\b stops "w" being removed from inside other words
  mutate(Fruit = gsub("\\b(maybe|w)\\b", "", Fruit)) %>%
  tidyr::separate_rows(Fruit, sep = ",\\s+|\\s+") %>%
  mutate(Correct = case_when(
    str_detect(Fruit, regex("apples", ignore_case = TRUE)) ~ "good",
    str_detect(Fruit, regex("bananas", ignore_case = TRUE)) ~ "gross",
    str_detect(Fruit, regex("grapes|oranges", ignore_case = TRUE)) ~ "ok",
    str_detect(Fruit, regex("lemon", ignore_case = TRUE)) ~ "sour",
    TRUE ~ "other")) %>%
  group_by(Num) %>%
  summarise(Correct = toString(na.omit(Correct))) %>%
  left_join(data, by = "Num")

#    Num Correct     Fruit                
#  <dbl> <chr>       <fct>                
#1     1 good        Apples               
#2     2 good, gross apples, maybe bananas
#3     3 ok          Oranges              
#4     4 ok, good    grapes w apples      
#5     5 other       pears                
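
If the list of filler words is not known in advance, a different approach (not the one used above) is to skip the splitting and test every pattern against the whole string. The following is a sketch under assumptions: fruit_labels and recode_all are hypothetical names introduced here, and the data frame is rebuilt with Fruit as character. Each pattern contributes its label at most once, labels are ordered by where the pattern first matches, and "other" is used only when nothing matches.

```r
library(dplyr)
library(stringr)

data <- data.frame(
  Num   = c(1, 2, 3, 4, 5),
  Fruit = c("Apples", "apples, maybe bananas", "Oranges",
            "grapes w apples", "pears"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are regex patterns, values are labels.
fruit_labels <- c("apples" = "good", "bananas" = "gross",
                  "grapes|oranges" = "ok", "lemon" = "sour")

recode_all <- function(x) {
  # Start position of the first match of each pattern (NA if no match)
  pos <- vapply(
    names(fruit_labels),
    function(p) str_locate(x, regex(p, ignore_case = TRUE))[1, "start"],
    numeric(1)
  )
  hit <- !is.na(pos)
  if (!any(hit)) return("other")
  # Keep the labels of matching patterns, ordered by position in the string
  toString(unname(fruit_labels[hit][order(pos[hit])]))
}

result <- data %>%
  mutate(Correct = vapply(as.character(Fruit), recode_all,
                          character(1), USE.NAMES = FALSE))
result
```

Unlike the split-and-summarise version, filler words such as maybe or w never produce an "other", because only the patterns in the lookup table are tested.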

4 Comments

The only issue is TRUE ~ NA_character_. I want meaningful non-matching strings to be coded as "other", i.e. TRUE ~ "other". I edited the data to better reflect my actual data. @RonakShah
@W148SMH As mentioned in my post, the problem arises because there is no clear separator between the fruits. Sometimes they are separated by a comma, sometimes by a space. So I have split on both, but there are also some non-matching words like maybe and w. If we use TRUE ~ 'other', those words would also be assigned 'other'.
If I remove maybe and w at the beginning with something like str_replace(Fruit,"maybe|w","")), it still adds other where those words were removed. @RonakShah
@W148SMH yes, if those are the only words occurring then you can remove them. See updated answer.
