I have a dataframe and a vector of keywords. I want to add two new columns to the dataframe: one listing the keywords that appear in each row's `colour` value, and a second listing the keywords that do not.

    keyword <- c('yellow','blue','red','green','purple')

My dataframe:

| colour | id |
|---|---|
| blue | A234 |
| blue,black | A5 |
| yellow | A6 |
| blue,green,purple | A7 |
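
For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: I am assuming both columns are plain character vectors, and I am calling the dataframe `df`, the name used in my attempt further down.

```r
# Keywords to look for (same vector as above)
keyword <- c("yellow", "blue", "red", "green", "purple")

# The example dataframe shown above.
# Assumption: both columns are character, not factor.
df <- data.frame(
  colour = c("blue", "blue,black", "yellow", "blue,green,purple"),
  id     = c("A234", "A5", "A6", "A7"),
  stringsAsFactors = FALSE
)
```

Note that the second row contains `black`, which is not one of the keywords.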
What I hope to get is a dataframe like this:
| colour | id | match | non-match |
|---|---|---|---|
| blue | A234 | blue | yellow,red,green,purple |
| blue,black | A5 | blue | yellow,red,green,purple |
| yellow | A6 | yellow | blue,red,green,purple |
| blue,green,purple | A7 | blue,green,purple | yellow,red |

I tried this to get the match column:

    df %>% mutate(match = str_extract(paste(keyword,collapse="|"), tolower(colour)))

but it only worked for the first and third rows, not the second and fourth. I'd appreciate any help with this, and also with getting a column of the unmatched keywords.