I have searched high and low and nobody seems to have asked this exact question, so I'm at a loss.
I have a data frame with a couple of columns. One of these columns contains various sentences that don't follow a specific format or pattern, which limits how I can extract words from it because I can't rely on position. My goal is to search this column and extract species names from the sentences. I need to be able to extract multiple species at once because sometimes a sample has tested positive for more than one species and I need that information. My method works fairly well, and the output is fine when there is a single species. The problem is when it identifies more than one species. I would want the output to be something like: sp1,sp2,sp3, but instead I get: c("sp1","sp2"). I have no clue how I could change that. I tried using toString to no avail. Also, everything is written in lower case, so there is no case issue.
I tried:
df = df %>% mutate(Species=str_extract_all(df$RESULT,"sp1|sp2|sp3|sp4"))
where RESULT is the column that I said contains different sentences. This is what I get as output:
| Result | Species |
|---|---|
| Bla-bla-bla-sp1-bla-sp2 | c("sp1","sp2") |
| Bla-bla-bla-sp3-bla-bla | sp3 |
But I would want:
| Result | Species |
|---|---|
| Bla-bla-bla-sp1-bla-sp2 | sp1, sp2 |
| Bla-bla-bla-sp3-bla-bla | sp3 |
I tried:
df = df %>% mutate(Species=str_extract_all(toString(df$RESULT,"sp1|sp2|sp3|sp4")))
but the output ended up being the same.
Thanks in advance for your help! I know this is not the clearest example, but I can't use my real data as it contains sensitive info. Also, just so you know, I wrote sp1, sp2 just to mimic species names, but the species names in my real data don't all start with the same letters, which really limits the extraction methods I can use. For example, they are cat, dog, bird, so methods with sp\d+ won't work, because it's not really species1, species2.
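str_extract_all returns a list with one character vector per row, so this creates a list column, which is why rows with several matches print as c("sp1","sp2"):
df %>% mutate(Species = str_extract_all(Result, "sp\\d+"))
"sp1|sp2|sp3|sp4" should also work as the pattern.
To get a single comma-separated string per row, work rowwise and collapse the matches with toString:
df %>% rowwise() %>% mutate(Species = toString(unlist(str_extract_all(Result, "sp\\d+"))))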
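For species names that share no common prefix (cat, dog, bird), one possible sketch is to build the alternation pattern from a vector of names and collapse each row's matches without rowwise(). The data frame, the `species` vector, and the `pattern` and `out` names below are made up for illustration; the real column and species list will differ.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the real (sensitive) data
df <- tibble(Result = c("bla-bla-cat-bla-dog",
                        "bla-bla-bird-bla-bla",
                        "bla-bla-bla"))

# Assumed list of species names; joined into "cat|dog|bird"
species <- c("cat", "dog", "bird")
pattern <- str_c(species, collapse = "|")

# str_extract_all() returns a list with one character vector per row;
# vapply() applies toString() to each element, giving one string per row
out <- df %>%
  mutate(Species = vapply(str_extract_all(Result, pattern),
                          toString, character(1)))

print(out$Species)
# "cat, dog" "bird" ""
```

Rows with no match end up as an empty string. If a name could also occur inside a longer word (for example "cat" in "category"), wrapping the alternation in word boundaries, as in `str_c("\\b(", pattern, ")\\b")`, may help.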