Match a string with the same string from another data frame

Question

I have this data frame (DF1):

DF1 <- structure(list(ID = 1:3, Text = c("there was not clostridium", "clostridium difficile positive", "test was OK")), class = "data.frame", row.names = c(NA, -3L))

ID Text
1  "there was not clostridium"
2  "clostridium difficile positive"
3  "test was OK"

and this data frame (DF2):

DF2 <- structure(list(ID = 1:3, Microorganisms = c("ESCHERICHIA COLI", "CLOSTRIDIUM DIFFICILE", "FUNGI")), class = "data.frame", row.names = c(NA, -3L))

ID Microorganisms
1  ESCHERICHIA COLI
2  CLOSTRIDIUM DIFFICILE
3  FUNGI

I would like to use a regex to find matches between DF1 and DF2 and put the matched microorganism into a new column, like this:

ID Text                                Microorganism
1  "there was not clostridium"         CLOSTRIDIUM DIFFICILE
2  "clostridium difficile positive"    CLOSTRIDIUM DIFFICILE
3  "test was OK"                       no

I have tried something like this:

DF1 %>% mutate(Mikroorganism = ifelse(grepl(DF2$Microorganisms, TEXT), str_extract(TEXT, DF2$Microorganisms), "no"))

But that did not work.
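Two things appear to go wrong in that attempt: the column in DF1 is called `Text`, not `TEXT`, and `grepl()` is vectorised over its `x` argument but not over `pattern`, so only the first microorganism name is used as the pattern (R warns about this). A minimal sketch, using plain vectors in place of the two data frames, of collapsing all the names into one alternation instead:

```r
# Plain vectors standing in for DF2$Microorganisms and DF1$Text
patterns <- c("ESCHERICHIA COLI", "CLOSTRIDIUM DIFFICILE", "FUNGI")
text <- c("there was not clostridium", "clostridium difficile positive", "test was OK")

# One regex that accepts any of the full names:
# "ESCHERICHIA COLI|CLOSTRIDIUM DIFFICILE|FUNGI"
any_name <- paste(patterns, collapse = "|")

hit <- grepl(any_name, text, ignore.case = TRUE)
print(hit)
# [1] FALSE  TRUE FALSE
```

This fixes the vectorisation problem, but row 1 still comes back FALSE: the full name CLOSTRIDIUM DIFFICILE never appears in "there was not clostridium".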

A simple regex is not going to work with your first row: there is no "difficile". Are you looking for a match on any of the words in DF2, not the string as a whole?
– r2evans, Commented Feb 2, 2021 at 13:35
Yes, I would like to match any of the words in DF2. Is that possible?
– onhalu, Commented Feb 2, 2021 at 13:36
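The difference between matching the whole name and matching any of its words can be sketched in base R by turning the whitespace in a name into regex alternation (`|`). The variable names below are illustrative only:

```r
name <- "CLOSTRIDIUM DIFFICILE"
row1 <- "there was not clostridium"

# Whole name as the pattern: both words must appear, adjacent and in order
whole <- grepl(name, row1, ignore.case = TRUE)

# Runs of whitespace become "|", so any single word is enough
word_ptn <- gsub("\\s+", "|", name)  # "CLOSTRIDIUM|DIFFICILE"
any_word <- grepl(word_ptn, row1, ignore.case = TRUE)

cat("whole name:", whole, "| any word:", any_word, "\n")
# whole name: FALSE | any word: TRUE
```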

r2evans · Accepted Answer · 2021-02-02 13:45:32Z · Score: 4

One way is to use the fuzzyjoin package:

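library(dplyr)  # for %>%, transmute() and select(); fuzzyjoin must also be installed

DF1 %>%
  fuzzyjoin::regex_left_join(
    # "CLOSTRIDIUM DIFFICILE" becomes the pattern "CLOSTRIDIUM|DIFFICILE",
    # so a match on any single word is enough
    transmute(DF2, Microorganisms, ptn = gsub("\\s+", "|", Microorganisms)),
    by = c("Text" = "ptn"), ignore_case = TRUE) %>%
  select(-ptn)  # drop the helper pattern column
#   ID                           Text        Microorganisms
# 1  1      there was not clostridium CLOSTRIDIUM DIFFICILE
# 2  2 clostridium difficile positive CLOSTRIDIUM DIFFICILE
# 3  3                    test was OK                  <NA>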
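If adding the fuzzyjoin dependency is not an option, roughly the same matching can be sketched in base R. This is a different technique from the answer's join: it keeps only the first matching microorganism per row and writes "no" when nothing matches (as in the desired output), whereas `regex_left_join()` returns one row per match and `NA` for no match. The helper `first_match()` is a hypothetical name:

```r
DF1 <- data.frame(ID = 1:3,
                  Text = c("there was not clostridium",
                           "clostridium difficile positive",
                           "test was OK"))
DF2 <- data.frame(ID = 1:3,
                  Microorganisms = c("ESCHERICHIA COLI",
                                     "CLOSTRIDIUM DIFFICILE",
                                     "FUNGI"))

# One "any word" pattern per microorganism, e.g. "CLOSTRIDIUM|DIFFICILE"
ptn <- gsub("\\s+", "|", DF2$Microorganisms)

# First microorganism whose pattern matches txt, or "no" if none does
first_match <- function(txt) {
  hit <- vapply(ptn, grepl, logical(1), x = txt, ignore.case = TRUE)
  if (any(hit)) DF2$Microorganisms[which(hit)[1]] else "no"
}

DF1$Microorganism <- vapply(DF1$Text, first_match, character(1),
                            USE.NAMES = FALSE)
DF1
```

As with the fuzzyjoin version, the patterns are unanchored, so a short word such as COLI would also match inside a longer word; adding `\\b` word boundaries around each word is one way to tighten that.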


Will it work when a cell of the Text column in DF1 holds more than one string? I mean, instead of just "there was not clostridium" it would be c("there was not clostridium", "some text", "some text").
– onhalu, Commented over a year ago
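If `Text` were a list column whose cells are character vectors, one possible approach is to paste each cell into a single string first and match against that, so the regex scans all of a row's strings at once. This is a sketch under that assumption; the `TextFlat` column and `first_match()` helper are hypothetical names:

```r
# Hypothetical DF1 where each Text cell holds several strings
DF1 <- data.frame(ID = 1:2)
DF1$Text <- list(c("there was not clostridium", "some text", "some text"),
                 c("test was OK", "some text"))

microorganisms <- c("ESCHERICHIA COLI", "CLOSTRIDIUM DIFFICILE", "FUNGI")
ptn <- gsub("\\s+", "|", microorganisms)  # one "any word" pattern per name

# Collapse each cell's strings into one string separated by spaces
DF1$TextFlat <- vapply(DF1$Text, paste, character(1), collapse = " ")

# First microorganism whose pattern matches the flattened text, else "no"
first_match <- function(txt) {
  hit <- vapply(ptn, grepl, logical(1), x = txt, ignore.case = TRUE)
  if (any(hit)) microorganisms[which(hit)[1]] else "no"
}

DF1$Microorganism <- vapply(DF1$TextFlat, first_match, character(1),
                            USE.NAMES = FALSE)
DF1[, c("ID", "Microorganism")]
```

The flattened column is an ordinary character column, so it could equally be passed as the `by` column of the fuzzyjoin call instead.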
