Partial string match in another dataframe in r

Question

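Is there a way to find all the partial matches between df_2 and df_1?

A partial match means that part of a DF_1 string is a whole DF_2 string, or the other way round (as in the expected output below). For example, part of "for solution" is the whole string "solution", and the whole string "suspension" is part of "for suspension".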

df_1=data.frame(
  DF_1=c("suspension","tablet","for solution","capsule")
)

df_2=data.frame(
  index=c("1","2","3","4","5"),
  DF_2=c("for suspension", "suspension", "solution", "tablet,ER","tablet,IR")
)

Expected output:

df_out=data.frame(
  DF_1=c("suspension","suspension","tablet","tablet","for solution"),
  DF_2=c("for suspension", "suspension","tablet,ER","tablet,IR","solution"),
  index=c("1","2","4","5","3")
)

How would you define a partial match? According to your example, would it be "the string in df_a is entirely contained in df_b"? – Rhesous, Commented Aug 4, 2020 at 21:03
Does this answer your question? Test if characters are in a string – mhovd, Commented Aug 4, 2020 at 21:03
@Arault, I defined a partial match above: it is a match if part of my string in DF_1 is in DF_2. For example, part of "for solution" is in "solution", so that's a match. – Ashti, Commented Aug 4, 2020 at 21:06
@Ashti In that case, shouldn't "for solution" merge with "for suspension"? Both contain "for". – Rhesous, Commented Aug 4, 2020 at 21:11
No, because part of "for solution" matches "solution" as a whole string, but it does not match "for suspension" as a whole string. – Ashti, Commented Aug 4, 2020 at 21:13
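The rule settled in this thread (one string must appear whole inside the other, in either direction) can be written as a small predicate. A minimal sketch: `is_partial_match` is a hypothetical name, and `fixed = TRUE` is an added assumption so the strings are compared as literal text rather than as regular expressions.

```r
# Hypothetical helper: TRUE when a is contained in b, or b is contained in a.
# fixed = TRUE makes grepl() compare literal text instead of regex patterns.
is_partial_match <- function(a, b) {
  # grepl() uses only one pattern at a time, so loop over the pairs with mapply()
  mapply(
    function(x, y) grepl(x, y, fixed = TRUE) || grepl(y, x, fixed = TRUE),
    a, b,
    USE.NAMES = FALSE
  )
}

is_partial_match("for solution", "solution")        # TRUE
is_partial_match("for solution", "for suspension")  # FALSE
is_partial_match("suspension", "for suspension")    # TRUE
```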

Rhesous · Accepted Answer · 2020-08-05 08:24:36Z

1

Following @Akrun's suggestion of using fuzzyjoin:

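According to your expected output, you want to match in both directions (a DF_1 value inside a DF_2 value, and a DF_2 value inside a DF_1 value), so you need to join twice, and you want an inner join so that unmatched rows such as "capsule" are dropped. An exact match such as "suspension"/"suspension" is found by both joins, which is why you need to deduplicate (I did it with distinct from dplyr, but you can use whatever you like).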

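library(fuzzyjoin)
library(dplyr)

df_out = distinct(
  rbind(
    # DF_2 used as the regex: DF_2 found inside DF_1, e.g. "solution" in "for solution"
    regex_inner_join(df_1, df_2, by = c("DF_1" = "DF_2")),
    # DF_1 used as the regex: DF_1 found inside DF_2, e.g. "tablet" in "tablet,ER"
    regex_inner_join(df_2, df_1, by = c("DF_2" = "DF_1"))
  )
)
df_out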

The output is:

          DF_1 index           DF_2
1   suspension     2     suspension
2 for solution     3       solution
3   suspension     1 for suspension
4       tablet     4      tablet,ER
5       tablet     5      tablet,IR

This gives your expected table, though the rows and columns are in a different order.
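If you would rather not depend on fuzzyjoin and dplyr, the same two-way match can be sketched in base R in a single pass: pair every DF_1 value with every DF_2 row, then keep the pairs where either string contains the other. This is a swapped-in technique, not the answer's method: it uses a cross join via `merge(by = NULL)` and literal `grepl(fixed = TRUE)` tests instead of regex joins. Because each pair is tested only once, no deduplication step is needed. The cross join has nrow(df_1) * nrow(df_2) rows, so it suits small tables.

```r
df_1 <- data.frame(DF_1 = c("suspension", "tablet", "for solution", "capsule"))
df_2 <- data.frame(
  index = c("1", "2", "3", "4", "5"),
  DF_2 = c("for suspension", "suspension", "solution", "tablet,ER", "tablet,IR")
)

# Cartesian product: every DF_1 value paired with every DF_2 row
pairs <- merge(df_1, df_2, by = NULL)

# Keep a pair when either string appears literally inside the other
keep <- mapply(
  function(a, b) grepl(a, b, fixed = TRUE) || grepl(b, a, fixed = TRUE),
  pairs$DF_1, pairs$DF_2,
  USE.NAMES = FALSE
)

df_out <- pairs[keep, c("DF_1", "DF_2", "index")]
rownames(df_out) <- NULL
df_out
```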

answered Aug 5, 2020 at 8:24

Rhesous

mhovd · Accepted Answer · 2020-08-04 21:02:45Z

0

This sounds like a job for grepl()!

E.g. grepl(value, chars, fixed = TRUE). Let me quote an example from a different answer:

> chars <- "test"
> value <- "es"
> grepl(value, chars)
[1] TRUE
> chars <- "test"
> value <- "et"
> grepl(value, chars)
[1] FALSE
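`fixed = TRUE` matters for this kind of data: without it, `grepl()` reads the first argument as a regular expression, so punctuation inside a DF_1 value changes what matches. A small illustration; the strings `"tablet.ER"` and `"tablet (ER)"` are made up for the demonstration and are not in the question's data.

```r
# Regex mode (the default): "." matches any character, including ","
grepl("tablet.ER", "tablet,ER")                   # TRUE
# Literal mode: "." is only a dot, so "tablet,ER" does not contain it
grepl("tablet.ER", "tablet,ER", fixed = TRUE)     # FALSE

# Regex mode: "(" and ")" form a group, so the pattern means "tablet ER"
# and the string fails to match itself
grepl("tablet (ER)", "tablet (ER)")               # FALSE
grepl("tablet (ER)", "tablet (ER)", fixed = TRUE) # TRUE
```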

answered Aug 4, 2020 at 21:02

mhovd

akrun · Accepted Answer · 2020-08-04 21:03:59Z

0

We can use fuzzyjoin

library(fuzzyjoin)
regex_left_join(df_2, df_1, by = c("DF_2"= "DF_1"))

answered Aug 4, 2020 at 21:03

akrun

Thanks! This didn't return a match for one of the rows. – Ashti
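The missing row is the "solution" one. With `regex_left_join(df_2, df_1, by = c("DF_2" = "DF_1"))`, the DF_1 values act as the patterns, so a pair is kept only when the DF_1 value is found inside the DF_2 value, and "for solution" is not inside "solution" (it is the other way round). The check below reproduces that one-directional test in base R with `grepl()`; it illustrates the matching direction, not fuzzyjoin's internals.

```r
df_1 <- data.frame(DF_1 = c("suspension", "tablet", "for solution", "capsule"))
df_2 <- data.frame(
  index = c("1", "2", "3", "4", "5"),
  DF_2 = c("for suspension", "suspension", "solution", "tablet,ER", "tablet,IR")
)

# One direction only: is any DF_1 value contained in this DF_2 value?
one_way <- vapply(
  df_2$DF_2,
  function(s) any(vapply(df_1$DF_1, grepl, logical(1), x = s, fixed = TRUE)),
  logical(1),
  USE.NAMES = FALSE
)
one_way  # TRUE TRUE FALSE TRUE TRUE

# The reverse direction finds the third row: "solution" is inside "for solution"
grepl("solution", "for solution", fixed = TRUE)  # TRUE
```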

ThomasIsCoding · Accepted Answer · 2020-08-05 11:32:05Z

0

Here is a base R option using nested *apply + grepl

df_out <- within(
  df_2,
  DF_1 <- unlist(sapply(
    DF_2,
    function(x) {
      # Keep only the DF_1 values that match x (the NAs are dropped)
      Filter(
        Negate(is.na),
        lapply(
          df_1$DF_1,
          # Match when y is inside x or x is inside y; either way return the DF_1 value y
          function(y) if (grepl(y, x) || grepl(x, y)) y else NA
        )
      )
    }
  ), use.names = FALSE)
)

such that

> df_out
  index           DF_2         DF_1
1     1 for suspension   suspension
2     2     suspension   suspension
3     3       solution for solution
4     4      tablet,ER       tablet
5     5      tablet,IR       tablet
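The nested `*apply` version relies on every DF_2 value matching exactly one DF_1 value: `unlist()` drops the empty result for a row with no match and returns two values for a row with two matches, so the new column no longer lines up with the rows of `df_2`. Below is a sketch of a variant that always returns one value per row, assuming that unmatched rows should get `NA` and that several matches may be collapsed into one string (the helper name `match_one` and the `"; "` separator are hypothetical choices).

```r
df_1 <- data.frame(DF_1 = c("suspension", "tablet", "for solution", "capsule"))
df_2 <- data.frame(
  index = c("1", "2", "3", "4", "5"),
  DF_2 = c("for suspension", "suspension", "solution", "tablet,ER", "tablet,IR")
)

# Hypothetical helper: all candidates that contain x or are contained in x,
# collapsed to a single string; NA when nothing matches
match_one <- function(x, candidates) {
  hits <- Filter(
    function(y) grepl(y, x, fixed = TRUE) || grepl(x, y, fixed = TRUE),
    candidates
  )
  if (length(hits) == 0) NA_character_ else paste(hits, collapse = "; ")
}

# vapply() guarantees exactly one string per row of df_2
df_out <- within(
  df_2,
  DF_1 <- vapply(DF_2, match_one, character(1), candidates = df_1$DF_1,
                 USE.NAMES = FALSE)
)
df_out
```

For the question's data this gives the same DF_1 column as above, and a DF_2 value such as "powder" would get `NA` instead of shifting the rows below it.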

answered Aug 5, 2020 at 11:32

ThomasIsCoding
