I have two dataframes, both have a Last_Name column. First dataframe has a column Contains_First_Name and the second has a column called First_Name. I want to join the two on the exact spelling of Last_Name and a substring match of Contains_First_Name and First_Name (where First_Name is a substring of Contains_First_Name.) Please see an example below.
library(dplyr)
library(stringr)
# Create df1
Last_Name <- c("Smith", "Jones", "Adams", "Rogers", "Lee", "Lee", "Lee")
Contains_First_Name <- c("Kimberly Nicole", "Patrick L", "Johnson Ann", "Rick", "McAdams Jennifer Marie", "Kirk", "Kirk B")
Account_Number <- c("123", "345", "678", "901", "234", "567", "890")
df1 <- data.frame(Last_Name, Contains_First_Name, Account_Number)
# Create df2
Last_Name <- c("Smith", "Jones", "Adams", "Lee", "Lee")
First_Name <- c("Kimberly", "Patrick", "Ann", "Jennifer", "Kirk")
df2 <- data.frame(Last_Name, First_Name)
Resulting dataframes:
> df1
Last_Name Contains_First_Name Account_Number
1 Smith Kimberly Nicole 123
2 Jones Patrick L 345
3 Adams Johnson Ann 678
4 Rogers Rick 901
5 Lee McAdams Jennifer Marie 234
6 Lee Kirk 567
7 Lee Kirk B 890
> df2
Last_Name First_Name
1 Smith Kimberly
2 Jones Patrick
3 Adams Ann
4 Lee Jennifer
5 Lee Kirk
What I want to end up with:
> df3
Last_Name Contains_First_Name Account_Number First_Name
1 Smith Kimberly Nicole 123 Kimberly
2 Jones Patrick L 345 Patrick
3 Adams Johnson Ann 678 Ann
4 Lee McAdams Jennifer Marie 234 Jennifer
5 Lee Kirk 567 Kirk
6 Lee Kirk B 890 Kirk
I tried this:
df3 <-
filter(df1,
Last_Name %in% df2$Last_Name,
str_detect(Contains_First_Name, paste(df2$First_Name, collapse = "|")))
Getting the following error:
Error in match.arg(method) : 'arg' must be NULL or a character vector
I also explored fuzzyjoin library but could not figure out how to join on two variables with two different join types (exact and substring.) I saw a similar question which does not appear to have an answer: Merge two dataframes based on an exact match in one column and match within an error in another column in R.
Any advice is greatly appreciated. Thank you.