Merge if two string columns are substring of one column from another dataframe in Python

Question

Given two dataframes as follows:

df1:

   id                                      address  price
0   1         8563 Parker Ave. Lexington, NC 27292      3
1   2         242 Bellevue Lane Appleton, WI 54911      3
2   3       771 Greenview Rd. Greenfield, IN 46140      5
3   4       93 Hawthorne Street Lakeland, FL 33801      6
4   5  8952 Green Hill Street Gettysburg, PA 17325      3
5   6    7331 S. Sherwood Dr. New Castle, PA 16101      4

df2:

  state            street  quantity
0    PA       S. Sherwood        12
1    IN  Hawthorne Street         3
2    NC       Parker Ave.         7

Suppose that when both state and street from df2 are contained in address from df1, the matching df2 row should be merged into df1.

How could I do that in Pandas? Thanks.

The expected result df:

   id                                      address  ...       street quantity
0   1         8563 Parker Ave. Lexington, NC 27292  ...  Parker Ave.     7.00
1   2         242 Bellevue Lane Appleton, WI 54911  ...          NaN      NaN
2   3       771 Greenview Rd. Greenfield, IN 46140  ...          NaN      NaN
3   4       93 Hawthorne Street Lakeland, FL 33801  ...          NaN      NaN
4   5  8952 Green Hill Street Gettysburg, PA 17325  ...          NaN      NaN
5   6    7331 S. Sherwood Dr. New Castle, PA 16101  ...  S. Sherwood    12.00

[6 rows x 6 columns]

My testing code:

df2['addr'] = df2['state'].astype(str) + df2['street'].astype(str)

pat = '|'.join(r'\b{}\b'.format(x) for x in df2['addr'])
df1['addr']= df1['address'].str.extract('\('+ pat + ')', expand=False)

df = df1.merge(df2, on='addr', how='left')

Output:

   id                                      address  ...  street_y quantity_y
0   1         8563 Parker Ave. Lexington, NC 27292  ...       NaN        nan
1   2         242 Bellevue Lane Appleton, WI 54911  ...       NaN        nan
2   3       771 Greenview Rd. Greenfield, IN 46140  ...       NaN        nan
3   4       93 Hawthorne Street Lakeland, FL 33801  ...       NaN        nan
4   5  8952 Green Hill Street Gettysburg, PA 17325  ...       NaN        nan
5   6    7331 S. Sherwood Dr. New Castle, PA 16101  ...       NaN        nan

[6 rows x 10 columns]

Nk03 · Accepted Answer · 2021-05-06 11:03:10Z

1

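Try:

import re
import numpy as np

# Build one capture group per column; re.escape keeps "." and similar characters literal
pat_state = f"({'|'.join(map(re.escape, df2['state']))})"
pat_street = f"({'|'.join(map(re.escape, df2['street']))})"
df1['street'] = df1['address'].str.extract(pat_street, expand=False)
df1['state'] = df1['address'].str.extract(pat_state, expand=False)
# Keep a state/street pair only when both parts were found in the address
df1.loc[df1['state'].isna(), 'street'] = np.nan
df1.loc[df1['street'].isna(), 'state'] = np.nan
df1 = df1.merge(df2, on=['state', 'street'], how='left')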
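For readers who want to run the approach end to end, here is a minimal sketch on the sample frames from the question. The helper name alternation is made up for this illustration, and escaping the values with re.escape is an addition that assumes the street and state values should be matched literally rather than as regex syntax.

```python
import re

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "address": [
        "8563 Parker Ave. Lexington, NC 27292",
        "242 Bellevue Lane Appleton, WI 54911",
        "771 Greenview Rd. Greenfield, IN 46140",
        "93 Hawthorne Street Lakeland, FL 33801",
        "8952 Green Hill Street Gettysburg, PA 17325",
        "7331 S. Sherwood Dr. New Castle, PA 16101",
    ],
    "price": [3, 3, 5, 6, 3, 4],
})
df2 = pd.DataFrame({
    "state": ["PA", "IN", "NC"],
    "street": ["S. Sherwood", "Hawthorne Street", "Parker Ave."],
    "quantity": [12, 3, 7],
})


def alternation(values):
    """Hypothetical helper: one capture group matching any value literally."""
    return "(" + "|".join(re.escape(v) for v in values) + ")"


out = df1.copy()
out["street"] = out["address"].str.extract(alternation(df2["street"]), expand=False)
out["state"] = out["address"].str.extract(alternation(df2["state"]), expand=False)

# Blank out half-matches so that only complete (state, street) pairs can merge
both = out["street"].notna() & out["state"].notna()
out.loc[~both, ["street", "state"]] = np.nan

out = out.merge(df2, on=["state", "street"], how="left")
print(out[["id", "state", "street", "quantity"]])
```

Note that str.extract keeps only the first match per address, so an address containing more than one candidate state or street may need a stricter pattern.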


4 Comments

ah bon Over a year ago

Thanks, I'll test with my real data, and let you know.

ah bon Over a year ago

Sorry, it raises an error: error: missing ), unterminated subpattern

ah bon Over a year ago

It works after removing punctuation with df2["street"] = df2['street'].str.replace(r'[^\w\s]', '', regex=True)

ah bon Over a year ago

What if I need to merge based on 3 columns?
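On the three-column question, one possible generalization is to loop over the key columns instead of hard-coding two of them. The sketch below is not from the answer: the function name merge_on_substrings and the city column are hypothetical, and it assumes every key value should be matched literally (hence re.escape, which also avoids the "unterminated subpattern" style of error when values contain regex characters).

```python
import re

import numpy as np
import pandas as pd


def merge_on_substrings(left, right, text_col, key_cols):
    """Hypothetical helper: left-merge `right` onto `left` when every key
    column of a `right` row occurs as a substring of `left[text_col]`."""
    out = left.copy()
    for col in key_cols:
        values = right[col].astype(str).unique()
        pattern = "(" + "|".join(re.escape(v) for v in values) + ")"
        out[col] = out[text_col].str.extract(pattern, expand=False)
    # A row qualifies only if all keys were found in the text
    incomplete = out[key_cols].isna().any(axis=1)
    out.loc[incomplete, key_cols] = np.nan
    return out.merge(right, on=key_cols, how="left")


left = pd.DataFrame({
    "address": [
        "8563 Parker Ave. Lexington, NC 27292",
        "93 Hawthorne Street Lakeland, FL 33801",
        "7331 S. Sherwood Dr. New Castle, PA 16101",
    ]
})
right = pd.DataFrame({
    "state": ["PA", "IN", "NC"],
    "street": ["S. Sherwood", "Hawthorne Street", "Parker Ave."],
    "city": ["New Castle", "Lakeland", "Lexington"],  # assumed third key
    "quantity": [12, 3, 7],
})

result = merge_on_substrings(left, right, "address", ["state", "street", "city"])
print(result)
```

The helper overwrites any existing columns in the left frame that share a name with the key columns, so rename those first if they need to be kept.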

wwnde · Accepted Answer · 2021-05-06 10:35:06Z

1

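import re
import pandas as pd
# Escape each street so regex metacharacters such as "." match literally
k = "|".join(map(re.escape, df2['street']))
df1 = df1.assign(temp=df1['address'].str.findall(k).str.join(', '), temp1=df1['address'].str.split(",").str[-1].str.split().str[0])
dfnew = pd.merge(df1, df2, how='left', left_on=['temp', 'temp1'], right_on=['street', 'state'])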

edited May 6, 2021 at 10:35

answered May 6, 2021 at 10:09


2 Comments

ah bon Over a year ago

Thanks, but you didn't use df2['state']?

ah bon Over a year ago

Thanks, if address has no , to split, how could we modify your code?
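If the address has no comma to split on, one option is to look for the state codes from df2 directly, as whole words. This is a sketch under that assumption rather than the answerer's own fix; the comma-free sample addresses are made up for illustration, while the temp and temp1 column names follow the code above.

```python
import re

import pandas as pd

df1 = pd.DataFrame({
    "address": [
        "8563 Parker Ave. Lexington NC 27292",
        "93 Hawthorne Street Lakeland FL 33801",
        "7331 S. Sherwood Dr. New Castle PA 16101",
    ]
})
df2 = pd.DataFrame({
    "state": ["PA", "IN", "NC"],
    "street": ["S. Sherwood", "Hawthorne Street", "Parker Ave."],
    "quantity": [12, 3, 7],
})

# Streets are matched literally; states must appear as standalone words
street_pat = "(" + "|".join(map(re.escape, df2["street"])) + ")"
state_pat = r"\b(" + "|".join(map(re.escape, df2["state"])) + r")\b"

df1 = df1.assign(
    temp=df1["address"].str.extract(street_pat, expand=False),
    temp1=df1["address"].str.extract(state_pat, expand=False),
)
dfnew = pd.merge(
    df1, df2, how="left",
    left_on=["temp", "temp1"], right_on=["street", "state"],
)
print(dfnew[["address", "street", "state", "quantity"]])
```

The word boundaries keep a code such as IN from matching inside a longer uppercase token, and the match is case-sensitive, so lowercase text like "in" is ignored.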
