0

I'm trying to match two dataframes by the strings they have in common. This is the code for example:

import pandas as pd
import numpy as np
A = ['DF-PI-05', 'DF-PI-09', 'DF-PI-10', 'DF-PI-15', 'DF-PI-16',
       'DF-PI-19', 'DF-PI-89', 'DF-PI-92', 'DF-PI-93', 'DF-PI-94',
       'DF-PI-95', 'DF-PI-96', 'DF-PI-25', 'DF-PI-29', 'DF-PI-30',
       'DF-PI-34', 'DF-PI-84']

B = ['PI-05', 'PI-10', 'PI-89', 'PI-90', 'PI-93', 'PI-94', 'PI-95',
       'PI-96', 'PI-09', 'PI-15', 'PI-16', 'PI-19', 'PI-91A', 'PI-91b',
       'PI-92', 'PI-25-CU', 'PI-29', 'PI-30', 'PI-34', 'PI-84-CU-S1',
       'PI-84-CU-S2']


A = pd.DataFrame(A,columns=['ID'])
B = pd.DataFrame(B,columns=['Name'])
B['param_2'] = np.linspace(20,300,21)
A['param_1'] = np.linspace(0,20,17)

I would like to have ONE pandas dataframe with the shape (21,3) where the columns are ['Name','param_1',param_2]. where if the column 'ID' from pandas dataframe B matches the column 'Name' from pandas dataframe A, I get the values from the other columns in A and B, if the 'Name' doesn't match, I have NaN.

I have tried the code below but doesn't work.

B['param_1'] = np.nan
for date_B, row_B in B.iterrows():
    
    for date_A, row_A in A.iterrows():
        if row_B['Name'] in row_A['ID']:
            row_B['param_1'] = row_A['param_1']


            break

I also would prefer some solution without for loops, because I have a large dataset, perhaps with pandas.isin() function, I have tried but didn't manage to get the right result. Something like in this example:

Join dataframes based on partial string-match between columns

1 Answer 1

1

Try left join:

A["_ID"] = A.ID.str[3:]
pd.merge(B,A, how="left", left_on="Name", right_on="_ID")[['Name','param_1', 'param_2']]
Sign up to request clarification or add additional context in comments.

5 Comments

This code doesn't work, it gives NaN in column 'param_1' instead of picking the value from dataframe A
this is because your data in A and B do not match at all. e.g. in A it is DF-PI-05 and in B it is PI-05
That's the question, because of that I was trying a for loop with 'in'. If a subset of the string in the Name/ID match, I need that value
@JenifferBarreto updated, try this now.
Thank you. That works in this specific case when the matching is the end of the ID. If the word to match is somewhere in the middle, will not work. I was wondering if would be possible to do it with the pandas.isin function

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.