Check if one of the strings in one column is found in the other column

Question

I'm trying to check whether any of the strings in column Name B is found in column Name A, storing the result in a new column, Name Check:

Current Inputs:

import pandas as pd

df = pd.DataFrame({"Name A":{"0":"John","1":"Sara","2":"Adam","3":"Ahmed"},
                   "Name B":{"0":"John, Geroge","1":"Ahemed, Sara","2":"Adam, Nadia","3":"Sara, John"},
                   "Salary":{"0":100,"1":200,"2":300,"3":400}})

    Name A  Name B        Salary
0   John    John, Geroge  100
1   Sara    Ahemed, Sara  200
2   Adam    Adam, Nadia   300
3   Ahmed   Sara, John    400

Expected output:

    Name A  Name B        Salary  Name Check
0   John    John, Geroge  100     True
1   Sara    Ahemed, Sara  200     True
2   Adam    Adam, Nadia   300     True
3   Ahmed   Sara, John    400     False
4   Nadi    Sara, Nadia   500     True
5   George  Georg, Mo     600     True

What I have tried:

df['Name Check'] = df.apply(lambda x: x['Name B'] in x['Name A'] , axis=1)

But the output is all False. I'm not sure how to convert column Name B to a list and loop through it to check, one by one, whether each string is found in column Name A.

jezrael · Accepted Answer · 2021-10-06 13:33:28Z

1

If it is possible to split on a comma followed by optional whitespace, use Series.str.split with DataFrame.isin and DataFrame.any:

df['Name Check'] = (df['Name B'].str.split(r',\s*', expand=True)
                                .isin(df['Name A']).any(axis=1))
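
As a minimal sketch of why this works: when DataFrame.isin is given a Series, it compares row by row on matching index labels, so each split name is only tested against the Name A value of its own row. The example below rebuilds the question's data with a default index (an assumption made for brevity) and compares the result with an explicit Python loop; the names parts, vectorized and explicit are illustrative only.

```python
import pandas as pd

# Sample data from the question, with a default integer index
df = pd.DataFrame({
    "Name A": ["John", "Sara", "Adam", "Ahmed"],
    "Name B": ["John, Geroge", "Ahemed, Sara", "Adam, Nadia", "Sara, John"],
})

# One column per name in "Name B"
parts = df["Name B"].str.split(r",\s*", expand=True)

# isin with a Series aligns on the index, so the comparison is row-wise
vectorized = parts.isin(df["Name A"]).any(axis=1)

# The same exact-match check written as an explicit loop over rows
explicit = [
    a in [p.strip() for p in b.split(",")]
    for a, b in zip(df["Name A"], df["Name B"])
]

print(vectorized.tolist())
print(explicit)
```

Both versions only report exact matches between Name A and one of the names in Name B.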

To test for substrings of the split values instead, use:

f = lambda x: any(y in x['Name A'] or x['Name A'] in y for y in x['Name B'].split(', '))
df['Name Check1'] = df.apply(f, axis=1)
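
A hedged illustration of the substring variant, run against all six rows of the expected output in the question. The helper name names_overlap is hypothetical, and the data uses a default index; the logic is the same lambda, written out as a function.

```python
import pandas as pd

# Rows taken from the expected output in the question
df = pd.DataFrame({
    "Name A": ["John", "Sara", "Adam", "Ahmed", "Nadi", "George"],
    "Name B": ["John, Geroge", "Ahemed, Sara", "Adam, Nadia",
               "Sara, John", "Sara, Nadia", "Georg, Mo"],
})

def names_overlap(row):
    """True if Name A contains, or is contained in, any name in Name B."""
    name_a = row["Name A"]
    return any(
        part in name_a or name_a in part
        for part in row["Name B"].split(", ")
    )

df["Name Check1"] = df.apply(names_overlap, axis=1)
print(df)
```

Because the test runs in both directions, "Nadi" matches "Nadia" and "George" matches "Georg", while "Ahmed" still finds nothing in "Sara, John".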

edited Oct 6, 2021 at 13:33

answered Oct 6, 2021 at 13:06


8 Comments

mozway Over a year ago

Note that the second approach matches on substrings (e.g., 'Anna' would match 'Annalisa').

Ibrahim Ayoup Over a year ago

That worked perfectly! But how can I match on a substring?

jezrael Over a year ago

@IbraheemAyoup - Do you mean df['Name Check'] = df.apply(lambda x: x['Name A'] in x['Name B'] , axis=1) ?

Ibrahim Ayoup Over a year ago

I tried this but it did not work. I'm not sure how to loop through the strings in column Name B.

mozway Over a year ago

Then you should explain what you mean by partial matching (give examples).


mozway · Accepted Answer · 2021-10-06 13:14:42Z

1

Here is an approach using a regex with word boundaries:

import re
df.apply(lambda r: bool(re.search(r'\b%s\b' % r['Name A'], r['Name B'])), axis=1)

Explanation: this builds a regex per row of the form \bJohn\b, which ensures that only whole words are matched.
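
A small sketch of the word-boundary behavior, using the 'Anna' / 'Annalisa' pair mentioned in the comments as made-up test data. Wrapping the name in re.escape is an addition not present in the answer; it is assumed here as a precaution in case a name contains regex metacharacters. The helper name whole_word_match is hypothetical.

```python
import re
import pandas as pd

# Illustrative data; the "Anna" row is invented to show the boundary effect
df = pd.DataFrame({
    "Name A": ["John", "Anna", "Ahmed"],
    "Name B": ["John, Geroge", "Annalisa, Sara", "Sara, John"],
})

def whole_word_match(row):
    """True if Name A appears in Name B as a complete word."""
    # re.escape guards against names containing characters such as "." or "("
    pattern = r"\b" + re.escape(row["Name A"]) + r"\b"
    return bool(re.search(pattern, row["Name B"]))

df["Name Check"] = df.apply(whole_word_match, axis=1)
print(df)
```

Here "Anna" does not match "Annalisa", because there is no word boundary between "Anna" and "lisa".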

answered Oct 6, 2021 at 13:14


1 Comment

Ibrahim Ayoup Over a year ago

I just added two examples (rows 4 and 5 in "Expected output") to show matching on substrings.
