I'm trying to check whether any of the strings in column Name B is found in column Name A, storing the result in a new column Name Check:

Current Inputs:

import pandas as pd

df = pd.DataFrame({"Name A":{"0":"John","1":"Sara","2":"Adam","3":"Ahmed"},
                   "Name B":{"0":"John, Geroge","1":"Ahemed, Sara","2":"Adam, Nadia","3":"Sara, John"},
                   "Salary":{"0":100,"1":200,"2":300,"3":400}})

    Name A  Name B        Salary
0   John    John, Geroge  100
1   Sara    Ahemed, Sara  200
2   Adam    Adam, Nadia   300
3   Ahmed   Sara, John    400

Expected Output:

    Name A  Name B        Salary  Name Check
0   John    John, Geroge  100     True
1   Sara    Ahemed, Sara  200     True
2   Adam    Adam, Nadia   300     True
3   Ahmed   Sara, John    400     False
4   Nadi    Sara, Nadia   500     True
5   George  Georg, Mo     600     True

What I have tried:

df['Name Check'] = df.apply(lambda x: x['Name B'] in x['Name A'] , axis=1)

But the output is all False. I'm not sure how to convert column Name B to a list and loop through it to check, one by one, whether each name is found in column Name A.

2 Answers

If the values can be split on a comma followed by optional whitespace, use Series.str.split with DataFrame.isin and DataFrame.any:

df['Name Check'] = (df['Name B'].str.split(r',\s*', expand=True)
                                .isin(df['Name A']).any(axis=1))
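
To make the row-wise behaviour of this one-liner easier to see, here is a minimal sketch on the question's first four rows. It assumes a default integer index rather than the string keys used in the question, and the names parts, via_isin and via_eq are only illustrative. When DataFrame.isin is given a Series, the comparison is aligned on the index, so each row is checked only against its own Name A value; the explicit eq form should give the same result.

```python
import pandas as pd

# Sample data from the question (default integer index is an assumption).
df = pd.DataFrame({
    "Name A": ["John", "Sara", "Adam", "Ahmed"],
    "Name B": ["John, Geroge", "Ahemed, Sara", "Adam, Nadia", "Sara, John"],
})

# One column per comma-separated name.
parts = df["Name B"].str.split(r",\s*", expand=True)

# isin with a Series is aligned on the index, so row i is compared
# only with df["Name A"][i], not with every value in the column.
via_isin = parts.isin(df["Name A"]).any(axis=1)

# Equivalent explicit form: element-wise equality along the index.
via_eq = parts.eq(df["Name A"], axis=0).any(axis=1)

print(via_isin.tolist())  # [True, True, True, False]
```

Note that this is an exact comparison of each split name, so it would not flag partial matches such as Nadi against Nadia.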

To test the split substrings for partial matches in either direction, use:

f = lambda x: any(y in x['Name A'] or x['Name A'] in y for y in x['Name B'].split(', '))
df['Name Check1'] = df.apply(f, axis=1)
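
As a rough illustration of what that lambda does, the same logic can be written as a plain function and run against all six rows of the expected output. The function name partial_match and its sep parameter are hypothetical, introduced only for this sketch.

```python
def partial_match(name, candidates, sep=", "):
    """Return True if name contains, or is contained in, any candidate."""
    return any(part in name or name in part for part in candidates.split(sep))

# (Name A, Name B) pairs taken from the expected output in the question.
rows = [
    ("John", "John, Geroge"),
    ("Sara", "Ahemed, Sara"),
    ("Adam", "Adam, Nadia"),
    ("Ahmed", "Sara, John"),
    ("Nadi", "Sara, Nadia"),
    ("George", "Georg, Mo"),
]

checks = [partial_match(a, b) for a, b in rows]
print(checks)  # [True, True, True, False, True, True]
```

Because the test is a plain substring check in both directions, very short names can match unintentionally (for example, Mo would match Mohamed), and an empty piece in Name B would match every name.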

8 Comments

Note that the second approach matches on substrings rather than whole names (e.g., 'Anna' would match 'Annalisa')
That worked perfectly! But how can I match on a substring?
@IbraheemAyoup - Do you mean df['Name Check'] = df.apply(lambda x: x['Name A'] in x['Name B'], axis=1)?
I tried this but it did not work; I'm not sure how to loop through the strings in column Name B
Then you should explain what you mean by partial matching (give examples)

Here is an approach using a regex with word boundaries:

import re
# re.escape guards against regex metacharacters appearing in a name
df.apply(lambda r: bool(re.search(r'\b%s\b' % re.escape(r['Name A']), r['Name B'])), axis=1)

Explanation: this builds a regex per row of the form \bJohn\b, so the name only matches as a whole word
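
The effect of the word boundaries can be checked without a DataFrame. The sketch below wraps the same pattern in a helper; the name whole_word_match is hypothetical, and re.escape is added as a precaution in case a name contains regex metacharacters.

```python
import re

def whole_word_match(name, candidates):
    """Return True if name occurs in candidates as a whole word."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.search(pattern, candidates) is not None

print(whole_word_match("John", "John, Geroge"))    # True
print(whole_word_match("Anna", "Annalisa, Sara"))  # False: 'Anna' is only a prefix
print(whole_word_match("Nadi", "Sara, Nadia"))     # False: no boundary after 'Nadi'
```

It could be applied per row with df.apply(lambda r: whole_word_match(r['Name A'], r['Name B']), axis=1). Note that whole-word matching returns False for Nadi against Nadia, so it would not reproduce the partial matches in rows 4 and 5 of the expected output.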

1 Comment

I just added two examples (rows 4 and 5 of "Expected Output") that should match on substrings.
