Check if Pandas column contains value from another column

Question

If df['col'] = ['a', 'b', 'c'] and df2['col'] = ['a123', 'b456', 'd789'], how do I create df2['is_contained'] = ['a', 'b', 'no_match'], where a df['col'] value is returned if it is found within the df2['col'] value, and 'no_match' is returned if no match is found? I don't expect multiple matches, but in the unlikely case there are, I'd like to return a string such as 'Multiple Matches'.

What do you mean by "multiple matches"? Do you mean the two 'a's in 'a123a', or do you mean matches in different rows of df2['col'], e.g. ['a123','b456','a789']?
DSM, commented Feb 2, 2014 at 18:12

hernamesbarbara · Accepted Answer · 2014-02-02 18:50:12Z

8

With this toy data set, we want to add a new column to df2 that contains NO_MATCH for the first three rows and the value 'd' in the last row, because that row's col value (the letter 'a') appears in df1.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123','b456','d789', 'a']})

In other words, values from df1 should be used to populate this new column in df2 only when a row's df2['col'] value appears somewhere in df1['col'].

In [2]: df1
Out[2]:
  col
0   a
1   b
2   c
3   d

In [3]: df2
Out[3]:
    col
0  a123
1  b456
2  d789
3     a

If this is the right way to understand your question, then you can do this with pandas isin:

In [4]: df2.col.isin(df1.col)
Out[4]:
0    False
1    False
2    False
3     True
Name: col, dtype: bool

This evaluates to True only when a value in df2.col is also in df1.col.
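Note that isin tests whether each whole cell value is present in the other column; it does not look for substrings, which is why 'a123' gives False even though it starts with 'a'. A minimal check on the same toy frames:

```python
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789', 'a']})

# isin compares whole cell values: 'a123' is not equal to 'a', so that row is False
mask = df2['col'].isin(df1['col'])
print(mask.tolist())  # [False, False, False, True]

# isin also accepts a plain list or set of values
print(df2['col'].isin({'a', 'd789'}).tolist())  # [False, False, True, True]
```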

Then you can use np.where, which is more or less the equivalent of R's ifelse, if you are familiar with R.

In [5]: np.where(df2.col.isin(df1.col), df1.col, 'NO_MATCH')
Out[5]:
0    NO_MATCH
1    NO_MATCH
2    NO_MATCH
3           d
Name: col, dtype: object

For rows where the df2.col value appears in df1.col, the df1.col value at the same row position is returned. Where the df2.col value is not a member of df1.col, the default 'NO_MATCH' value is used.
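In current NumPy and pandas, np.where returns a plain NumPy array and picks values by position, so df1 and df2 need the same length and row order. A minimal sketch of storing the result as the new column (the column name is_contained is taken from the question):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789', 'a']})

# Element-wise choice: the df1 value where the condition holds, else 'NO_MATCH'.
# This works on positions, not index labels.
result = np.where(df2['col'].isin(df1['col']), df1['col'], 'NO_MATCH')
print(type(result))  # <class 'numpy.ndarray'>

# Assign the array back to df2 as a new column
df2['is_contained'] = result
print(df2)
```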


2 Comments

ChrisArmstrong Over a year ago

I actually want it to match on a partial match. So in your example every value would have a match. I don't think isin handles partial matching.

gustavz Over a year ago

Your output does not solve the question. He wanted a row-wise comparison of two columns.

neves · Accepted Answer · 2020-09-14 19:42:54Z

3

You must first guarantee that the indexes match. To simplify, I'll show it as if the columns were in the same DataFrame. The trick is to use the apply method with axis=1, so that the function is applied to each row:

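import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                   'col2': ['a123','b456','d789', 'a']})
# axis=1: the lambda receives one row at a time and tests whether col1 is a substring of col2
df['contained'] = df.apply(lambda x: x.col1 in x.col2, axis=1)
df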
  col1  col2  contained
0    a  a123       True
1    b  b456       True
2    c  d789      False
3    d     a      False
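The first step, making sure the indexes match, matters when the two columns start out in separate DataFrames. A minimal sketch, assuming rows should be paired by position: df2 is given a made-up index (10 to 13) purely to show the misalignment that reset_index(drop=True) removes. The second apply is a hypothetical variant that returns the matched value or 'no_match', as the question asks:

```python
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
# Hypothetical index that does not match df1's 0..3
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789', 'a']}, index=[10, 11, 12, 13])

# Put both columns into one frame, pairing rows by position:
# reset_index(drop=True) relabels each index as 0..n-1 so the rows line up.
df = pd.DataFrame({
    'col1': df1['col'].reset_index(drop=True),
    'col2': df2['col'].reset_index(drop=True),
})

# Row-wise substring test (axis=1 passes each row to the lambda)
df['contained'] = df.apply(lambda row: row.col1 in row.col2, axis=1)

# Variant closer to the question: return the col1 value itself, or 'no_match'
df['is_contained'] = df.apply(
    lambda row: row.col1 if row.col1 in row.col2 else 'no_match', axis=1)
print(df)
```

Without the reset_index calls, the DataFrame constructor would align on the index labels and produce rows filled with NaN instead of four paired rows.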


Andy Hayden · Accepted Answer · 2014-02-02 22:01:01Z

1

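In pandas 0.13, you can use str.extract. (In current pandas, str.extract returns a one-column DataFrame by default; pass expand=False to get a Series like the one below.)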

In [11]: df1 = pd.DataFrame({'col': ['a', 'b', 'c']})

In [12]: df2 = pd.DataFrame({'col': ['d23','b456','a789']})

In [13]: df2.col.str.extract('(%s)' % '|'.join(df1.col))
Out[13]: 
0    NaN
1      b
2      a
Name: col, dtype: object
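str.extract returns only the first match per row, so on its own it cannot flag the 'Multiple Matches' case from the question. A minimal sketch under these assumptions: part 1 completes the extract approach with re.escape and fillna; part 2 swaps in plain Python substring tests to apply the question's full rule. The extra row 'ab1' is made up to show the multiple-match case, and the helper label is hypothetical:

```python
import re
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c']})
# 'ab1' is a made-up extra row that contains two of the df1 values
df2 = pd.DataFrame({'col': ['d23', 'b456', 'a789', 'ab1']})

# 1) str.extract, returning a Series and filling rows with no match.
#    re.escape guards against regex metacharacters in df1's values.
pattern = '(' + '|'.join(map(re.escape, df1['col'])) + ')'
first = df2['col'].str.extract(pattern, expand=False).fillna('no_match')
print(first.tolist())  # ['no_match', 'b', 'a', 'a']

# 2) The question's full rule, including the multiple-match case:
#    collect every df1 value that occurs as a substring of each df2 value.
needles = list(dict.fromkeys(df1['col']))  # deduplicate, keep order

def label(text):
    """Return the single df1 value found in text, 'no_match', or 'Multiple Matches'."""
    found = [n for n in needles if n in text]
    if not found:
        return 'no_match'
    if len(found) == 1:
        return found[0]
    return 'Multiple Matches'

df2['is_contained'] = df2['col'].map(label)
print(df2['is_contained'].tolist())
# ['no_match', 'b', 'a', 'Multiple Matches']
```

Part 2 tests every df1 value against every df2 value, so its cost grows with len(df1) * len(df2); that is fine for small lookup lists.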
