5

How to search for a string value on each and every columns using pandas . Lets say I have 32 columns ,

df[df['A'].str.contains("hello")]

this returns whether the value is present in "A" column or not ,How to search on every columns and the row in which the value is exist . Dataset :

A           B           C
1           hi          hie
2           bye         Hello

If I search for "hello" or "Hello" output should be :

A           B            C
2           bye         Hello
1
  • s = df.stack() # convert entire data frame into a series of values df.iloc[s[s.str.contains('hello',na=False)].index.get_level_values(0)] # see below post for details on how it works Commented Sep 19, 2018 at 11:53

3 Answers 3

8

I think you can use:

df = pd.DataFrame({'A':['hello fgf','s','f'],'B':['d','ff hello','f'],'C':[4,7,8]})
print (df)
           A         B  C
0  hello fgf         d  4
1          s  ff hello  7
2          f         f  8

mask = df.applymap(lambda x: 'hello' in str(x))
print (mask)
       A      B      C
0   True  False  False
1  False   True  False
2  False  False  False

Then if need filter add any for check at least one True per row with boolean indexing:

df1 = df[mask.any(axis=1)]
print (df1)
           A         B  C
0  hello fgf         d  4
1          s  ff hello  7

EDIT:

tested = 'hello'
mask = df.applymap(lambda x:  tested.lower() in str(x).lower())
print (mask)
       A      B      C
0  False  False  False
1  False  False   True
Sign up to request clarification or add additional context in comments.

5 Comments

"TypeError: ("argument of type 'float' is not iterable", u'occurred at index Account_Name') [Finished in 0.3s with exit code 1]"----> getting this error .
It is case sensitive .How to resolve case sensitive part ?
Do you think not key sensitive? Can you give my sample and add it to question?
maybe need mask = df.applymap(lambda x: 'hello' in str(x).lower()) ?
Glad can help, nice day!
2

You can also concatenate all columns into one string and search for your substring in the concatenated string:

In [21]: df[df.astype(str).add('|').sum(1).str.contains('hello')]
Out[21]:
           A         B  C
0  hello fgf         d  4
1          s  ff hello  7

Explanation:

In [22]: df.astype(str).add('|').sum(1)
Out[22]:
0    hello fgf|d|4|
1     s|ff hello|7|
2            f|f|8|
dtype: object

Comments

0

A way to check for existence of partial string, if you need entire string it will be faster than this:

df.stack().str.contains('search_string').sum()>0

However, to answer your question. Convert the DataFrame into Series by stacking all columns. Then search the Series for the string of interest. From the resulting matches, get the row index that is the same as it was in the DataFrame. Using the row index look up the rows from the DataFrame. I think this method is at least 20% faster than applymap method.

s = df.stack() # convert entire data frame into a series of values df.iloc[s[s.str.contains('search_string',na=False)].index.get_level_values(0)]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.