1

I would like to ask for your help with an if statement inside a function that I am using to aggregate some data in a dataframe. With this function, I want to check whether any of several strings occurs in another string in one column of my dataframe, and return a specific value together with the matching string.

This is what I have so far, and it does part of what I need. For example, if my string is "fk", calling find_string("fk") on that row returns "success" because "f" is in the string. Additionally, I would like to get the string from the list that was found, in this case "f": something like "success" + "f".

 def find_string(b):
     if "a"  in b or "c"  in b or "d"  in b or "f"  in b:
         return "success"  ## here I want to get the matching string

Any suggestion?

I am using Python 2.7.13 with the pandas library.

1
  • You mention a DataFrame; does the answer you are looking for need to be pandas-specific? If so, could you provide a small sample of your dataframe? There are functions/methods in pandas that are much faster than normal loops. Commented Nov 2, 2017 at 12:54

3 Answers 3

2

If you're using pandas, use str.extract + np.where. This is typically much faster than applying a Python function row by row.

import numpy as np
# v holds the first of a, c, d, f found in each string (NaN if none is found)
v = df['yourCol'].str.extract('([acdf])', expand=False)
df['newCol'] = np.where(v.isnull(), '', 'success' + v.astype(str))
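
To see what the two lines above produce, here is a minimal end-to-end sketch. The sample data is made up for illustration, and 'yourCol' and 'newCol' are placeholder column names rather than anything from the question's actual dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are placeholders.
df = pd.DataFrame({'yourCol': ['fk', 'xyz', 'banana']})

# Leftmost character among a, c, d, f in each string (NaN if none is found).
v = df['yourCol'].str.extract('([acdf])', expand=False)

# Empty string where nothing matched, otherwise 'success' plus the match.
df['newCol'] = np.where(v.isnull(), '', 'success' + v.astype(str))

print(df)
```

Note that the regular expression returns the leftmost match in the string, so a string such as 'fa' would give 'successf', whereas a loop over ['a', 'c', 'd', 'f'] would report 'a' first.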

2 Comments

Nice solution :)
@jezrael Thank you sir, appreciate it. :-)
1

You could simply use set intersections. It doesn't require any if or loops and should be very efficient:

>>> set('try to find a substring') & set('acdf')
{'a', 'f', 'd'}
>>> set('no substring') & set('acdf')
set()

If you really want to use pandas, look at @Coldspeed's solution.
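
One possible way to get from the intersection to a return value is sketched below. The function name find_strings, the targets parameter and the 'failure' fallback are illustrative assumptions, not part of the question, and the approach only works when every target is a single character.

```python
def find_strings(b, targets='acdf'):
    """Return 'success ' plus every target character found in b, else 'failure'."""
    # Hypothetical helper: the name, the 'targets' parameter and the
    # 'failure' fallback are illustrative choices.
    matches = set(b) & set(targets)
    if not matches:
        return 'failure'
    # Sets are unordered, so sort the matches for a reproducible result.
    return 'success ' + ''.join(sorted(matches))


print(find_strings('fk'))
print(find_strings('try to find a substring'))
print(find_strings('no substring'))
```

Here every match is reported, in sorted order, and an empty intersection is mapped to 'failure'; whether to report all matches or only one is a design choice that depends on what the dataframe column needs.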

1 Comment

This is the "right" way (in plain Python), but there's a step lacking for getting from the set to a "success f" or something similar, and it can be confusing for a beginner: what do I do with more than one match? With zero? These questions are swept under the rug in the for-loop implementation.
1
def find_string(b):
   for c in ['a', 'c', 'd', 'f']:
       if c in b:
           return 'success ' + c
   return 'failure'

>>> find_string('fk')
'success f'

2 Comments

This is a pandas question, and a pandas solution is required.
@cᴏʟᴅsᴘᴇᴇᴅ I have fixed the OP's code. A pandas solution such as yours is better, but this answer should still help the OP, who seems to be struggling with a plain Python implementation. (BTW, the pandas tag was added by me.)
