2

I've seen many questions on this topic, but most are the opposite of mine. I have a list of strings (a column of a data frame) and a list of substrings. I want to compare each string to the list of substrings: if it contains a substring, return that substring, otherwise return 'no match'.

    subs = ['cat', 'dog', 'mouse']

    df

     Name       Number     SubMatch
     dogfood      1           dog
     catfood      3           cat
     dogfood      2           dog
     mousehouse   1           mouse
     birdseed     1           no match

My current output looks like this, though:

     Name       Number     SubMatch
     dogfood      1           dog
     catfood      3           dog
     dogfood      2           dog
     mousehouse   1           dog
     birdseed     1           dog

I suspect my code is just returning the first thing in the series. How do I change it to return the correct match for each row? Here is the function:

    def matchy(col, subs):
        for name in col:
            for s in subs:
                if any(s in name for s in subs):
                    return s
                else:
                    return 'No Match'
1
  • You don't need the any(s in name for s in subs) loop in line 4 as you are already looping over the list of subs in line 3. Commented Nov 15, 2017 at 18:16

4 Answers

5

The pandaic way to solve this would be to not use loops at all. You could do this pretty simply with str.extract:

p = '({})'.format('|'.join(subs))
df['SubMatch'] = df.Name.str.extract(p, expand=False).fillna('no match')

df

         Name  Number  SubMatch
0     dogfood       1       dog
1     catfood       3       cat
2     dogfood       2       dog
3  mousehouse       1     mouse
4    birdseed       1  no match
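
One caveat worth sketching: the pattern above joins the substrings as they are, so a substring containing regex metacharacters (for example a plus sign or a dot) would be misread. Below is a minimal sketch of the same pattern-building step with re.escape, written against the standard re module so it runs without pandas. The helper names build_pattern and first_match are hypothetical and not part of the answer above.

```python
import re

def build_pattern(subs):
    """Join the substrings into one capturing alternation, escaping each piece."""
    return '({})'.format('|'.join(re.escape(s) for s in subs))

def first_match(name, pattern, default='no match'):
    """Return the leftmost substring found in name, or default if none is found."""
    m = re.search(pattern, name)
    return m.group(1) if m else default

subs = ['cat', 'dog', 'mouse']
p = build_pattern(subs)
names = ['dogfood', 'catfood', 'dogfood', 'mousehouse', 'birdseed']
print([first_match(n, p) for n in names])
```

The escaped pattern string should be usable in place of p in the str.extract call. Note that a regex search reports the leftmost match in the string, which can differ from the first match in subs order when a name contains more than one of the substrings.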

2 Comments

@Vaishali p is a regex pattern ;-) ('(cat|dog|mouse)')
This was the only answer that worked out of the top few. Thanks!
1

How about this:

def matchy(col, subs):
    for name in col:
        try:
            return next(x for x in subs if x in name)
        except StopIteration:
            return 'No Match'

The problem with your code is that any only checks whether some substring matches; return s then returns the current item of the outer loop, which is always the first one (dog), rather than the substring that actually matched.


EDIT kudos @Coldspeed

def matchy(col, subs):
    for name in col:
        return next((x for x in subs if x in name), 'No match')
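
Because the return sits inside the loop over col, the function stops after handling the first name and produces a single value rather than one per row. Below is a minimal sketch of applying the same next-with-default idea to every name. The names match_one and match_all are hypothetical, introduced only for illustration.

```python
def match_one(name, subs, default='No match'):
    """Return the first substring in subs that occurs in name, else default."""
    return next((s for s in subs if s in name), default)

def match_all(col, subs):
    """Apply match_one to every name and collect one result per input."""
    return [match_one(name, subs) for name in col]

subs = ['cat', 'dog', 'mouse']
col = ['dogfood', 'catfood', 'dogfood', 'mousehouse', 'birdseed']
print(match_all(col, subs))
```

With a data frame, the resulting list could be assigned back as a column, for example df['SubMatch'] = match_all(df['Name'], subs), assuming df['Name'] holds only strings.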

3 Comments

next has a default argument which is returned if nothing else is. You can get rid of the try-except and reduce this to a single line - next( (x for x in subs if x in name), 'No match')
@cᴏʟᴅsᴘᴇᴇᴅ Thanks a lot for the hint! Didn't know that.
@cᴏʟᴅsᴘᴇᴇᴅ Shouldn't one go with yield here?
0

I think you are overcomplicating things with a nested loop and then the any test inside it. Would this work better:

def matchy(col, subs):
    for name in col:
        for s in subs:
            if s in name:
                return s
        # Report no match only after every substring has been tried
        return 'No Match'

0

Unless there is code missing that accounts for it, it would appear that your code returns the result for the very first comparison, and actually does not look at any of the other items in the col list. If you would rather stick with nested loops, I would suggest modifying your code like so:

def matchy(col, subs):
    subMatch = []
    for name in col:
        subMatch.append('No Match')
        for s in subs:
            if s in name:
                subMatch[-1] = s
                break
    return subMatch

This assumes that col is a list of strings containing the column information (dogfood, mousehouse, etc) and that subs is a list of strings containing the substrings you wish to search for. subMatch is a list of strings returned by matchy that contains the search results for each item in col.

For each value in col we examine, we append the 'No Match' string to subMatch, assuming for now that no match will be found. Then we iterate through subs, checking whether the substring s is contained in name. If there is a match, subMatch[-1] = s replaces the 'No Match' we just appended with the matching substring, and we break to move on to the next item in col, since there is no need to check the remaining substrings. Note that subMatch[-1] = s can be replaced with other methods, such as subMatch.pop() followed by subMatch.append(s), though that is mostly personal preference. Once all elements in col have been checked, subMatch is returned, and you can then process it however you like.
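
The append-then-overwrite step described above can also be expressed with Python's for ... else construct, where the else branch runs only if the loop finishes without hitting break. This is an alternative sketch under the same assumptions about col and subs, not the answer's own code; the name matchy_for_else is hypothetical.

```python
def matchy_for_else(col, subs):
    """Return one result per name: the first matching substring or 'No Match'."""
    sub_match = []
    for name in col:
        for s in subs:
            if s in name:
                sub_match.append(s)
                break  # stop at the first matching substring
        else:
            # Runs only when the inner loop ended without hitting break.
            sub_match.append('No Match')
    return sub_match

col = ['dogfood', 'catfood', 'dogfood', 'mousehouse', 'birdseed']
subs = ['cat', 'dog', 'mouse']
print(matchy_for_else(col, subs))
```

Either form produces a list with one entry per item in col, so the choice between them is mostly a matter of readability.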
