Match list of substrings and strings and return substring if it matches

Question

I've seen many questions on this topic, but most are the opposite of mine. I have a list of strings (a column of a data frame) and a list of substrings. I want to compare each string to the list of substrings. If it contains a substring, then return that substring, else print 'no match'.

    subs = ['cat', 'dog', 'mouse']

    df

      Name       Number     SubMatch
     dogfood      1           dog
     catfood      3           cat
     dogfood      2           dog
     mousehouse   1           mouse
     birdseed     1           no match

My current output looks like this, though:

     Name       Number     SubMatch
     dogfood      1           dog
     catfood      3           dog
     dogfood      2           dog
     mousehouse   1           dog
     birdseed     1           dog

I suspect my code is just returning the first thing in the series. How do I change it to return the correct thing in the series? Here is the function:

    def matchy(col, subs):
        for name in col:
            for s in subs:
                if any(s in name for s in subs):
                    return s
                else:
                    return 'No Match'

You don't need the any(s in name for s in subs) check in line 4, as you are already looping over the list of subs in line 3.
– suripoori, Commented Nov 15, 2017 at 18:16

cs95 · Accepted Answer · 2017-11-15 18:16:07Z

5

The pandaic way to solve this would be to not use loops at all. You could do this pretty simply with str.extract:

p = '({})'.format('|'.join(subs))
df['SubMatch'] = df.Name.str.extract(p, expand=False).fillna('no match')

df

         Name  Number  SubMatch
0     dogfood       1       dog
1     catfood       3       cat
2     dogfood       2       dog
3  mousehouse       1     mouse
4    birdseed       1  no match
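
To see what the pattern does outside pandas, here is a minimal plain-Python sketch using the standard re module. The helper name first_regex_match is hypothetical, and the re.escape call is an added precaution (an assumption, not part of the answer) for substrings that contain regex metacharacters.

```python
import re

subs = ['cat', 'dog', 'mouse']
names = ['dogfood', 'catfood', 'dogfood', 'mousehouse', 'birdseed']

# Build '(cat|dog|mouse)'; re.escape is an assumption added here so that
# substrings containing characters like '.' or '+' are matched literally.
pattern = re.compile('({})'.format('|'.join(re.escape(s) for s in subs)))

def first_regex_match(name):
    # Hypothetical helper: return the leftmost matching substring in name,
    # or 'no match' when none of the alternatives occurs.
    m = pattern.search(name)
    return m.group(1) if m else 'no match'

sub_match = [first_regex_match(n) for n in names]
print(sub_match)  # ['dog', 'cat', 'dog', 'mouse', 'no match']
```

Note that a regex search reports the match that starts earliest in the string, which is not necessarily the substring listed first in subs.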

answered Nov 15, 2017 at 18:16


2 Comments

cs95 Over a year ago

@Vaishali p is a regex pattern ;-) ('(cat|dog|mouse)')

EEPBAH Over a year ago

This was the only answer that worked out of the top few. Thanks!

Ma0 · Accepted Answer · 2017-11-16 08:29:56Z

1

How about this:

def matchy(col, subs):
    for name in col:
        try:
            return next(x for x in subs if x in name)
        except StopIteration:
            return 'No Match'

The problem with your code was that you were checking for a match with any, but then returning s from the first iteration of the loop (dog), regardless of which substring actually matched.

EDIT kudos @Coldspeed

def matchy(col, subs):
    for name in col:
        return next((x for x in subs if x in name), 'No match')

edited Nov 16, 2017 at 8:29

answered Nov 15, 2017 at 18:17


3 Comments

cs95 Over a year ago

next has a default argument which is returned if nothing else is. You can get rid of the try-except and reduce this to a single line - next( (x for x in subs if x in name), 'No match')
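
A small sketch of the behaviour described in this comment, using made-up test strings: without a default, next raises StopIteration on an exhausted generator. With a default, it returns that value instead. When a second argument is passed, the generator expression needs its own parentheses.

```python
subs = ['cat', 'dog', 'mouse']

# Without a default, an exhausted generator raises StopIteration.
try:
    next(s for s in subs if s in 'birdseed')
    raised = False
except StopIteration:
    raised = True

# With a default, next() returns it instead of raising.
# The generator expression must be parenthesized here.
hit = next((s for s in subs if s in 'mousehouse'), 'No match')
miss = next((s for s in subs if s in 'birdseed'), 'No match')

print(raised, hit, miss)  # True mouse No match
```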

Ma0 Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ Thanks a lot for the hint! Didn't know that.

Ma0 Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ Shouldn't one go with yield here?
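
On the yield question: a return inside the for loop ends the function after the first name, so only one result ever comes back. One possible sketch, assuming col is any iterable of strings, turns matchy into a generator so that each name produces a result:

```python
def matchy(col, subs):
    # Generator version: yields one result per name instead of
    # returning (and stopping) after the first name.
    for name in col:
        yield next((s for s in subs if s in name), 'No match')

subs = ['cat', 'dog', 'mouse']
col = ['dogfood', 'birdseed', 'mousehouse']

# Materialize the generator so the results can be stored or assigned.
result = list(matchy(col, subs))
print(result)  # ['dog', 'No match', 'mouse']
```

Wrapping the call in list() gives a plain list with one entry per name, which should be assignable to a data frame column of the same length.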

Steve · Accepted Answer · 2017-11-15 18:20:32Z

0

I think you are overcomplicating things with a nested loop and then the any test inside. Would this work better:

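def matchy(col, subs):
    matches = []
    for name in col:
        for s in subs:
            if s in name:
                matches.append(s)
                break
        else:
            # the inner loop finished without a break: nothing matched
            matches.append('No Match')
    return matches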

answered Nov 15, 2017 at 18:20


CheeseCoder · Accepted Answer · 2017-11-15 19:36:39Z

Unless there is code missing that accounts for it, it would appear that your code returns the result for the very first comparison, and actually does not look at any of the other items in the col list. If you would rather stick with nested loops, I would suggest modifying your code like so:

def matchy(col, subs):
    subMatch = []
    for name in col:
        subMatch.append('No Match')
        for s in subs:
            if s in name:
                subMatch[-1] = s
                break
    return subMatch

This assumes that col is a list of strings containing the column information (dogfood, mousehouse, etc.) and that subs is a list of strings containing the substrings you wish to search for. subMatch is the list of strings returned by matchy, holding the search result for each item in col.

For each value in col, we first append the 'No Match' string to subMatch, assuming for now that no match will be found. Then we iterate through subs, checking whether the substring s is contained within name. If there is a match, subMatch[-1] = s replaces the 'No Match' we just appended with the matching substring, and we break to move on to the next item in col, since there is no need to check the remaining substrings. Note that subMatch[-1] = s can be replaced with other methods, such as subMatch.pop() followed by subMatch.append(s), though at that point it is mostly personal preference. Once all elements in col have been checked, subMatch is returned, and you can then process it however you like.
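
As a usage sketch, here is the function above run against the sample data from the question (the sample lists are re-typed here as plain Python lists, which is an assumption about how the column is passed in):

```python
def matchy(col, subs):
    subMatch = []
    for name in col:
        subMatch.append('No Match')   # placeholder until a match is found
        for s in subs:
            if s in name:
                subMatch[-1] = s      # overwrite the placeholder
                break                 # stop at the first hit in subs order
    return subMatch

subs = ['cat', 'dog', 'mouse']
names = ['dogfood', 'catfood', 'dogfood', 'mousehouse', 'birdseed']

result = matchy(names, subs)
print(result)  # ['dog', 'cat', 'dog', 'mouse', 'No Match']

# If a name contains several substrings, the one listed first in subs wins.
both = matchy(['dogcat'], subs)
print(both)  # ['cat']
```

With a data frame, the returned list should be assignable directly, for example df['SubMatch'] = matchy(df['Name'], subs), since it has one entry per row.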
