How do you use a regex in a list comprehension in Python?

Question

I'm trying to locate all index positions of a string in a list of words and I want the values returned as a list. I would like to find the string if it is on its own, or if it is preceded or followed by punctuation, but not if it is a substring of a larger word.

The following code captures only "cow" and misses both "test;cow" and "cow."

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == myString]
print indices
>> [5]

I have tried changing the code to use a regular expression:

import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == re.match('\W*myString\W*', myList)]
print indices

But this gives an error: expected string or buffer

If anyone knows what I'm doing wrong I'd be very happy to hear. I have a feeling it's something to do with the fact I'm trying to use a regular expression in there when it's expecting a string. Is there a solution?

The output I'm looking for should read:

>> [0, 4, 5]

Thanks

Rohit Jain · Accepted Answer · 2013-02-11 19:31:55Z

23

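You don't need to compare the result of re.match with x; the returned match object is already truthy when the pattern matches. The match should also be run on x (each element) rather than on the whole list.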

Also, you need to use re.search instead of re.match, since the regex pattern '\W*myString\W*' will not match the first element. That's because re.match only matches at the start of the string, and test; is not matched by \W*. Note too that myString inside the quotes is literal text, not the variable. In fact, you only need to test the characters immediately preceding and following the string, not the complete string.
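To make the difference between re.match and re.search concrete, here is a minimal sketch. The shortened words list and the hard-coded 'cow' in the pattern are illustrative assumptions, not part of the original code. It also shows that re.search with \W* alone is too permissive, since it accepts 'acow'.

```python
import re

# Shortened sample list (an assumption for illustration).
words = ['test;cow', 'cow.', 'cow', 'acow']
pattern = r'\W*cow\W*'

# re.match only tries the pattern at position 0 of each string.
match_hits = [w for w in words if re.match(pattern, w)]

# re.search scans the string for the first position where the pattern fits.
search_hits = [w for w in words if re.search(pattern, w)]

print(match_hits)   # ['cow.', 'cow']
print(search_hits)  # ['test;cow', 'cow.', 'cow', 'acow']
```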

Instead, you can use word boundaries around the string:

pattern = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(pattern, x)]
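For reference, a self-contained version of the word-boundary approach, run against the list from the question. Precompiling the pattern with re.compile is an optional variation added here, not something the answer requires. print is called with parentheses so the snippet runs on both Python 2 and Python 3.

```python
import re

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'

# \b matches the position between a word character and a non-word
# character (or the start/end of the string), so 'acow' is rejected.
regex = re.compile(r'\b' + re.escape(myString) + r'\b')

indices = [i for i, x in enumerate(myList) if regex.search(x)]
print(indices)  # [0, 4, 5]
```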

edited Feb 11, 2013 at 19:31

answered Feb 11, 2013 at 19:13

Rohit Jain


georg · Accepted Answer · 2013-02-11 21:16:12Z

7

There are a few problems with your code. First, you need to match the expression against the list element (x), not against the whole list (myList). Second, in order to insert a variable into the expression, you have to use + (string concatenation). And finally, use raw literals (r'\W') so that backslashes in the expression are interpreted properly:

import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if re.match(r'\W*' + myString + r'\W*', x)]
print indices

If there is a chance that myString contains special regex characters (like a backslash or a dot), you'll also need to apply re.escape to it:

regex = r'\W*' + re.escape(myString) + r'\W*'
indices = [i for i, x in enumerate(myList) if re.match(regex, x)]
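A short sketch of why re.escape matters. The search string 'c.w' and the sample words are hypothetical, chosen only because the dot is a regex metacharacter.

```python
import re

words = ['c.w', 'cow', 'caw']
needle = 'c.w'  # hypothetical search string containing a metacharacter

# Unescaped, the dot matches any character, so every word is hit.
unescaped = [i for i, w in enumerate(words) if re.search(needle, w)]

# Escaped, the dot is treated as a literal '.'.
escaped = [i for i, w in enumerate(words) if re.search(re.escape(needle), w)]

print(unescaped)  # [0, 1, 2]
print(escaped)    # [0]
```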

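As pointed out in the comments, the versions above do not match 'test;cow', because re.match only matches at the start of the string. Word boundaries combined with re.search might be a better option: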

regex = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(regex, x)]
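One caveat worth checking against your data: \b treats digits and underscores as word characters, so a neighbouring '_' or digit does not count as punctuation. The samples below are made up for illustration.

```python
import re

regex = re.compile(r'\b' + re.escape('cow') + r'\b')

# Made-up samples: punctuation on either side, plus '_' and a digit.
samples = ['cow', 'cow!', '(cow)', 'cow_', 'cow2', 'my-cow']

# 'cow_' and 'cow2' are skipped because '_' and '2' are word characters.
matched = [s for s in samples if regex.search(s)]
print(matched)  # ['cow', 'cow!', '(cow)', 'my-cow']
```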

edited Feb 11, 2013 at 21:16

answered Feb 11, 2013 at 19:15

georg


5 Comments

Martijn Pieters Over a year ago

Maybe add re.escape too?

Rohit Jain Over a year ago

This doesn't match the first element, which the OP wants to match.

eldarerathis Over a year ago

Another issue is that the regex doesn't actually provide the output the OP expects (it doesn't match test;cow, for example). I think re.search(r'\b' + myString + r'\b', x) might work.

Adam Over a year ago

Thanks for this. I ran into trouble with the r'\b*' which was returning the error "nothing to repeat", as noted in the comment above.

georg Over a year ago

@Adam: yeah, my bad, should be \b not \b*.
