Regex for word exclusion in Python

Question

I have a regular expression '[\w_-]+' which allows alphanumeric characters, underscores, and hyphens.

I have a set of words in a Python list which I don't want to allow:

listIgnore = ['summary', 'config']

What changes need to be made in the regex?

P.S: I am new to regex

possible duplicate stackoverflow.com/questions/406230/…

– korylprince, Commented Nov 7, 2013 at 6:04
Agree that it's a duplicate.

– justhalf, Commented Nov 7, 2013 at 6:20

devnull · Accepted Answer · 2013-11-07 06:25:04Z

3

>>> import re
>>> line = "This is a line containing a summary of config changes"
>>> listIgnore = ['summary', 'config']
>>> patterns = "|".join(listIgnore)
>>> # \b anchors at a word start; (?!...) rejects words beginning with an ignored word
>>> print(re.findall(r'\b(?!(?:' + patterns + r'))[\w_-]+', line))
['This', 'is', 'a', 'line', 'containing', 'a', 'of', 'changes']
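
One caveat with the pattern above: the lookahead rejects any word that merely starts with an ignored word, so configuration would be dropped along with config. Below is a minimal sketch of a stricter variant, assuming only exact words should be excluded. The helper name find_allowed_words is hypothetical, and re.escape is added on the assumption that an ignored word might contain regex metacharacters.

```python
import re

def find_allowed_words(line, ignore):
    # Escape each word so characters such as '.' or '+' are matched literally.
    alternation = "|".join(re.escape(word) for word in ignore)
    # (?<![\w-]) : only start matching at the beginning of a token.
    # (?!(?:...)(?![\w-])) : reject the token only if it is exactly an ignored word.
    pattern = r'(?<![\w-])(?!(?:' + alternation + r')(?![\w-]))[\w-]+'
    return re.findall(pattern, line)

line = "This is a summary of config and configuration changes"
print(find_allowed_words(line, ['summary', 'config']))
```

Here \w already covers the underscore, so the character class is written as [\w-]. Whether prefix matches should be excluded or kept depends on the use case; the original pattern is the right choice if every word beginning with config should go.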


korylprince · Accepted Answer · 2013-11-07 06:20:13Z

2

This question intrigued me, so I set about finding an answer:

'^(?!summary)(?!config)[\w_-]+$'

Now this only works if you want to match the regex against a complete string:

>>> import re
>>> pattern = r'^(?!summary)(?!config)[\w_-]+$'
>>> print(re.match(pattern, 'config_test'))
None
>>> re.match(pattern, 'confi_test')
<_sre.SRE_Match object at 0x21d34a8>

So to use your list, add one (?!<word here>) after the ^ in your regex for each word. These are called negative lookaheads.
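
Adding one lookahead per word can be automated. This is a rough sketch, assuming the words live in listIgnore as in the question; the function name build_exclusion_pattern is made up for illustration. Like the hand-written pattern, it rejects any string that starts with an ignored word, not only exact matches.

```python
import re

def build_exclusion_pattern(ignore):
    # One negative lookahead per ignored word, placed right after ^.
    lookaheads = "".join("(?!" + re.escape(word) + ")" for word in ignore)
    return re.compile("^" + lookaheads + r"[\w_-]+$")

listIgnore = ['summary', 'config']
pattern = build_exclusion_pattern(listIgnore)

print(pattern.pattern)                          # the generated regex
print(pattern.match('config_test') is None)     # starts with 'config'
print(pattern.match('confi_test') is not None)  # no ignored prefix
```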

If you're trying to match within a string (i.e. without the ^ and $), then I'm not sure it's possible with this pattern. Without the anchors, the regex simply matches a substring that does not start with an excluded word. Example: it matches ummary in summary.
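
This behaviour is easy to reproduce. In the small sketch below (variable names are illustrative), the unanchored pattern skips the first character and matches the rest, while requiring a word boundary before the lookaheads, as the other answer does, prevents that.

```python
import re

unanchored = re.compile(r'(?!summary)(?!config)[\w_-]+')
# At index 0 the lookahead fails, so the search restarts at index 1.
print(unanchored.search('summary').group())

bounded = re.compile(r'\b(?!summary)(?!config)[\w_-]+')
# \b holds only at the start and end of the word, so there is no match.
print(bounded.search('summary'))
```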

Obviously, the more exclusions you add, the less efficient it gets. There are probably better ways to do it.


1 Comment

volcano Over a year ago

Probably, filtering all found values - like in thefourtheye's answer - will be more effective (re may be a memory-crunching bitch)
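
The filtering approach mentioned in this comment could look like the sketch below, assuming it means matching every word first and then dropping the ignored ones in plain Python. The referenced answer is not shown here, so this is a guess at the idea, not that answer's code.

```python
import re

listIgnore = ['summary', 'config']
line = "This is a line containing a summary of config changes"

# Set membership tests are constant time on average.
ignored = set(listIgnore)
# Match every word with the plain pattern, then filter outside the regex.
words = [w for w in re.findall(r'[\w-]+', line) if w not in ignored]
print(words)
```

This keeps the regex independent of the ignore list and excludes exact words only, so configuration would be kept.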
