I have a regular expression '[\w_-]+' which allows alphanumeric characters, underscores, or hyphens.

I have a set of words in a Python list which I don't want to allow:

listIgnore = ['summary', 'config']

What changes need to be made in the regex?

P.S: I am new to regex

2 Answers

>>> import re
>>> line = "This is a line containing a summary of config changes"
>>> listIgnore = ['summary', 'config']
>>> patterns = "|".join(listIgnore)
>>> print(re.findall(r'\b(?!(?:' + patterns + r'))[\w_-]+', line))
['This', 'is', 'a', 'line', 'containing', 'a', 'of', 'changes']
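
One caveat: the lookahead above also rejects any word that merely starts with a listed word, such as configuration. If only exact words should be excluded, something like the following might work. This is a minimal sketch assuming Python 3; the helper name find_allowed is made up for illustration, and re.escape is added on the assumption that a listed word could contain regex metacharacters.

```python
import re

def find_allowed(line, ignore):
    """Return word/hyphen tokens that are not exactly one of the ignored words."""
    alternation = "|".join(re.escape(word) for word in ignore)
    # (?<![\w-])            : a token may only start where no token character precedes it
    # (?!(?:...)(?![\w-]))  : reject only when an ignored word is followed by the token's end
    pattern = r"(?<![\w-])(?!(?:" + alternation + r")(?![\w-]))[\w-]+"
    return re.findall(pattern, line)

line = "This is a line containing a summary of config changes and configuration"
print(find_allowed(line, ["summary", "config"]))
# ['This', 'is', 'a', 'line', 'containing', 'a', 'of', 'changes', 'and', 'configuration']
```

Here summary and config are dropped, while configuration is kept because it is a longer word.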

This question intrigued me, so I set about finding an answer:

'^(?!summary)(?!config)[\w_-]+$'

Now this only works if you want to match the regex against a complete string:

>>> import re
>>> re.match(r'^(?!summary)(?!config)[\w_-]+$', 'config_test')  # no match: returns None, so nothing is echoed
>>> re.match(r'^(?!summary)(?!config)[\w_-]+$', 'confi_test')
<_sre.SRE_Match object at 0x21d34a8>

So to use your list, add one (?!<word here>) for each word right after the ^ in your regex. These are called negative lookaheads: each one makes the match fail if the text at that position starts with the given word.
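
Rather than typing the lookaheads by hand, the pattern could be assembled from the list. This is a minimal sketch assuming Python 3; the variable names list_ignore, lookaheads and pattern are illustrative, and re.escape is an added precaution in case a word contains regex metacharacters.

```python
import re

list_ignore = ["summary", "config"]

# One negative lookahead per excluded word, placed right after the ^ anchor.
lookaheads = "".join("(?!" + re.escape(word) + ")" for word in list_ignore)
pattern = re.compile("^" + lookaheads + r"[\w_-]+$")

print(pattern.pattern)                          # ^(?!summary)(?!config)[\w_-]+$
print(pattern.match("config_test"))             # None
print(pattern.match("confi_test") is not None)  # True
```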

If you're trying to match within a string (i.e. without the ^ and $), then I'm not sure it's possible with this pattern alone. The regex will simply skip ahead and match a part of the excluded word that no longer triggers the lookahead, for example ummary from summary.
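
The effect can be reproduced with a short experiment. This is a sketch assuming Python 3, and the sample text is made up. A leading \b, as used in the other answer, appears to avoid the problem because it only lets a match begin at the start of a word.

```python
import re

text = "a summary here"

# Without anchors the engine skips the 's' and matches the rest of the word.
unanchored = re.findall(r"(?!summary)(?!config)[\w_-]+", text)
print(unanchored)  # ['a', 'ummary', 'here']

# With a leading \b a match can only begin at a word boundary,
# so the tail of an excluded word is no longer picked up.
bounded = re.findall(r"\b(?!summary)(?!config)[\w_-]+", text)
print(bounded)  # ['a', 'here']
```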

Obviously, the more exclusions you add, the less efficient the regex will get. There are probably better ways to do it.

1 Comment

Filtering the found values afterwards, as in thefourtheye's answer, will probably be more effective (re can be a real memory hog).
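
One way to read this suggestion is to match every token with the plain pattern and drop the unwanted words in ordinary Python. This is a minimal sketch assuming Python 3; holding the ignore list in a set is an added assumption to make the membership test cheap.

```python
import re

list_ignore = {"summary", "config"}  # a set gives fast membership tests
line = "This is a line containing a summary of config changes"

# Match every candidate token first, then drop the excluded ones in plain Python.
words = [w for w in re.findall(r"[\w-]+", line) if w not in list_ignore]
print(words)
# ['This', 'is', 'a', 'line', 'containing', 'a', 'of', 'changes']
```

This keeps the regex itself simple no matter how long the ignore list grows.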
