
I'm trying to match an expression with a regex, but it's not working. I want to match a string that does not start with http://www.domain.com. Here is my regex:

^https?:\/\/(www\.)?(?!domain\.com)

Is there a problem with my regex?

I want to match expressions starting with http://, but not ones starting with http://site.com. For example:

/page.html => false
http://www.google.fr => true
http://site.com => false
http://site.com/page.html => false
  • ^ outside a character class means "start of line", not "not". Commented Mar 27, 2013 at 15:47
  • Can you post an example of what you expect to/not to match but doesn't/does? The regex looks reasonable. Also, there's no need to escape /. Commented Mar 27, 2013 at 15:49
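A small Python illustration of both comments (Python is the language the answers below use). The sample strings are made up for the demonstration:

```python
import re

# Outside a character class, ^ anchors the match to the start of the string.
print(bool(re.match(r'^https?://', 'http://www.google.fr')))
print(bool(re.search(r'^https?://', 'see http://www.google.fr')))  # ^ cannot match mid-string

# Inside a character class, ^ negates: [^/] matches any single character except '/'.
print(re.findall(r'[^/]+', 'http://site.com/page.html'))

# '/' is not special in Python regexes, so escaping it changes nothing.
print(re.match(r'^https?:\/\/', 'http://x').group())
print(re.match(r'^https?://', 'http://x').group())
```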

3 Answers


Use this to match a URL that does not point to the domain you mention (www.domain.com): https?://(?!(www\.domain\.com\/?)).*

Example in action: http://regexr.com?34a7p
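A minimal check of this pattern in Python, assuming it is applied with re.match so that it is anchored at the start of the string (the pattern itself has no ^, so re.search could also find an http:// later in the string). Note that the look-ahead only rejects the www. form; a bare http://domain.com still matches:

```python
import re

# The answer's pattern, unchanged; re.match anchors it at the start of the string.
not_domain = re.compile(r'https?://(?!(www\.domain\.com\/?)).*')

for url in ['/page.html', 'http://www.google.fr', 'http://www.domain.com',
            'http://www.domain.com/page.html', 'http://domain.com']:
    print(url, bool(not_domain.match(url)))
```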


The problem here is that when the negative look-ahead finds its match (domain.com right after www.), the regex engine treats that as a failure (as expected) and backtracks into the preceding group, (www\.), which is quantified as optional. It then checks whether the expression succeeds without that group, and it does, because the look-ahead now sees www.domain.com rather than domain.com. This is what you have overlooked.
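The backtracking can be observed directly in Python. Against http://www.domain.com the question's pattern still matches, and the optional group reports None because the engine gave up www. to satisfy the look-ahead:

```python
import re

# The pattern from the question.
original = re.compile(r'^https?:\/\/(www\.)?(?!domain\.com)')

m = original.search('http://www.domain.com')
# The match succeeds, consuming only the scheme...
print(repr(m.group()))
# ...and group 1 did not participate: (www\.)? backtracked to matching nothing,
# so the look-ahead saw "www.domain.com", which is not "domain.com".
print(m.group(1))
```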

This could be fixed with an atomic group or a possessive quantifier, which makes the engine 'forget' the possibility of backtracking into the optional group. Unfortunately, Python's re module did not support either of these natively before Python 3.11. Instead, you'll have to use a larger look-ahead that includes the optional group:

^https?:\/\/(?!(www\.)?(domain\.com))
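For comparison, a sketch of the atomic/possessive version, assuming Python 3.11 or later, where re gained atomic groups (?>...) and possessive quantifiers such as ?+. Earlier versions reject this syntax with re.error, hence the version check:

```python
import re
import sys

urls = ['http://www.domain.com', 'http://domain.com', 'http://www.google.fr']

if sys.version_info >= (3, 11):
    # ?+ is possessive: once www\. has matched, the engine cannot give it back.
    possessive = re.compile(r'^https?://(?:www\.)?+(?!domain\.com)')
    # (?>...) is an atomic group: the same effect, spelled as a group.
    atomic = re.compile(r'^https?://(?>(?:www\.)?)(?!domain\.com)')
    for pattern in (possessive, atomic):
        for url in urls:
            print(pattern.pattern, url, bool(pattern.search(url)))
else:
    print('atomic groups and possessive quantifiers need Python 3.11+')
```

Both variants reject www.domain.com and domain.com while still accepting other hosts, because the optional www. can no longer be surrendered to the look-ahead.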

2 Comments

The OP still needs to match lines starting with http:// or https://, just not with the domain name.
Good point. While it shouldn't affect the overall results of the expression, it could make it much less efficient. I have changed the answer to reflect this.
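A quick check of the larger look-ahead pattern from this answer against the question's cases, with domain.com as the excluded host. Because the look-ahead is zero-width, a successful match covers only the scheme; appending .* makes the match span the whole line, which is what the comment above asks for:

```python
import re

pattern = re.compile(r'^https?:\/\/(?!(www\.)?(domain\.com))')

for url in ['/page.html', 'http://www.google.fr', 'http://www.domain.com',
            'http://domain.com', 'http://domain.com/page.html']:
    m = pattern.search(url)
    # The look-ahead consumes nothing, so a match is just "http://" or "https://".
    print(url, repr(m.group()) if m else None)

# Adding .* after the look-ahead makes the match cover the whole URL.
whole = re.compile(r'^https?:\/\/(?!(www\.)?(domain\.com)).*')
print(whole.search('http://www.google.fr').group())
```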

You want a negative look-ahead assertion:

^https?://(?!(?:www\.)?site\.com).+

Which gives:

>>> import re
>>> testdata = '''\
... /page.html => false
... http://www.google.fr => true
... http://site.com => false
... http://site.com/page.html => false
... '''.splitlines()
>>> not_site_com = re.compile(r'^https?://(?!(?:www\.)?site\.com).+')
>>> for line in testdata:
...     match = not_site_com.search(line)
...     if match:
...         print(match.group())
... 
http://www.google.fr => true

Note that the pattern excludes both www.site.com and site.com:

>>> not_site_com.search('https://www.site.com')
>>> not_site_com.search('https://site.com')
>>> not_site_com.search('https://site-different.com')
<_sre.SRE_Match object at 0x10a548510>

1 Comment

@guillaume: right, then you still need a negative look-ahead assertion.
