
I need a regular expression that finds domain names that are not preceded by the string "http".

I found a regex that almost does this:

(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}

But it also matches the "domain1.com" part of "https://domain1.com".

Example given:

https://regex101.com/r/DjDBrx/1/

In this example I want to avoid "https://domain1.com"
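
The behaviour can be reproduced with a minimal sketch, assuming Python's re module. The sample text is hypothetical, and {,61} is written as {0,61}, which Python treats identically:

```python
import re

# Pattern from the question, with {,61} spelled out as {0,61}.
DOMAIN = re.compile(
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}"
)

# Hypothetical sample text: one URL with a scheme, one bare domain.
sample = "https://domain1.com domain2.com"

found = DOMAIN.findall(sample)
print(found)  # ['domain1.com', 'domain2.com']
```

The pattern never matches the "https://" prefix itself; it simply starts a fresh match at "domain1.com", which is the hit I want to suppress.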

Any help would be greatly appreciated.

  • Are you validating full strings? Or extracting from longer texts? Commented Aug 9, 2021 at 16:55
  • Welcome to SO! Do you want /^(?!http)/ or /\b(?!http)/? I don't understand the {,61} and {2,6}, and your pattern has no http in it, so it seems to have nothing to do with your written specification. Commented Aug 9, 2021 at 16:56
  • @WiktorStribiżew I'm extracting from longer texts Commented Aug 9, 2021 at 17:10
  • @ggorlen Thank you :) {,61} and {2,6} are for detecting domains. It has no "http" on it because I don't know how to add it in order to ignore "http" at the start. Commented Aug 9, 2021 at 17:13
  • I would use something like \b(?<!https:\/\/)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b, a lookbehind with word boundaries. Commented Aug 9, 2021 at 17:13

2 Answers


You can use a word boundary coupled with two negative lookbehinds:

\b(?<!http:\/\/)(?<!https:\/\/)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                ^^

(?<!http:\/\/) and (?<!https:\/\/) are two negative lookbehinds. Because lookarounds are non-consuming patterns, both are checked at the same location in the string, right after \b has confirmed that the location is a word boundary. If http:// or https:// is immediately to the left of that location, the match fails there.
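
As a minimal sketch of how this behaves when extracting from longer text, assuming Python's re module (the sample text is hypothetical, and {,61} is written as {0,61}, which Python treats identically):

```python
import re

# Word boundary + two fixed-width negative lookbehinds, then the domain pattern.
pattern = re.compile(
    r"\b(?<!http://)(?<!https://)"
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b"
)

# Hypothetical sample text mixing bare domains and full URLs.
text = "Visit https://domain1.com or domain2.com and http://domain3.org, also sub.domain4.net"

matches = pattern.findall(text)
print(matches)  # ['domain2.com', 'sub.domain4.net']

# The lookbehinds only look immediately left of the match start, so a host
# with an extra label after the scheme is still matched partially.
partial = pattern.findall("https://www.domain1.com")
print(partial)  # ['domain1.com']
```

The \b is what stops the engine from starting a match in the middle of "domain1"; without it, "omain1.com" would be reported. The second call shows a limitation of this approach: the check applies only to the text directly before the match start.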


You can use a negative lookahead; I think it is usually the quickest option. The match fails if the string starts with the text you are excluding, for example: ^(?!(http)).*
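
Because ^ anchors the lookahead to the start of the string, this approach fits whole-string checks rather than extraction from longer text. A minimal sketch, assuming Python's re module and a hypothetical list of candidate strings that have already been split out:

```python
import re

# Domain pattern from the question, with {,61} spelled out as {0,61}.
DOMAIN = r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}"

# Reject any candidate that starts with "http", then require a full domain.
NOT_HTTP = re.compile(r"^(?!http)" + DOMAIN + r"$")

# Hypothetical candidates, one per string.
candidates = ["https://domain1.com", "domain2.com", "httpbin.org", "not a domain"]

kept = [c for c in candidates if NOT_HTTP.match(c)]
print(kept)  # ['domain2.com']
```

Note that "httpbin.org" is rejected as well, since the lookahead excludes anything beginning with the letters "http", not only URLs with a scheme.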

2 Comments

What would the full regex look like? I don't know where to add it.
@Daniel The entire regex would be "^(?!(http:))". It will match at the start of every string that doesn't begin with "http:"
