I'm trying to create a regex to test if a url is valid or not. I had a good example to work off of, but I had to tweak it a bit to make it fit my purpose:

^(https?:\/\/)(www\.)?(\w*\.)+([\w\-_~:/?#[\]@!$&'()*+,;=.])*$

It works fine for the most part, but it matches the following, which drives me nuts:

http://www..example..com

I have tried for a long time and can't find the combination of characters that makes it reject the input above. What am I doing wrong?

Here's a list of things I want the regex to match (all of them are matched):

http://www.example.com
https://www.example.com
https://www.example.com/
https://example.com/
https://blog.example.com/
https://my.blog.example.com/
https://my.blog.example.co.uk/
https://www.example.com/#test
https://www.example.com#test
https://www.example.com/test.php
https://www.example.com/test.php?test=yes&testmore=yesevenmore
https://www.example.com/test.php#test
https://www.example.com/test.php?test=yes&testmore2=yesevenmore&whatnumber=42#test
https://www.example.com/test
https://www.example.com/test/
https://www.example.com/test/?test=yes&testmore2=yesevenmore&whatnumber=42
https://www.example.com/test/#test
https://www.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://www.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://www.blog.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://www.my.blog.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://my.blog.example.co.uk/?test=yes&testmore=yesevenmore&whatnumber=42#test
http://255.255.255.255
http://www.example.com:8008
http://www.example.com:8008/test/?test=yes&testmore=yesevenmore&whatnumber=42#test

Here's a list of things I DON'T want it to match:

www.example.com
example.com
*http://www.blog..example..com
*http://www..example.com
*http://www...example.com
*http://www..example..com
http://www.example.com | not valid
http://www.example.com|
255.255.255.255

* still matched

How can I prevent the regex from matching consecutive dots?

  • Do you mean like this? ^https?:\/\/(?:[-\w~:/?#[\]@!$&'()*+,;=]+\.)*[-\w~:/?#[\]@!$&'()*+,;=]+$ regex101.com/r/dF8zpI/1 Commented Sep 13, 2019 at 10:12
  • Just try to connect to the site and check the return status. Commented Sep 13, 2019 at 10:14
  • @Thefourthbird yes, thank you!!! That works for my use cases... Commented Sep 13, 2019 at 10:18
  • I think ^https?:\/\/(?:[^\s.|]+\.)*[^\s.|]+$ is what you need: only allow non-consecutive dots, no | and spaces in the URLs. See regex101.com/r/74BCXB/1 Commented Sep 13, 2019 at 10:31
  • Try ^https?:\/\/(?:www\.)?(?:[\w-]+\.)+[\w-]+(?:[:/#][-\w~:/?#[\]@!$&'()*+,;=.]*)?$, see regex101.com/r/HCB0Qt/1 Commented Sep 13, 2019 at 12:18

1 Answer

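Your pattern can match a dot in two places: the literal \. inside the repeated group, and the dot listed in the trailing character class. Because \w* may match zero characters, each repetition of (\w*\.)+ can consume a dot by itself, so the group matches consecutive dots. The trailing character class is repeated as well, so it can also match runs of dots.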
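
Both sources of the problem can be checked with a short sketch. Python is an assumption here, since the question does not name a language, and the names CLASS, question_re and plus_only_re are illustrative only. The second pattern shows that changing \w* to \w+ alone would not be enough, because the trailing class still contains a dot:

```python
import re

# Trailing character class copied from the question (note the dot at the end).
CLASS = r"[\w\-_~:/?#[\]@!$&'()*+,;=.]"

# Pattern as posted in the question.
question_re = re.compile(r"^(https?:\/\/)(www\.)?(\w*\.)+(" + CLASS + r")*$")

# Hypothetical variant with \w+ in the group; the trailing class can still
# swallow the extra dots, so the unwanted URL keeps matching.
plus_only_re = re.compile(r"^(https?:\/\/)(www\.)?(\w+\.)+(" + CLASS + r")*$")

url = "http://www..example..com"
print(bool(question_re.match(url)))   # True
print(bool(plus_only_re.match(url)))  # True
```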

You could shorten the character class: a hyphen placed first in the class does not have to be escaped, and \w already matches _.

Using the characters you already accept as valid, you could repeat a group that matches one or more allowed characters other than the dot followed by a single dot, and then end with one or more allowed characters, again without the dot:

^https?:\/\/(?:[-\w~:/?#[\]@!$&'()*+,;=]+\.)*[-\w~:/?#[\]@!$&'()*+,;=]+$

The pattern breaks down as follows:

  • ^ Start of string
  • https?:\/\/ Match http:// or https://
  • (?: Non capturing group
    • [-\w~:/?#[\]@!$&'()*+,;=]+\. Match 1+ times any of listed, then match a .
  • )* Close group and repeat 0+ times
  • [-\w~:/?#[\]@!$&'()*+,;=]+ Match any of the listed 1+ times (note that there is no .)
  • $ End of string
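
As a rough check, the pattern can be run against a few of the URLs listed in the question. This is a sketch in Python (an assumption, the question does not name a language); URL_RE, should_match and should_not_match are hypothetical names, and the lists are a subset of the question's test cases:

```python
import re

# The pattern from this answer, unchanged.
URL_RE = re.compile(
    r"^https?:\/\/(?:[-\w~:/?#[\]@!$&'()*+,;=]+\.)*[-\w~:/?#[\]@!$&'()*+,;=]+$"
)

should_match = [
    "http://www.example.com",
    "https://my.blog.example.co.uk/",
    "https://www.example.com/test.php?test=yes&testmore=yesevenmore",
    "http://255.255.255.255",
    "http://www.example.com:8008/test/?test=yes&testmore=yesevenmore&whatnumber=42#test",
]
should_not_match = [
    "www.example.com",
    "http://www..example.com",
    "http://www.blog..example..com",
    "http://www.example.com | not valid",
    "http://www.example.com|",
    "255.255.255.255",
]

for url in should_match:
    assert URL_RE.match(url), url
for url in should_not_match:
    assert not URL_RE.match(url), url
print("all sample URLs behave as expected")
```

Because the same character class is used for the host and for the rest of the URL, this pattern forbids consecutive dots anywhere in the string, not only in the host name.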

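A more specific variant restricts the host to word characters separated by single dots, and only allows the wider character set after a /, # or : that follows the host: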

^https?:\/\/\w+(?:\.\w+)*(?:[/#:][-\w~:/?#[\]@!$&'()*+,;=.]*)?$
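
A small Python sketch (Python is an assumption, and SPECIFIC_RE is a hypothetical name) shows how this variant treats the host differently from the rest of the URL. The last check illustrates a limitation rather than a goal: \w does not include the hyphen, so a host such as my-site.example.com is rejected unless the host part is widened, for example to [\w-]+.

```python
import re

# The more specific variant from this answer, unchanged.
SPECIFIC_RE = re.compile(
    r"^https?:\/\/\w+(?:\.\w+)*(?:[/#:][-\w~:/?#[\]@!$&'()*+,;=.]*)?$"
)

# Host, port, path, query and fragment are all accepted.
assert SPECIFIC_RE.match("http://www.example.com:8008/test/?test=yes#test")
assert SPECIFIC_RE.match("https://www.example.com#test")
assert SPECIFIC_RE.match("http://255.255.255.255")

# Consecutive dots in the host are rejected.
assert not SPECIFIC_RE.match("http://www..example.com")

# A "?" inside the host is rejected, unlike with the first pattern.
assert not SPECIFIC_RE.match("http://ww?w.example.com")

# Limitation: hyphenated host names do not match, because \w excludes "-".
assert not SPECIFIC_RE.match("https://my-site.example.com")
```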

5 Comments

I think it's a good solution, but there is one flaw: you don't want to use the same character set for the domain part as for the rest of the URL, otherwise garbage like http://ww?w.example.com will be matched. I tried ^https?:\/\/(\w+.)*[-\w~:/?#[\]@!$&'()*+,;=]+$ and ^https?:\/\/([a-zA-Z0-9]+.)*[-\w~:/?#[\]@!$&'()*+,;=]+$, but they still match the ? for some reason...
@SynnKo You did not escape the dot.
@SynnKo A more specific pattern could be ^https?:\/\/\w+(?:\.\w+)*(?:[/#:][-\w~:/?#[\]@!$&'()*+,;=.]*)?$ regex101.com/r/jDpE9P/1
@Thefourthbird Right, forgot the dot... wouldn't have worked anyway, some of the previously matched test cases were no longer being matched. Your solution works like a charm, though! Many thanks.
@Thefourthbird Small correction: the first * needs to be changed to a +, otherwise it will match stuff like www.
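
The correction in the last comment can be sketched as follows, assuming it refers to the more specific pattern and to the * after the (?:\.\w+) group. Python and the names TAIL, with_star and with_plus are assumptions made for illustration:

```python
import re

# Shared tail: optional port/path/query/fragment part of the specific pattern.
TAIL = r"(?:[/#:][-\w~:/?#[\]@!$&'()*+,;=.]*)?$"

# Original quantifier: the host may be a single label with no dot at all.
with_star = re.compile(r"^https?:\/\/\w+(?:\.\w+)*" + TAIL)
# Suggested correction: require at least one ".label" after the first label.
with_plus = re.compile(r"^https?:\/\/\w+(?:\.\w+)+" + TAIL)

print(bool(with_star.match("http://www")))              # True
print(bool(with_plus.match("http://www")))              # False
print(bool(with_plus.match("http://www.example.com")))  # True
```

One trade-off of the stricter form is that single-label hosts such as http://localhost:8008 no longer match either.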
