5

I have basic URL validation in my application. Right now I'm using the following code.

//validates whether the given value is 
//a valid URL
function validateUrl(value)
{
    var regexp = /(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/;
    return regexp.test(value);
}

However, it does not accept URLs without a protocol. For example, if I provide www.google.com, it is rejected. How can I modify the regex to make it accept URLs without a protocol?

3
  • Thank you for all your replies. Worked great. Commented Aug 3, 2010 at 12:24
  • All of your regexes accept @@##$$ as a valid URL. Any ideas? Commented Aug 4, 2010 at 6:49
  • NLV, you didn't specify you wanted us to correct your regex, you just asked how to change it to accept any protocol. Anyhow, see my new answer below which gives a complete (and complex) URL validation regex. Commented Aug 6, 2010 at 20:00

5 Answers

5

Here's a big long regex for matching a URL:

(?i)\b((?:(?:[a-z][\w-]+:)?(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

The expanded version of that (to help make it understandable):

(?xi)
\b
(                           # Capture 1: entire matched URL
  (?:
    (?:[a-z][\w-]+:)?                # URL protocol and colon
    (?:
      /{1,3}                        # 1-3 slashes
      |                             #   or
      [a-z0-9%]                     # Single letter or digit or '%'
                                    # (Trying not to match e.g. "URI::Escape")
    )
    |                           #   or
    www\d{0,3}[.]               # "www.", "www1.", "www2." … "www999."
    |                           #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
  )
  (?:                           # One or more:
    [^\s()<>]+                      # Run of non-space, non-()<>
    |                               #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                           # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                                   #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]        # not a space or one of these punct chars
  )
)

Both come from this page, modified slightly to make the protocol properly optional. You should read that page to help understand what the regex is doing; it also has a variant that matches only web-based URLs, which you may want to look at too.
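Since the question is about JavaScript, here is a minimal sketch of how the one-line pattern might be ported. It assumes that the inline `(?i)` flag is replaced by the `/i` flag on the literal and that each `/` is escaped; the function name `looksLikeUrl` is hypothetical and not part of the original answer.

```javascript
// Sketch of a JavaScript port of the one-line pattern above.
// Assumptions: "(?i)" becomes the /i flag, and "/" is escaped as "\/".
// The name looksLikeUrl is hypothetical.
var urlFinder = /\b((?:(?:[a-z][\w-]+:)?(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i;

function looksLikeUrl(value) {
    // test() succeeds if the pattern matches anywhere in the string
    return urlFinder.test(value);
}

console.log(looksLikeUrl("www.google.com"));        // true
console.log(looksLikeUrl("http://www.google.com")); // true
console.log(looksLikeUrl("@@##$$"));                // false: no word boundary for \b to match
```

The pattern is not anchored and is built to find URLs inside running text, so with the protocol optional it also matches ordinary words such as "abc". For strict validation of a whole string it would likely need anchors or an additional check.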


1 Comment

Thanks for your effort. Let me do a check on it.
1

Change the regex to:

/((ftp|http|https):\/\/)?(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/

1 Comment

As with hsz's answer, this moves the ftp/http/https to group 2, and doesn't accept //server URLs.
1

I am not a regex expert, but wrapping the protocol in another group and adding a question mark after it should make it optional:

function validateUrl(value)
{
    var regexp = /((ftp|http|https):\/\/)?(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/
    return regexp.test(value);
} 
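As a quick check, the function above can be exercised like this. The sample inputs are my own, chosen for illustration; the pattern is copied unchanged.

```javascript
// Same pattern as above: the protocol and "://" sit together in one optional group.
var regexp = /((ftp|http|https):\/\/)?(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/;

function validateUrl(value) {
    return regexp.test(value);
}

console.log(validateUrl("www.google.com"));        // true
console.log(validateUrl("http://www.google.com")); // true
console.log(validateUrl("@@##$$"));                // true: (\S+) accepts any run of non-whitespace
console.log(validateUrl(""));                      // false: (\S+) needs at least one character

// The extra wrapping group shifts the capture indices.
var parts = regexp.exec("http://www.google.com");
console.log(parts[1]); // "http://"
console.log(parts[2]); // "http"
```

Because the pattern is unanchored and `(\S+)` matches any non-whitespace run, it accepts almost any non-empty input. The wrapping group also moves the protocol name from group 1 to group 2, which matters only if the groups are used to extract URL parts.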

3 Comments

Again, if this regex was used to capture URL parts, it's creating unnecessary groups, and it's incorrectly combining the // with the protocol, which excludes valid URLs.
Although //google.com works, it is not a valid URL, and I don't think most people know that it works, so it can be very useful to exclude such URLs from validation. Just because something is possible does not mean every form has to be treated as valid. The double slashes are only a separator, just as the dots separate the subdomain, domain, and TLD.
The double slashes are the prefix to the path, whilst the colon is the separator with the protocol - they are two distinct parts that just happen to occur together. (This is detailed in "3. URI Syntactic Components" of RFC 2396.) Using //google.com is a valid relative URL (again, see appendix "C.1 Normal Examples" of RFC 2396), and it does occur "in the wild".
1

Make the protocol optional with (...)?

/(((ftp|http|https):\/\/)|(\/\/))?(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/

4 Comments

This moves the ftp/http/https to group 2, and doesn't accept //server URLs.
Look at my edit - now it accepts protocol:// or // or none of them.
Also, you can use (?:...) to exclude a group from the results.
That's over-complicating things still, and doesn't work with http:google.com either (hence why in my answer I simply used two optional groups). Also the parens wrapping the two sides of the alternation are redundant and just make things messier.
0

Change the first part to:

(?:(ftp|http|https):)?(?:\/\/)?

The (?:...) will group content without using capturing groups (so the actual protocol remains in first group).

Note how the protocol: and // parts are individually optional - since //www.google.com is a valid (relative) URL.
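To show the effect, here is a minimal sketch that splices this prefix onto the rest of the pattern from the question. The name `validateUrlOptionalParts` is hypothetical, and the sample inputs are only illustrations.

```javascript
// Prefix from this answer, followed by the unchanged remainder of the question's pattern.
var urlPattern = /(?:(ftp|http|https):)?(?:\/\/)?(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/;

// Hypothetical helper name, used only for this example.
function validateUrlOptionalParts(value) {
    return urlPattern.test(value);
}

console.log(validateUrlOptionalParts("www.google.com"));   // true
console.log(validateUrlOptionalParts("//www.google.com")); // true

// The protocol name stays in capture group 1 because the wrappers are non-capturing.
console.log(urlPattern.exec("http://www.google.com")[1]); // "http"
console.log(urlPattern.exec("http:google.com")[1]);       // "http"
console.log(urlPattern.exec("www.google.com")[1]);        // undefined
```

Keeping the wrappers non-capturing means any code that reads group 1 for the protocol keeps working after the change.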

3 Comments

It's not clear what you're saying there, and that's a long document - can you point to the specific section you mean? I tried (for example) ://google.com in Chrome and IE and it doesn't work, although it looks like Firefox accepts it.
The scheme section includes only the name of the protocol (like 'http', 'ftp') but not the colon. So even your regex doesn't split up all groups correctly. But as NLV only wanted a validation regex for valid and common (and not merely working) URLs, there is no need to use a group around the slashes.
The inner group captures the value of http or ftp or whatever, the outer group (where the colon is) is non-capturing, and is necessary to make the whole thing optional. Similarly, the non-capturing group around the slashes is required to make the whole thing optional (it could use \/{0,2} but that would allow /google.com which might not be desired).
