I found a regular expression that is supposed to capture URLs, but it doesn't capture some of them.

$("#links").change(function() {

    //var matches = new array();
    var linksStr = $("#links").val();
    var pattern = new RegExp("^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$","g");
    var matches = linksStr.match(pattern);

    for(var i = 0; i < matches.length; i++) {
      alert(matches[i]);
    }

})

It doesn't capture this URL (I need it to):

http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar

But it captures this:

http://www.wupload.com

  • it does capture that one :P alert("http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar".match(/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/g)) Commented Aug 8, 2011 at 16:45
  • Now firebug says 'Regular expression too complex' :( Commented Aug 8, 2011 at 17:28
  • that's odd... idk. i'm using Chrome and it worked in Chrome's js console. Commented Aug 8, 2011 at 17:41
  • @Joseph try it with multiple urls? Commented Aug 8, 2011 at 17:56

3 Answers

Several things:

  1. The main reason it didn't work is that when passing a string to RegExp(), you need to escape the backslashes, because the string literal consumes one level of escaping before the regex is compiled. So this:

    "^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$"
    

    Should be:

    "^(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
    


  2. Next, you said that Firebug reported "Regular expression too complex". This suggests that linksStr is several lines of URL candidates.
    Therefore, you also need to pass the m flag to RegExp().

  3. The existing regex is blocking legitimate values, e.g. "HTTP://STACKOVERFLOW.COM". So, also use the i flag with RegExp().

  4. Whitespace always creeps in, especially in multiline values. Use a leading \s* and $.trim() to deal with it.

  5. Relative links, e.g. /file/63075291/LlMlTL355-EN6-SU8S.rar, are not allowed?
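
To see item 1 in isolation, here is a minimal sketch of what the string literal does before RegExp() ever sees the pattern. The variable names are made up for illustration.

```javascript
// In a JavaScript string literal, "\d" is not a recognised escape, so the
// backslash is dropped and RegExp() receives a plain "d".
var unescaped = new RegExp("^[\da-z]+$");    // compiles as /^[da-z]+$/
var escaped   = new RegExp("^[\\da-z]+$");   // compiles as /^[\da-z]+$/
var literal   = /^[\da-z]+$/;                // regex literal: no doubling needed

console.log(unescaped.source);            // ^[da-z]+$
console.log(escaped.source);              // ^[\da-z]+$
console.log(unescaped.test("63075291"));  // false: digits are not in [da-z]
console.log(escaped.test("63075291"));    // true
console.log(literal.test("63075291"));    // true
```

A regex literal, as in the third variable, sidesteps the problem whenever the pattern does not have to be built from a string.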

Putting it all together (except for item 5), it becomes:

var linksStr    = "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar  \n"
                + "  http://XXXupload.co.uk/fun.exe \n "
                + " WWW.Yupload.mil ";
var pattern     = new RegExp (
                    "^\\s*(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
                    , "img"
                );

var matches     = linksStr.match(pattern);
for (var J = 0, L = matches.length;  J < L;  J++) {
    console.log ( $.trim (matches[J]) );
}

Which yields:

http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar
http://XXXupload.co.uk/fun.exe
WWW.Yupload.mil
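
One thing to watch in the loop above: match() returns null rather than an empty array when nothing matches, so matches.length throws on input that contains no URLs. Below is a minimal sketch of a guarded version. The function name extractUrls is hypothetical. It also uses a regex literal with a single * on the path group instead of the nested ([...]*)*; that simplification is mine, not part of the original pattern, made on the assumption that the nested quantifier contributes to the "too complex" report. It accepts the same strings.

```javascript
// Hypothetical helper: returns every line of `text` that looks like a URL,
// trimmed, or an empty array when nothing matches.
function extractUrls(text) {
    // Same pattern as above, written as a literal so no backslash doubling
    // is needed. Flags: g = all matches, i = ignore case, m = per-line ^ and $.
    var pattern = /^\s*(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)\/?$/gim;
    var matches = text.match(pattern) || [];   // match() gives null on no match
    return matches.map(function (m) {
        return m.trim();
    });
}

var sample = "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar  \n"
           + "  http://XXXupload.co.uk/fun.exe \n "
           + " WWW.Yupload.mil ";

console.log(extractUrls(sample));           // the three URLs, trimmed
console.log(extractUrls("no links here"));  // empty array
```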

1 Comment

Thanks, I've used some of uber leet Jeff Atwood's regex - \bhttp://[^\s]+ for now

Why not just do: URLS = str.match(/https?:[^\s]+/ig);

1 Comment

If the URL is in a user-supplied text-box and is followed by a comma or full stop (as users often do!), those punctuation marks will be treated as part of the URL
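
A minimal sketch of one way to handle that: match greedily, then strip trailing punctuation from each hit. The helper name findUrls and the choice of which characters count as trailing punctuation are assumptions for illustration; a URL that genuinely ends in one of them would be cut short.

```javascript
// Hypothetical helper: finds http/https URLs in free text and removes
// sentence punctuation that was swept up at the end of each match.
function findUrls(text) {
    var hits = text.match(/https?:[^\s]+/ig) || [];   // null when nothing matches
    return hits.map(function (url) {
        return url.replace(/[.,;:!?]+$/, "");         // assumed punctuation set
    });
}

// Logs the two URLs without the trailing comma and full stop.
console.log(findUrls("See http://example.com/a.html, then https://example.org."));
```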

(https?\:\/\/)([a-z\/\.0-9A-Z_\-\%\&\=]*)

This will locate http:// and https:// URLs in text.
