Regular expression for detecting hyperlinks

Question

I got this regex pattern from the WMD showdown.js file:

/<((https?|ftp|dict):[^'">\s]+)>/gi

and the code is:

text = text.replace(/<((https?|ftp|dict):[^'">\s]+)>/gi,"<a href=\"$1\">$1</a>");

But when I set text to http://www.google.com, it does not turn it into a link; it returns the original text value unchanged (http://www.google.com).

P.S: I've tested it with RegexPal and it does not match.

Take the <> out and it should work. This one looks to be the best: (http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])? From regexlib.com/…
– Rob, Commented Aug 22, 2011 at 21:06
The last time someone answered a question about regex and HTML it drove them mad. stackoverflow.com/questions/1732348/…
– Callie J, Commented Aug 22, 2011 at 21:08
So you just want to take the whole URL and put it in an anchor tag? In your example it should return <a href="http://www.google.com">http://www.google.com</a>?
– Ali, Commented Aug 22, 2011 at 21:12
There are many more protocols than the ones listed; are those the only ones you want? And you are creating links, not anchors.
– RobG, Commented Aug 22, 2011 at 23:48

Paul · Accepted Answer · 2011-08-22 21:09:07Z

2

Your code is searching for a URL wrapped in <>, like <http://www.google.com> (RegexPal).

Just change it to /((https?|ftp|dict):[^'">\s]+)/gi if you don't want it to search for the <> (RegexPal).
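
To make the difference concrete, here is a minimal sketch comparing the two patterns on the asker's input. The `linkify` helper name is only an illustration and is not part of showdown.js.

```javascript
// Original showdown.js pattern: only matches URLs wrapped in <...>.
const bracketed = /<((https?|ftp|dict):[^'">\s]+)>/gi;
// Same pattern without the angle brackets: matches bare URLs too.
const bare = /((https?|ftp|dict):[^'">\s]+)/gi;

// Hypothetical helper: wraps every match of `pattern` in a link.
function linkify(text, pattern) {
  return text.replace(pattern, '<a href="$1">$1</a>');
}

console.log(linkify("http://www.google.com", bracketed));
// http://www.google.com  (no match, text returned unchanged)
console.log(linkify("<http://www.google.com>", bracketed));
// <a href="http://www.google.com">http://www.google.com</a>
console.log(linkify("http://www.google.com", bare));
// <a href="http://www.google.com">http://www.google.com</a>
```

Note that the bare pattern will also match URLs that already sit inside an href attribute, so it is only safe on text that does not contain links yet.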


Ali · Accepted Answer · 2011-08-22 21:34:49Z

0

As long as you know your URLs start with http://, https://, or another known scheme, you can use:

/((https?|s?ftp|dict|www)(:\/\/)?[A-Za-z0-9.\-]+)/gi

The expression matches until it encounters a character that is not allowed in a domain name, i.e. anything outside A-Za-z0-9.\-. It will not, however, detect anything of the form google.com, or anything that comes after the domain name, such as parameters or subdirectory paths. If that is your requirement, you can simply change the terminating condition to the one you have above in your regex.

I know it seems pointless, but it may be useful if you want the display name to be something abbreviated rather than the whole URL, in the case of complex URLs.
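
As a rough sketch of that abbreviated-display idea: the full URL can be found with the question's pattern (without the <>), and the domain-only pattern can supply the link text. The `linkifyShort` name and the sample sentence are made up for illustration, and the domain pattern is written here with balanced parentheses and escaped slashes so that it parses as a regex literal.

```javascript
// Full-URL pattern from the question, without the angle brackets.
const urlPattern = /((https?|ftp|dict):[^'">\s]+)/gi;
// Domain-only pattern; no g flag because only the first match is needed.
const domainPattern = /((https?|s?ftp|dict|www)(:\/\/)?[A-Za-z0-9.\-]+)/i;

// Hypothetical helper: links the whole URL but displays only the
// scheme-and-domain part as the link text.
function linkifyShort(text) {
  return text.replace(urlPattern, (url) => {
    const m = url.match(domainPattern);
    const label = m ? m[1] : url; // fall back to the full URL
    return `<a href="${url}">${label}</a>`;
  });
}

console.log(linkifyShort("See http://www.google.com/search?q=regex for details"));
// See <a href="http://www.google.com/search?q=regex">http://www.google.com</a> for details
```

This sketch does no HTML escaping, so it assumes the input is plain text you trust.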


2 Comments

RobG Over a year ago

There are lots of other characters that are valid in a URL; pretty much anything other than a space is allowed.

Ali Over a year ago

Ignoring internationalized domain names... no, basically only A-Za-z0-9\- are allowed in domain names, and the - cannot be the leading or the last character. LordCover (asker) is from Syria, so I guess it's really up to him to decide what works. Either way, this regex is only useful for extracting the domain name, which wasn't the requirement to start with. (Look at "Valid characters" at en.wikipedia.org/wiki/Domain_name)

RobG · Accepted Answer · 2011-08-23 00:34:10Z

0

You could use:

var re = /(http|https|ftp|dict)(:\/\/\S+?)(\.?\s|\.?$)/gi;

with:

el.innerHTML = el.innerHTML.replace(re, '<a href=\'$1$2\'>$1$2<\/a>$3');

to also match URLs at the end of sentences.

But you need to be very careful with this technique: make sure the content of the element is more or less plain text and not complex markup. Regular expressions are not meant for, nor are they good at, processing or parsing HTML.
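
Here is a small sketch of how that pattern treats a trailing period, run on a plain string instead of `el.innerHTML` so it works outside a browser. The `linkifySentence` name and the sample text are only illustrative.

```javascript
// Pattern from the answer: the lazy \S+? stops before an optional
// trailing period that is followed by whitespace or the end of input.
const re = /(http|https|ftp|dict)(:\/\/\S+?)(\.?\s|\.?$)/gi;

// Hypothetical helper: $1$2 is the URL, $3 is the trailing
// period and/or whitespace, which is put back after the link.
function linkifySentence(text) {
  return text.replace(re, "<a href='$1$2'>$1$2</a>$3");
}

console.log(linkifySentence("Visit http://www.google.com. Or try ftp://example.org"));
// Visit <a href='http://www.google.com'>http://www.google.com</a>. Or try <a href='ftp://example.org'>ftp://example.org</a>
```

The sentence-ending period stays outside the first link, while the dots inside the host names are kept because they are not followed by whitespace or the end of the string.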

