Regex that only captures http/https in plain text

Question

I currently have str.match(/(http[^\s]+)/i), which captures not only links in the content but also the URLs in img tags (src="http...") and anchor tags (href="http...").

How do I modify my regex so that it matches only http/https URLs that are not preceded by "src=" or "href="?

It may be easiest to just get all text nodes first and search only those, but it depends on what you're doing.
– Explosion Pills, Commented Apr 22, 2015 at 20:15
Maybe parsing HTML with regular expressions isn't a really good idea, and you should get the proper elements, then the text from those elements, before you use a regex?
– adeneo, Commented Apr 22, 2015 at 20:18
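
The two comments above suggest searching text nodes rather than raw markup. A minimal sketch of that idea follows, assuming you already have a parsed DOM node to start from (in a browser, for example, the body of a document produced by DOMParser). The helper names collectText and findUrlsInTextNodes are hypothetical and not from the original thread.

```javascript
// DOM text nodes report nodeType 3 (Node.TEXT_NODE).
const TEXT_NODE = 3;

// Recursively gather the string content of every text node under `node`.
// Attribute values (href, src) are never visited, because attributes
// are not child nodes.
function collectText(node, out = []) {
  if (node.nodeType === TEXT_NODE) {
    out.push(node.nodeValue);
  } else {
    for (const child of Array.from(node.childNodes || [])) {
      collectText(child, out);
    }
  }
  return out;
}

// Run a simple URL regex over each text node separately.
function findUrlsInTextNodes(root) {
  return collectText(root).flatMap(
    (text) => text.match(/https?:\/\/\S+/gi) || []
  );
}

// Browser usage (assumption: DOMParser is available):
//   const doc = new DOMParser().parseFromString(htmlString, 'text/html');
//   const urls = findUrlsInTextNodes(doc.body);
```

Because the regex never sees tag or attribute text, it can stay simple; the trade-off is that the HTML has to be parsed first.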

MaxZoom · Accepted Answer · 2015-04-23 01:35:31Z

3

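You can require an additional \s before the URL. Inside an href or src attribute, the URL is not preceded by a whitespace character, whereas in normal text it usually is. The full match then includes the leading whitespace; the URL itself is in capture group 1.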

str.match(/\s(http[^\s]+)/i)

Also see DEMO
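
One limitation of the \s version is that it misses a URL at the very start of the string, since nothing precedes it. A small sketch of a variant that also accepts the start of input is shown below; findFirstPlainUrl is a hypothetical helper name, and the sample markup is made up for illustration.

```javascript
// Sketch: require whitespace OR start of input before the URL.
function findFirstPlainUrl(str) {
  // (?:^|\s) is a non-capturing group, so group 1 is still the URL
  // without any leading whitespace.
  const match = str.match(/(?:^|\s)(http[^\s]+)/i);
  return match ? match[1] : null;
}

const sampleWithAnchor =
  '<a href="http://a.example/x">link</a> see http://b.example/y for details';
console.log(findFirstPlainUrl(sampleWithAnchor));
```

This still assumes attribute values are written directly after the equals sign and quote, with no whitespace in between.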

edited Apr 23, 2015 at 1:35

MaxZoom

answered Apr 22, 2015 at 20:15

ByteHamster

Dmitry Sadakov · Accepted Answer · 2015-04-22 20:18:57Z

1

You can match links whose http/s is not immediately preceded by an = or a double quote:

str.match(/[^=\"](http[^\s]+)/i)
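
The character class above consumes one character before the URL, so it cannot match at the start of the string, and it does not cover single-quoted attributes. As a hedged alternative, assuming a JavaScript engine with ES2018 lookbehind support (which did not exist when this was answered), a negative lookbehind checks the preceding character without consuming it. The function name findUrlsNotInAttributes is hypothetical.

```javascript
// Sketch: match http/https runs NOT preceded by =, " or '.
// (?<!...) is a negative lookbehind; it asserts on the previous
// character without making it part of the match.
function findUrlsNotInAttributes(str) {
  return str.match(/(?<![="'])http\S+/gi) || [];
}

const sampleWithAttributes =
  '<img src="http://a.example/i.png"> visit http://b.example and ' +
  "<a href='http://c.example'>c</a>";
console.log(findUrlsNotInAttributes(sampleWithAttributes));
```

With the g flag, match returns every hit rather than only the first, and no capture group is needed to strip a leading character.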

answered Apr 22, 2015 at 20:18

Dmitry Sadakov

Wiktor Stribiżew · Accepted Answer · 2015-04-22 20:54:12Z

0

A simple http[^\s]+ (equivalent to http\S+) can overmatch.

I'd suggest using a regex that matches text outside of tags, and whitelisting the tags in which the text is allowed to appear. Here is the regex:

/(?![^<]*>|[^<>]*<\/(?!p\b|td|pre))https?:\/\/[a-z0-9&#=.\/\-?_]+/gi

The (?!p\b|td|pre) part is where we add whitelisted tags. Because the character class does not include a comma, the regex won't capture the trailing comma in http://example.com,.

See demo
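
To see how the pattern behaves, here is a small sketch that applies it to a made-up HTML fragment. The regex is the one from this answer; the function name findWhitelistedUrls and the sample markup are assumptions for illustration.

```javascript
// Regex from the answer: skip URLs inside a tag (first lookahead branch)
// and URLs whose next closing tag is not p, td or pre (second branch).
const whitelistedUrlRegex =
  /(?![^<]*>|[^<>]*<\/(?!p\b|td|pre))https?:\/\/[a-z0-9&#=.\/\-?_]+/gi;

function findWhitelistedUrls(html) {
  return html.match(whitelistedUrlRegex) || [];
}

const sampleFragment =
  '<p>Visit http://example.com, now</p>' +
  '<a href="http://attr.example">http://inside-a.example</a>';

// Only the URL inside <p> is kept, and its trailing comma is dropped.
console.log(findWhitelistedUrls(sampleFragment)); // logs [ 'http://example.com' ]
```

The URL in the href is rejected because a > follows before any <, and the URL inside the anchor text is rejected because the next closing tag is </a>, which is not whitelisted.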

answered Apr 22, 2015 at 20:54

Wiktor Stribiżew
