extract all URLs in a free text block using RegEx [duplicate]

Question

I'm attempting to detect all URLs listed in a free text block. I'm using .NET's Regex.Matches call with the following regex: (http|https)://[^\s "']{4,}

Now, I've put in the following text:
here is a link http://somelink.com
here is a link that I didn't space withhttp://nospacelink.com/something?something=&39358235
http://nospacelink.com/something?something=&12233454
here is a link I already handled. Here is some secret t&cs you're not allowed to know https://somethingbad.com
Just to be a little annoying I've put in a new address thingy capture type of 'http://somethinginspeechmarks.com' and what are you going to do now?
here is a link http://postTextLink.com at then some post text
Here is a link with a full stop http://alinkwithafullstoplink.com. And then some more.

and I get the following output:

http://somelink.com
http://nospacelink.com?something=&39358235
http://nospacelink.com?something=&12233454
http://alreadyhandledlink.com
https://somethingbad.com
http://somethinginspeechmarks.com
http://postTextLink.com
http://alinkwithafullstoplink.com.

Please notice the full stop on the last entry. How can I update my regex to say "If there is a full stop at the end, please ignore it?"

Also, please note that "Getting parts of a URL (Regex)" has nothing to do with my question, as that question is about how to break down a particular URL. I want to extract multiple, complete URLs. Please see my input and current output for clarification! I already have a regex that does most of what I want, but it isn't quite right. Could you please explain how my approach might be improved?

Loving that I can't just mark my own question as duplicate lol
– Immortal Blue, Commented May 16, 2014 at 13:51
Change to (http|https)://[^\s "']{4,}(?<!\.) - added (?<!\.) at the end.
– Ulugbek Umirov, Commented May 16, 2014 at 14:05
@Kilazur, I meant I could only vote it as a duplicate, as opposed to just closing it as duplicate...
– Immortal Blue, Commented May 16, 2014 at 14:11
@smerny, could you provide an example of a URL which wouldn't pass with (http|https)://[^\s "']{4,}[^\.\s"']+?
– Immortal Blue, Commented May 16, 2014 at 14:14
@ImmortalBlue, that regex isn't in the answer marked as duplicate
– Smern, Commented May 16, 2014 at 14:48
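
For anyone who wants to try the negative lookbehind from Ulugbek Umirov's comment above, here is a minimal sketch. It uses Python's re module rather than .NET's Regex.Matches (an assumption made purely for illustration; the lookbehind syntax is the same in both flavors). The names URL_PATTERN, extract_urls and sample are hypothetical, and the sample lines are adapted from the question's input.

```python
import re

# Pattern taken verbatim from the comment; (?<!\.) is a negative lookbehind
# that refuses to end the match directly after a full stop, so the engine
# backtracks until the last matched character is something else.
URL_PATTERN = re.compile(r"""(http|https)://[^\s "']{4,}(?<!\.)""")


def extract_urls(text):
    """Return every full match (group 0), not just the captured scheme."""
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


# Three lines adapted from the question's sample input.
sample = (
    "here is a link http://somelink.com\n"
    "capture type of 'http://somethinginspeechmarks.com' and what now?\n"
    "Here is a link with a full stop http://alinkwithafullstoplink.com. And then some more.\n"
)

for url in extract_urls(sample):
    print(url)
```

With this pattern the last URL should come back without its trailing full stop, while dots inside the URL are left alone.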

kulssaka · Accepted Answer · 2014-05-16 14:07:11Z

1

I would add something like [^\.] to the end of the pattern.

This says that the last character of the match can't be a full stop.

So (http|https)://[^\s "']{4,}[^\.] will try to match all addresses that don't end with a full stop.

Edit:

As pointed out in the comments, this character class should work better as the final element: [^.\s"']
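
To see the difference between the two variants in this answer, here is a small sketch that runs both against two lines from the question. It uses Python's re module instead of .NET (an assumption for illustration only), and the names FIRST_ATTEMPT, EDITED and matches are hypothetical.

```python
import re

# First suggestion: any final character except a full stop (spaces and quotes allowed).
FIRST_ATTEMPT = r"""(http|https)://[^\s "']{4,}[^\.]"""
# Edited suggestion: the final character may not be a full stop, whitespace or a quote.
EDITED = r"""(http|https)://[^\s "']{4,}[^.\s"']"""


def matches(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


full_stop = "Here is a link with a full stop http://alinkwithafullstoplink.com. And then some more."
quoted = "capture type of 'http://somethinginspeechmarks.com' and what now?"

for label, pattern in (("first", FIRST_ATTEMPT), ("edited", EDITED)):
    for text in (full_stop, quoted):
        # repr() makes a trailing space or quote in the match visible.
        print(label, [repr(m) for m in matches(pattern, text)])
```

The first variant should pick up the character after the URL (the space after the full stop, or the closing quote), which is what the comments below point out. The edited variant backtracks to the last character that is not a full stop, whitespace or quote. Note that it also raises the minimum length after "://" from four characters to five.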

edited May 16, 2014 at 14:07

answered May 16, 2014 at 13:56

kulssaka


3 Comments

Smern Over a year ago

This would actually match http://alinkwithafullstoplink.com. (with an extra space at the end) as well as http://somethinginspeechmarks.com' (the quote mark)

kulssaka Over a year ago

Exactly! Then use [^\.\s"']+

Smern Over a year ago

User wants dots, just not at the end.

gpmurthy · Answer · answered 2014-05-16 14:04, last edited by halfer 2022-04-17 09:03:26Z

-1

Updated:

Consider the following minor change to your pattern:

(http|https)://[^\s "']{4,}(?=\.)

edited Apr 17, 2022 at 9:03

halfer


answered May 16, 2014 at 14:04

gpmurthy


4 Comments

Immortal Blue Over a year ago

that stops at any ., so gives the output of http://somelink and http://nospacelink etc...

gpmurthy Over a year ago

Fixed the pattern. Try that for size.

Immortal Blue Over a year ago

That still returns the . at the end... http://linkwithafullstop.com. http://alinkwithafullstoplink.com.

gpmurthy Over a year ago

Made a minor change to the pattern... . to \.
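
A quick way to check how the lookahead version of the pattern, as it currently stands in this answer, behaves is the sketch below. It uses Python's re module rather than .NET (an assumption for illustration), and the names LOOKAHEAD and matches are hypothetical. A positive lookahead (?=\.) requires the match to be followed by a full stop, so the engine backtracks to the last dot in every URL, whether or not a full stop trails it.

```python
import re

# Pattern as given in the answer: the match must be immediately followed by ".".
LOOKAHEAD = r"""(http|https)://[^\s "']{4,}(?=\.)"""


def matches(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


plain = "here is a link http://somelink.com\n"
full_stop = "Here is a link with a full stop http://alinkwithafullstoplink.com. And then some more."

print(matches(LOOKAHEAD, plain))      # URL with no trailing full stop
print(matches(LOOKAHEAD, full_stop))  # URL followed by a full stop
```

The URL followed by a full stop should come out clean, but the plain one should be cut back to http://somelink, in line with the first comment above. A negative lookbehind such as (?<!\.) at the end of the pattern avoids that truncation.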
