Getting URLs from a string

Question

I have a string containing an HTML document and I want to extract all URLs from it. I tried this:

preg_match_all('/(http:\/\/){1}.{1,}\..{1,}/', $html_document /* a valid document, containing a lot of links*/, $matches);
print_r($matches);

But instead of an array containing all the links, I get chunks of HTML code.
What's wrong with my code?

{1,} means "one or more", and .{1,} is greedy: it matches as many characters as it can. If your text has two or more URLs, a single match can swallow ALL the text between them. Even one URL followed by a later dot and slash will do it: in "foo http://example.com/ this is some filler text with a . and /", the match also captures "this is some filler text".
– Marc B, Commented Aug 12, 2014 at 16:45
See "What is the best regular expression to check if a string is a valid URL?"
– Braj, Commented Aug 12, 2014 at 16:45
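
To make the greedy behaviour described in Marc B's comment concrete, here is a minimal sketch that runs the question's pattern against a small, made-up HTML fragment (the $html value and the $greedy variable are assumptions for illustration, not taken from the question). It should show one oversized match rather than two separate URLs:

```php
<?php
// Hypothetical input: two links separated by filler text, all on one line.
$html = '<a href="http://example.com/">one</a> filler text. <a href="http://example.org/">two</a>';

// The pattern from the question. ".{1,}" is greedy and matches any character
// except a newline, so it runs on to the last dot it can find on the line.
preg_match_all('/(http:\/\/){1}.{1,}\..{1,}/', $html, $greedy);

// $greedy[0] holds the full matches. Here there is only one, starting at the
// first "http://" and continuing through the second link to the end of the string.
print_r($greedy[0]);
```

Because the match never stops at the end of the first URL, the markup between the links ends up inside the result, which is the "parts of HTML code" the question describes.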

Braj · Accepted Answer · 2014-08-12 17:11:58Z

If you want to extract URLs rather than validate them, try the regex below:

\bhttps?:\/\/[^\s]*

Sample code:

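// Match http:// or https:// followed by any run of non-whitespace characters.
$re = '/\bhttps?:\/\/[^\s]*/i';
$str = "http://www.regex101.com https://www.stackoverflow.com";

preg_match_all($re, $str, $matches);
print_r($matches[0]); // the full matches, one per URL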
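
Since the question is about an HTML document, where URLs usually sit inside quoted attributes, a pattern that stops only at whitespace would also pick up the closing quote and any markup that follows. The following is a rough sketch of one possible adjustment, not part of the original answer: the character class is narrowed to exclude quotes and angle brackets as well. The $html fragment and the $urls variable are hypothetical.

```php
<?php
// Hypothetical HTML fragment used only for illustration.
$html = '<a href="http://example.com/a?b=1">one</a> and <img src=\'https://example.org/x.png\'>';

// Stop at whitespace, quotes, and angle brackets so that attribute
// delimiters and tag boundaries are not included in the match.
$re = '/\bhttps?:\/\/[^\s"\'<>]+/i';

preg_match_all($re, $html, $matches);

// Drop duplicates and reindex the list of matched URLs.
$urls = array_values(array_unique($matches[0]));
print_r($urls);
```

This is still extraction rather than validation: it will accept anything that looks like a URL, and it may include trailing punctuation when a URL appears in running text instead of an attribute.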
