I have a string containing an HTML document, and I want to extract all URLs from it. I tried this:

preg_match_all('/(http:\/\/){1}.{1,}\..{1,}/', $html_document /* a valid document, containing a lot of links*/, $matches);
print_r($matches);

But instead of an array containing all the links, I get fragments of HTML code.
What's wrong with my code?

Comments:
  • {1,} means "one or more", with no upper bound. If your text has two or more URLs, you are allowing a match of ALL the text between them. Even a later . or / will do it: in "foo http://example.com/ this is some filler text with a . and /" the match also captures "this is some filler text". Commented Aug 12, 2014 at 16:45
  • See "What is the best regular expression to check if a string is a valid URL?" Commented Aug 12, 2014 at 16:45
  • Possible duplicate of "Extract URLs from text in PHP". Commented Aug 12, 2014 at 16:46
  • Do you want to validate the URLs or just extract them? Commented Aug 12, 2014 at 16:47
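
To illustrate the first comment, here is a minimal sketch that runs the pattern from the question against a short one-line sample. The sample string is made up for illustration and is not from the original question. Because `.{1,}` is greedy and `.` matches anything except a newline, the single match runs from the first `http://` to the end of the line.

```php
<?php
// Hypothetical sample input, not from the original question.
$html = '<a href="http://example.com/">one</a> some text. <a href="http://example.org/">two</a>';

// The pattern from the question: both `.{1,}` parts are greedy, so the match
// starts at the first "http://" and extends as far as the line allows.
preg_match_all('/(http:\/\/){1}.{1,}\..{1,}/', $html, $matches);

// $matches[0] holds the full matches; here there is only one, and it
// swallows the markup and the second URL as well.
print_r($matches[0]);
```

Both URLs end up inside one oversized match, which is why the result looks like chunks of HTML rather than a list of links.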

1 Answer

If you want to extract the URLs rather than validate them, try the regex below:

\bhttps?:\/\/[^\s]*

Here is an online demo.

Sample code:

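// Match http:// or https:// followed by any run of non-whitespace characters.
$re = "/\\bhttps?:\\/\\/[^\\s]*/im";
$str = "http://www.regex101.com https://www.stackoverflow.com";

preg_match_all($re, $str, $matches);
print_r($matches[0]); // $matches[0] holds the full matches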
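
The sample above uses whitespace-separated URLs. In an HTML document, URLs usually sit inside quoted attributes, so a "non-whitespace" class would also pick up the closing quote and any markup that follows. The sketch below is one possible adjustment, not part of the original answer: it additionally stops at quotes and angle brackets. The sample markup is hypothetical.

```php
<?php
// Hypothetical sample document, not from the original question.
$html = '<p><a href="http://example.com/a?x=1">A</a> and <a href=\'https://example.org/b\'>B</a></p>';

// Stop at whitespace, quotes, and angle brackets so attribute delimiters
// and tag boundaries are not swallowed into the match.
$re = '~\bhttps?://[^\s"\'<>]+~i';

preg_match_all($re, $html, $matches);

// $matches[0] now contains one entry per URL.
print_r($matches[0]);
```

Using `~` as the delimiter avoids having to escape the slashes in `://`. This is still extraction, not validation: it assumes every URL starts with an explicit http or https scheme and will miss relative links.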