2

Oddly enough, I haven't found anything that answers this question specifically; the other Stack Overflow answers I've found aren't quite right.

I have a body of text that I need to search for image URLs. Nothing complex, just things like:

http://www.google.com/logo.png

http://reddit.com/idfaiodf/test.jpg

NOT

http://reddit.com/sadfasdf/test.jpgMORECONTENTHERE

All the regexes I've tried include the "MORECONTENTHERE" in the results. It's frustrating as hell. I just want the URL, with nothing appended after it or added before it!

Also, I don't want anything that extracts image links from HTML; I'm not pulling these from HTML.

Is there a regex that does this?

EDIT:

So here is what I'm using as a source: http://pastebin.com/dE2s1nHz

It's HTML but I didn't want to mention that because I didn't want people to do

3
  • If you're not pulling these from HTML please post an example of where you are getting them from. Without that it's going to be very difficult to avoid either trapping your third example, or not trapping your first two. Commented Aug 7, 2013 at 3:14
  • Ok, adding an example now Commented Aug 7, 2013 at 3:50
  • possible duplicate of PHP: Regular Expression to get a URL from a string Commented Aug 7, 2013 at 4:15

4 Answers 4

8
https?://[^/\s]+/\S+\.(jpg|png|gif)
  1. https? is "http" or "https"
  2. :// is literal
  3. [^/\s]+ is one or more characters other than "/" or whitespace
  4. / is literal
  5. \S+ is one or more non-whitespace characters
  6. \. is "."
  7. (jpg|png|gif) is image extensions, delimited by |

Result:

[Image: screenshot of the match results in RegexBuddy]

The above is taken from RegexBuddy, running under Wine on a Mac. Its "PCRE" flavor is equivalent to PHP's preg_* functions. The expression should work in most regular expression flavors.
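For anyone who wants to try the expression in PHP, here is a minimal sketch. The sample text is the three URLs from the question; the variable names and the choice of "~" as delimiter are my own assumptions, not part of the answer.

```php
<?php
// Sample input from the question; $text and $urls are illustrative names.
$text = "http://www.google.com/logo.png\n"
      . "http://reddit.com/idfaiodf/test.jpg\n"
      . "http://reddit.com/sadfasdf/test.jpgMORECONTENTHERE\n";

// "~" is used as the delimiter so the slashes in the pattern need no escaping.
// Single quotes keep the backslashes intact for the regex engine.
$pattern = '~https?://[^/\s]+/\S+\.(jpg|png|gif)~';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches; $matches[1] holds the captured extensions.
$urls = $matches[0];
print_r($urls);
```

Note that the third URL is still matched, but only up to ".jpg": \S+ backtracks to the last dot that is followed by a known extension, so the trailing "MORECONTENTHERE" is dropped rather than the whole URL being rejected.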


2 Comments

You haven't escaped the literal slashes (/).
You do not need to escape / unless you use it as the delimiter in PHP's preg_* functions. See php.net/manual/en/regexp.reference.delimiters.php. The delimiters are not part of the expression, so they are omitted here. It is quite common to see / as a delimiter, but if the pattern itself contains /, it is usually better to pick a different delimiter than to escape every slash.
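To illustrate the delimiter point, here is a small sketch that writes the same expression twice, once with "/" as the delimiter and once with "~". The subject string and variable names are made up for the example.

```php
<?php
// Made-up subject string containing one image URL.
$subject = 'See http://www.google.com/logo.png for the logo.';

// With "/" as the delimiter, every literal slash in the pattern must be escaped,
// including the one inside the character class.
$withSlashDelimiter = '/https?:\/\/[^\/\s]+\/\S+\.(jpg|png|gif)/';

// With "~" as the delimiter, the same expression can be written unescaped.
$withTildeDelimiter = '~https?://[^/\s]+/\S+\.(jpg|png|gif)~';

preg_match($withSlashDelimiter, $subject, $a);
preg_match($withTildeDelimiter, $subject, $b);

// Both patterns find the same URL.
var_dump($a[0] === $b[0]);
```

Both forms behave the same; the second is just easier to read when the pattern is full of slashes, as URL patterns usually are.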
5

This matches a string ending with a known image extension.

<?php

    $string = "Oddly enough I haven't found anywhere that has answer this question specificly, all the other stack overflow things I've found aren't exactly right.

    I have a body text I need to search through for image urls, this doesn't mean anything complex but basically things like:

        http://www.google.com/logo.png

        http://reddit.com/idfaiodf/test.jpg

    NOT

        http://reddit.com/sadfasdf/test.jpgMORECONTENTHERE
    ";

    $pattern = '~(http.*\.)(jpe?g|png|[tg]iff?|svg)~i';

    $m = preg_match_all($pattern,$string,$matches);

    print_r($matches[0]);

?>

Output

Array
(
    [0] => http://www.google.com/logo.png
    [1] => http://reddit.com/idfaiodf/test.jpg
    [2] => http://reddit.com/sadfasdf/test.jpg
)

1 Comment

The problem with this is that it will match everything from any earlier URL on the same line up to and including the image URL. Try putting a link before an image and the match will extend to cover both.
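The behaviour described in this comment can be reproduced with a short sketch. The input line is made up, and the second pattern is only one possible fix that I am assuming here (non-whitespace characters, matched lazily), not something the answer's author proposed.

```php
<?php
// Made-up input: a non-image link followed by an image link on the same line.
$line = 'Visit http://example.com/about then http://example.com/pic.png today';

// Pattern from the answer: ".*" is greedy and runs across both URLs.
preg_match_all('~(http.*\.)(jpe?g|png|[tg]iff?|svg)~i', $line, $greedy);

// Assumed fix: allow only non-whitespace after the scheme and match lazily,
// so a match cannot span a space and stops at the first known extension.
preg_match_all('~https?://\S+?\.(?:jpe?g|png|[tg]iff?|svg)~i', $line, $lazy);

print_r($greedy[0]); // one match spanning both URLs
print_r($lazy[0]);   // only the image URL
```

The lazy version has its own trade-off: it stops at the first dot followed by a listed extension, so a path such as /a.png.bak/b.jpg would be cut short at ".png".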
3

Try the following code:

$text = <<< EOD
http://www.google.com/logo.png
http://reddit.com/sadfasdf/test.jpgMORECONTENTHERE
http://reddit.com/idfaiodf/test.jpg
EOD;

preg_match_all('/\bhttps?:\/\/\S+(?:png|jpg)\b/', $text, $matches);
var_dump($matches[0]);
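A note on how this behaves, as a sketch rather than a correction: the trailing \b is what rejects "test.jpgMORECONTENTHERE", because there is no word boundary between "g" and "M". The pattern does not require a dot before the extension, though, so a URL that merely ends in "jpg" also matches. The fourth URL below is made up to show that, and the stricter pattern is my own assumption.

```php
<?php
$text = <<<EOD
http://www.google.com/logo.png
http://reddit.com/sadfasdf/test.jpgMORECONTENTHERE
http://reddit.com/idfaiodf/test.jpg
http://example.com/notajpg
EOD;

// Pattern as posted: no "\." before the extension.
preg_match_all('/\bhttps?:\/\/\S+(?:png|jpg)\b/', $text, $original);

// Assumed variant: "~" delimiter avoids the escaped slashes, "\." requires a real extension.
preg_match_all('~\bhttps?://\S+\.(?:png|jpg)\b~', $text, $strict);

print_r($original[0]); // includes http://example.com/notajpg
print_r($strict[0]);   // only the two real image URLs
```

Unlike the other answers, both patterns skip the "MORECONTENTHERE" line entirely instead of returning a trimmed URL.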

Comments

0
https?://[a-zA-Z0-9.]+/[a-zA-Z0-9&.-]+\.(jpg|png|gif|tif|exf|svg|wfm)

I picked some arbitrary image types, and I may have missed a few special characters that are allowed in URLs. Feel free to customize it for your needs.

1 Comment

I think that will miss images that are not in the root directory, as well as domains that contain a dash.
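One way to address both points from this comment is sketched below. The character classes are my own assumption about what to allow ("-" in the host, "/" and "_" in the path), the extension list is shortened for the example, and the second URL is hypothetical.

```php
<?php
// Sample URLs: one in a subdirectory (from the question), one on a dashed domain (hypothetical).
$text = "http://reddit.com/idfaiodf/test.jpg http://my-site.example.com/logo.png";

// Assumed revision: "-" is allowed in the host, and "/" and "_" in the path.
// A hyphen placed last in a character class is taken literally.
$pattern = '~https?://[a-zA-Z0-9.-]+/[a-zA-Z0-9&./_-]+\.(?:jpg|png|gif)~';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

This is still a whitelist, so URLs with other legal characters (query strings, percent-encoding, and so on) will be missed unless the path class is extended.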
