I would like a regular expression that finds the following image references in a given HTML string:

  • Those captured in: src=""
  • Those captured in: src=''
  • Those captured in: background=""
  • Those captured in: background=''
  • Those captured in: url("")
  • Those captured in: url('')
  • Those captured in: url()

So far I have come up with:

preg_match_all("/src=((\"|'|)?(.*\.(png|gif|jpg))(\"|'|))/Ui", $strHTML, $arrMatches);

preg_match_all("/background=((\"|'|)?(.*\.(png|gif|jpg))(\"|'|))/Ui", $strHTML, $arrMatches);

preg_match_all("/url\((\"|'|)?((.*\.(png|gif|jpg))(\"|'|))\)/Ui", $strHTML, $arrMatches);

But these are incomplete in that the results don't include the prefix (src/background/url). Security-wise, I also think they can be improved further, to prevent somebody from entering src="http://somesite.com/someurl.exe?ext=jpg"
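
On the extension concern, one option is to check the extension of the URL's path component rather than of the whole string. The following is only a minimal sketch: hasImageExtension is a hypothetical helper name (not from the original post), and an extension check says nothing about what the server actually returns.

```php
<?php
// Hypothetical helper: returns true only when the *path* component of $url
// ends in one of the allowed image extensions.
function hasImageExtension(string $url, array $allowed = ['png', 'gif', 'jpg', 'jpeg']): bool
{
    // parse_url() separates the path from the query string, so "?ext=jpg"
    // is never mistaken for a file extension.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}

var_dump(hasImageExtension('http://somesite.com/someurl.exe?ext=jpg')); // bool(false)
var_dump(hasImageExtension('/relative/path/img.jpg'));                  // bool(true)
```

Such a check could be applied to each URL the regular expressions return, instead of trying to encode the rule in the pattern itself.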

Any help in the right direction is appreciated.

edit:

I think I got it, although the code can surely be improved, and possibly even combined and/or optimized :)

/* match CSS url() links */

preg_match_all("/(url\((\"|'|)(.*\.(png|gif|jpg|jpeg))(\"|'|)\))/Ui", $strHTML, $arrMatches);

Array
(
    [0] => Array
        (
            [0] => url('test1.gif')
            [1] => url(test2.gif)
            [2] => url("test3.gif")
        )

    [1] => Array
        (
            [0] => url('test1.gif')
            [1] => url(test2.gif)
            [2] => url("test3.gif")
        )

    [2] => Array
        (
            [0] => '
            [1] => 
            [2] => "
        )

    [3] => Array
        (
            [0] => test1.gif
            [1] => test2.gif
            [2] => test3.gif
        )

    [4] => Array
        (
            [0] => gif
            [1] => gif
            [2] => gif
        )

    [5] => Array
        (
            [0] => '
            [1] => 
            [2] => "
        )

)

/* match img links (double-quoted, single-quoted or unquoted) */
preg_match_all("/(src=(\"|'|)(.*\.(png|gif|jpg|jpeg))(\"|'|))/Ui", $strHTML, $arrMatches);

/* match background links (double-quoted, single-quoted or unquoted) */
preg_match_all("/(background=(\"|'|)(.*\.(png|gif|jpg|jpeg))(\"|'|))/Ui", $strHTML, $arrMatches);
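
The three patterns could possibly be combined into one. The sketch below is an assumption on my part rather than tested production code: the group names prefix, quote and url are made up for illustration, and the URL is assumed to contain no quotes, whitespace, ")" or ">".

```php
<?php
$strHTML = '<img src="a.png"><td background=\'b.gif\'><div style="background: url(c.jpg)">';

// prefix: src=, background= or url(
// quote:  an optional opening quote, which must be repeated after the URL (\k<quote>)
// url:    a run of non-delimiter characters ending in an image extension
$pattern = '~(?<prefix>src=|background=|url\()(?<quote>["\']?)'
         . '(?<url>[^"\')\s>]+\.(?:png|gif|jpe?g))\k<quote>~i';

preg_match_all($pattern, $strHTML, $arrMatches);

print_r($arrMatches[0]);     // full matches: src="a.png", background='b.gif', url(c.jpg
print_r($arrMatches['url']); // a.png, b.gif, c.jpg
```

Note that the full match for url(...) stops before the closing parenthesis, since only the opening "url(" is part of the prefix.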
  • Can you clarify by posting your expected output? I'm not sure what you mean by "they don't include the prefix". Also, it's difficult to give advice on security with no context for how the code is used. But I can say that you should not rely on a regular expression to prevent malicious code from being injected into your application. Commented Feb 11, 2012 at 9:09
  • possible duplicate of Grabbing the href attribute of an A element Commented Feb 11, 2012 at 9:22
  • possible duplicate of Parse Inline CSS Values with Regex Commented Feb 11, 2012 at 9:24
  • The reason I am asking is that I would like to find all images in the HTML using the above tags and replace them with "cid:". I have that part covered already. I would like to replace src="/relative/path/img.jpg" but NOT src="http://somesite/relative/path/img.jpg" and all its variants. Therefore, in the expected result I would prefer an array that contains not only the URL (e.g. /relative/path/img.jpg) but also src="/relative/path/img.jpg" Commented Feb 11, 2012 at 9:50
  • That way I can replace the latter while leaving the HTTP URL alone. Since both share '/relative/path/img.jpg', replacing the bare URL would leave me with src="http://somesite/cid:...", which obviously won't work. Commented Feb 11, 2012 at 9:52

1 Answer

If you're sure about those prefixes (src, background and url)...

$arr = array(
    'url("http://somesite.com/someurl.exe?src=jpg")',
    'url(http://somesite.com/someurl.exe?src=jpg)',
    'src="http://somesite.com/someurl.exe?src=jpg"',
    'src="http://somesite.com/someurl.exe?ext=jpg"',
    'background="http://somesite.com/someurl.exe?src=jpg"'
);
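foreach ($arr as $str) {
    // Lookbehind: the match must follow src=, background= or url(
    // Group 1: optional opening quote; "image": lazily captured up to the
    // same quote again (\1) or a closing parenthesis
    preg_match_all('/(?<=src=|background=|url\()(\'|")?(?<image>.*?)(?=\1|\))/i', $str, $matches);
    echo $str;
    foreach ($matches['image'] as $img) {
        echo "\nimage: <b>$img</b>\n";
    }
    echo "\n";
}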
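
For the goal mentioned in the comments (replace relative image URLs with "cid:" but leave absolute ones alone), a replacement callback might look roughly like the sketch below. It is an illustration under assumptions, not part of the code above: "cid:" followed by the file name is only a placeholder for whatever content ID scheme is really used, and "absolute" is assumed to mean starting with http://, https:// or //.

```php
<?php
$html = '<img src="/relative/path/img.jpg"> '
      . '<img src="http://somesite/relative/path/img.jpg">';

// Group 1: prefix, group 2: optional quote, group 3: the URL.
// The negative lookahead skips absolute URLs, so they are never rewritten.
$pattern = '~(src=|background=|url\()(["\']?)(?!https?://|//)'
         . '([^"\')\s>]+\.(?:png|gif|jpe?g))\2~i';

$result = preg_replace_callback($pattern, function (array $m): string {
    // Rebuild prefix and quotes around the placeholder content ID.
    return $m[1] . $m[2] . 'cid:' . basename($m[3]) . $m[2];
}, $html);

echo $result, "\n";
// <img src="cid:img.jpg"> <img src="http://somesite/relative/path/img.jpg">
```

Because the prefix and quotes are part of the match, the absolute URL that shares the same path is left untouched.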