
I am reading some HTML content that contains image tags such as

<img onclick="document.location='http://abc.com'" src="http://a.com/e.jpg" onload="javascript:if(this.width>250) this.width=250">

or

<img src="http://a.com/e.jpg" onclick="document.location='http://abc.com'" onload="javascript:if(this.width>250) this.width=250" />

I am trying to reformat these tags so that they become

<img src="http://a.com/e.jpg" />

However, I have not been successful. The code I have tried so far is:

$image=preg_replace('/<img(.*?)(\/)?>/','',$image);

Can anyone help?

  • This is not a task for regular expressions. Use an HTML parser instead. Commented Jul 24, 2013 at 11:10

2 Answers


Here's a version using DOMDocument that removes all attributes from <img> tags except src. Note that a loadHTML()/saveHTML() round trip through DOMDocument can alter other HTML as well, especially if that HTML is malformed, so be careful: test it and check that the results are acceptable.

<?php

$html = <<<ENDHTML
<!doctype html>
<html><body>
<a href="#"><img onclick="..." src="http://a.com/e.jpg" onload="..."></a>

<div><p>
<img src="http://a.com/e.jpg" onclick="..." onload="..." />
</p></div>
</body></html>
ENDHTML;

$dom = new DOMDocument;
if (!$dom->loadHTML($html)) {
    throw new Exception('could not load html');
}

$xpath = new DOMXPath($dom);

foreach ($xpath->query('//img') as $img) {
    // unfortunately, cannot removeAttribute() directly inside
    // the loop, as this breaks the attributes iterator.
    $remove = array();
    foreach ($img->attributes as $attr) {
        if (strcasecmp($attr->name, 'src') != 0) {
            $remove[] = $attr->name;
        }
    }

    foreach ($remove as $attr) {
        $img->removeAttribute($attr);
    }
}

echo $dom->saveHTML();
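Because the question deals with a snippet of HTML rather than a full page, it may help to avoid the doctype, <html> and <body> wrappers that loadHTML() adds. The following is a minimal sketch of the same attribute-stripping loop for a fragment, not part of the answer above: the function name stripImgAttributes and the wrapper <div> are illustrative choices, and it assumes libxml 2.7.8 or later (needed for LIBXML_HTML_NOIMPLIED) and ASCII or entity-encoded input, since loadHTML() otherwise guesses the character set. Note that saveHTML() writes <img src="..."> without the trailing slash.

```php
<?php
// Sketch: keep only the src attribute on <img> tags inside an HTML fragment.
// Assumes libxml >= 2.7.8 and ASCII/entity-encoded input (see lead-in).
function stripImgAttributes(string $fragment): string
{
    $dom = new DOMDocument();

    // Silence libxml warnings about malformed markup, restoring the old setting afterwards.
    $prev = libxml_use_internal_errors(true);
    // Wrap in one root <div> so NOIMPLIED does not mis-nest multi-node fragments;
    // NOIMPLIED/NODEFDTD stop libxml from adding <html>, <body> and a doctype.
    $dom->loadHTML('<div>' . $fragment . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($dom->getElementsByTagName('img') as $img) {
        // Collect names first: removing while iterating $img->attributes skips entries.
        $remove = [];
        foreach ($img->attributes as $attr) {
            if (strcasecmp($attr->name, 'src') !== 0) {
                $remove[] = $attr->name;
            }
        }
        foreach ($remove as $name) {
            $img->removeAttribute($name);
        }
    }

    // Serialize only the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Usage: stripImgAttributes('<p><img onclick="..." src="http://a.com/e.jpg" onload="..."></p>') should give back the paragraph with only src left on the image.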


Match each piece separately, then concatenate the strings. I am unsure which language you are using, so I'll explain in pseudocode:

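1. Find <img with a regex and place the match in a string variable.
2. Find src="..." with the regex src=".*?" and place the match in a string variable.
3. Find the end of the tag, /> with \/> (or a plain > for tags that are not self-closed), and place the match in a string variable.
4. Concatenate the variables together.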
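Since the question's code is PHP, here is one way the four steps above might look with preg_replace_callback(). It is a minimal sketch, not the answerer's code: the function name keepOnlySrc is hypothetical, it assumes the src value is written in double quotes, and, like any regex over HTML, it can be fooled by unusual markup (for example an attribute value that itself contains the text src="..."), which is why a parser is the safer choice.

```php
<?php
// Sketch of the match-and-concatenate idea in PHP.
// Assumption: src is double-quoted; tags without such a src are left untouched.
function keepOnlySrc(string $html): string
{
    // Steps 1 and 3: match a whole <img ...> or <img ... /> tag. Quoted attribute
    // values are consumed as units, so the '>' in onload="...width>250..." does
    // not end the match early.
    $tagPattern = '/<img\b(?:[^>"\']|"[^"]*"|\'[^\']*\')*>/i';

    $result = preg_replace_callback(
        $tagPattern,
        function (array $m): string {
            // Step 2: pull the src value out of this single tag only.
            if (preg_match('/\ssrc\s*=\s*"([^"]*)"/i', $m[0], $src)) {
                // Step 4: concatenate the pieces into the new tag.
                return '<img src="' . $src[1] . '" />';
            }
            return $m[0]; // no double-quoted src: keep the tag as it was
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

echo keepOnlySrc('<img onclick="document.location=\'http://abc.com\'" src="http://a.com/e.jpg" onload="javascript:if(this.width>250) this.width=250">'), "\n";
// prints <img src="http://a.com/e.jpg" />
```

Matching the whole tag first and only then searching for src inside it keeps step 2 from picking up a src attribute that belongs to a different element.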
