
I have some HTML that contains (among other things) p-tags and figure-tags that each contain one img-tag.
For simplicity, here is an example of what the HTML can contain, stored in a PHP variable:

$content = '<figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>';

I use DOMDocument to parse $content. In this example I change the src attribute of the img-element inside each figure-element:

$dom = new DOMDocument();
libxml_use_internal_errors(true);

// encode non-ASCII characters as HTML entities; otherwise loadHTML() treats the input as ISO-8859-1 and mangles them.
$domPart = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
$dom->loadHTML($domPart, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$domFigures = $dom->getElementsByTagName('figure');

foreach ($domFigures as $domFigure) {

    $img = $domFigure->getElementsByTagName('img')->item(0);
    if ($img) {
        $img->setAttribute('src', "https://placekitten.com/g/400/500");
    }

}

$result = $dom->saveHTML();

The result is:

<figure class="image image-style-align-left">
<img src="https://placekitten.com/g/400/500">
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>
</figure>

Somehow my p-element has moved into my figure-element. Why does this happen and what can I do to prevent it?


2 Answers


A DOMDocument has to have a single root element, so it moves all following top-level siblings inside the first top-level element.

The easiest way to address this is to bookend your content with a container tag, e.g.:

$content = '<div><figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p></div>';
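With the wrapper in place, saveHTML() will also output the extra <div>. A minimal sketch of one way to drop it again, assuming you only want the original fragment back: load the wrapped content with the same flags as in the question, edit the images, then serialize just the wrapper's child nodes with saveHTML($node). The function name replaceFigureImages and its $newSrc parameter are hypothetical, and the mb_convert_encoding step from the question is left out for brevity, so add it back if your content contains non-ASCII characters.

```php
<?php
// Hypothetical helper: wraps the fragment in a <div> so libxml sees a single
// root element, rewrites the first <img> in each <figure>, then strips the <div>.
function replaceFigureImages(string $content, string $newSrc): string
{
    $dom = new DOMDocument();
    // Suppress libxml warnings about HTML5 tags such as <figure>.
    libxml_use_internal_errors(true);
    $dom->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($dom->getElementsByTagName('figure') as $figure) {
        $img = $figure->getElementsByTagName('img')->item(0);
        if ($img !== null) {
            $img->setAttribute('src', $newSrc);
        }
    }

    // Serialize the wrapper's children one by one, leaving out the <div> itself.
    $html = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    libxml_clear_errors();
    return $html;
}
```

Because every top-level node now has the <div> as its parent, the p-element stays a sibling of the figure-element instead of being moved into it.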

The rearrangement is caused by the LIBXML_HTML_NOIMPLIED option you're using. It apparently doesn't handle fragments with more than one top-level element reliably.

See these related questions: "loadHTML LIBXML_HTML_NOIMPLIED on an html fragment generates incorrect tags" and "How to saveHTML of DOMDocument without HTML wrapper?"

Note: as of PHP 5.4 and Libxml 2.6, loadHTML() accepts an $options parameter that tells Libxml how to parse the content.
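One alternative, sketched here under the assumption that you can drop LIBXML_HTML_NOIMPLIED: let libxml add its implied <html> and <body> elements, so the figure and the p both become children of <body>, then serialize only the children of <body>. The function name replaceFigureImagesViaBody is hypothetical, and the mb_convert_encoding step from the question is omitted for brevity.

```php
<?php
// Hypothetical helper: parses WITHOUT LIBXML_HTML_NOIMPLIED so libxml wraps the
// fragment in <html><body>, then returns only what ended up inside <body>.
function replaceFigureImagesViaBody(string $content, string $newSrc): string
{
    $dom = new DOMDocument();
    // Suppress libxml warnings about HTML5 tags such as <figure>.
    libxml_use_internal_errors(true);
    $dom->loadHTML($content, LIBXML_HTML_NODEFDTD);

    foreach ($dom->getElementsByTagName('figure') as $figure) {
        $img = $figure->getElementsByTagName('img')->item(0);
        if ($img !== null) {
            $img->setAttribute('src', $newSrc);
        }
    }

    // Serialize the children of <body>, skipping the implied <html>/<body> tags.
    $html = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $html .= $dom->saveHTML($child);
        }
    }
    libxml_clear_errors();
    return $html;
}
```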
