XPath in PHP Removes HTML Tags

Question

I am using XPath in PHP to retrieve part of an HTML document. Assume my HTML document looks like this:

<html>
    <head>
    </head>
    <body>
        <div id="first">
            <a href="some_link_address.com">Hello</a>
            <p>Some text here</p>
        </div>
        <div id="second">
            <p>Some other text here</p>
            <img src="src/to/image.jpg" />
        </div>
    </body>
</html>

And my PHP including the XPath call is:

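// $xpath is a DOMXPath built over the source document shown above.
$result_dom = new DOMDocument('1.0', 'utf-8');
$nodes_to_keep = $xpath->query("//div[@id='first']");

foreach ($nodes_to_keep as $node) {
    // nodeValue holds only the text content of the matched node.
    $element = $result_dom->createElement('div', $node->nodeValue);
    $result_dom->appendChild($element);
}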

I was expecting the resulting DOM to contain the following:

<div>
    <a href="some_link_address.com">Hello</a>
    <p>Some text here</p>
</div>

However, this is the resulting DOM:

<div>
    Hello
    Some text here
</div>

So my question is: how do I make the resulting DOM keep the HTML tags? I do not want them removed.

Thanks.

rich remer · Accepted Answer · 2013-12-08 02:27:20Z

2

The "nodeValue" of an element is its textual content, that is, the concatenated text nodes beneath it. Those text nodes do not include markup such as <a ...> or <p>, just the text inside and between those elements. Since createElement() receives only that string, the text is all you get in the new element.
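To make the difference concrete, here is a small sketch comparing nodeValue with a serialization of the same node. The one-line markup and the variable names ($dom, $div, $text, $markup) are assumptions made for illustration, not part of the question's code.

```php
<?php
// Hypothetical one-line version of the question's first div.
$dom = new DOMDocument();
$dom->loadHTML('<div id="first"><a href="some_link_address.com">Hello</a><p>Some text here</p></div>');

$div = $dom->getElementsByTagName('div')->item(0);

// nodeValue: only the text nodes under the div, with no tags.
$text = $div->nodeValue;

// saveHTML($node): the node serialized as markup, with tags and attributes.
$markup = $dom->saveHTML($div);

echo $text, "\n";
echo $markup, "\n";
```

The first echo prints plain text, while the second still contains the <a> and <p> elements.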

Instead of creating a node manually, import a deep copy of the result node and append that:

$importedNode = $result_dom->importNode($node, true);
$result_dom->appendChild($importedNode);
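
For completeness, a minimal end-to-end sketch of this approach might look like the following. The setup lines ($html, $source_dom, and the construction of $xpath) are assumptions, since the question does not show how its $xpath object was created.

```php
<?php
// Source markup from the question, trimmed to the relevant parts.
$html = <<<HTML
<html>
    <body>
        <div id="first">
            <a href="some_link_address.com">Hello</a>
            <p>Some text here</p>
        </div>
        <div id="second">
            <p>Some other text here</p>
        </div>
    </body>
</html>
HTML;

// Assumed setup: parse the source and build the DOMXPath the question calls $xpath.
$source_dom = new DOMDocument();
$source_dom->loadHTML($html);
$xpath = new DOMXPath($source_dom);

$result_dom = new DOMDocument('1.0', 'utf-8');

foreach ($xpath->query("//div[@id='first']") as $node) {
    // The second argument (true) requests a deep copy, so child
    // elements and attributes are copied along with the div itself.
    $importedNode = $result_dom->importNode($node, true);
    $result_dom->appendChild($importedNode);
}

// Serialize the result; the <a> and <p> tags are preserved.
$output = $result_dom->saveHTML();
echo $output;
```

Note that the imported div also keeps its id="first" attribute, because importNode() copies the node's attributes as well as its children.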
