I'm adding a #b hash to each link via the DOMDocument class.

        $dom = new DOMDocument();
        $dom->loadHTML($output);

        $a_tags = $dom->getElementsByTagName('a');

        foreach($a_tags as $a)
        {
            $value = $a->getAttribute('href');
            $a->setAttribute('href', $value . '#b');
        }

        return $dom->saveHTML();

That works fine. However, the returned output includes a DOCTYPE declaration and <head> and <body> tags. Any idea why that happens or how I can prevent it?

5 Answers

The real problem is the way the DOM is loaded. Use this instead:

$html->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

Please upvote the original answer here.
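
Applied to the code from the question, this could look like the sketch below. The function name appendHashToLinks and the sample markup are made up for illustration, and the scalar type hints assume PHP 7 or later. The two flags themselves require PHP 5.4+ with libxml 2.7.8+. The sample input has a single top-level element; fragments with several top-level elements may be restructured by the parser when LIBXML_HTML_NOIMPLIED is set, so test with your own markup.

```php
<?php
// Hypothetical helper: appends a fragment identifier to every link's href
// and returns the markup without DOCTYPE, <html> or <body> wrappers.
function appendHashToLinks(string $html, string $hash = '#b'): string
{
    $dom = new DOMDocument();

    // LIBXML_HTML_NOIMPLIED: do not add implied <html>/<body> elements.
    // LIBXML_HTML_NODEFDTD:  do not add a default DOCTYPE.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($dom->getElementsByTagName('a') as $a) {
        $a->setAttribute('href', $a->getAttribute('href') . $hash);
    }

    // saveHTML() ends its output with a newline, so trim it.
    return trim($dom->saveHTML());
}

echo appendHashToLinks('<p>See <a href="/page">this page</a>.</p>');
```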

That's what DOMDocument::saveHTML() generally does, yes: it generates a full HTML document, including the DOCTYPE declaration, the <head> tag, and so on.

Two possible solutions:

  • If you are working with PHP >= 5.3.6, saveHTML() accepts an optional DOMNode parameter that might help you: when a node is passed, only that node is serialized instead of the whole document.
  • If you need your code to work with PHP < 5.3.6, you'll have to use str_replace(), a regex, or whatever equivalent you can think of to remove the portions of HTML code you don't need.
    • For an example, see this note in the manual's users notes.
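
For the first option, a minimal sketch could look like this. It assumes the markup was loaded without any libxml flags, so the parser has wrapped it in <html><body>. The helper name innerBodyHtml is hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: serializes only the children of <body>,
// skipping the DOCTYPE and the <html>/<body> wrappers.
function innerBodyHtml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // Passing a node makes saveHTML() output just that subtree.
        $html .= $dom->saveHTML($child);
    }

    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>See <a href="/page">this page</a>.</p>');

foreach ($dom->getElementsByTagName('a') as $a) {
    $a->setAttribute('href', $a->getAttribute('href') . '#b');
}

echo innerBodyHtml($dom);
```

Note that saveHTML() expects a DOMNode here, so passing a boolean such as false triggers a warning.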

4 Comments

the second link works fine for me - preg_replace solution is the key! thank you!
You're welcome :-) (and the guys who post users notes on manual pages are more to be thanked than me, in this case ;-) )
I used the first option as I am using PHP >= 5.3 and it worked great. $doc->saveHTML(false);
@BenSinclair I am also using PHP >= 5.3 and $doc->saveHTML(false) throws the error <b>Warning</b>: DOMDocument::saveHTML() expects parameter 1 to be DOMNode, boolean given

Adding $doc->saveHTML(false); will not work: it raises a warning because the method expects a DOMNode, not a boolean.

The solution I used:

return preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $doc->saveHTML()));

I'm using PHP > 5.4.

I solved this problem by creating new DOMDocument and copying child nodes from original to new one.

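function removeDocType($oldDom) {
  // documentElement is <html>; its first child is <body> when the input has no <head>
  $node = $oldDom->documentElement->firstChild;
  $dom = new DOMDocument();
  foreach ($node->childNodes as $child) {
    // Import a deep copy of each node into the new document before appending it
    $dom->appendChild($dom->importNode($child, true));
  }
  return $dom->saveHTML();
}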

So instead of using

return $dom->saveHTML();

I use:

return removeDocType($dom);

In my case I wanted to keep the html wrapper but drop the DOCTYPE. The solution is in line with Tiago A.'s:

// Avoid adding the DOCTYPE header    
$dom->loadHTML($bodyContent, LIBXML_HTML_NODEFDTD);

// Avoid adding the DOCTYPE header AND html/body wrapper
$dom->loadHTML($bodyContent, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
