$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($content);
$divs = $dom->getElementsByTagName("div");
foreach ( $divs as $div ) {
    if ( $class = $div->attributes->getNamedItem("class") ) {
        if ( $class->nodeValue == "simplegalleryholder" ) 
            $div->parentNode->removeChild( $div );
    }
}
$content = $dom->saveHTML();

This simple code should help me with removing

<div class="simplegalleryholder"> .... </div> 

from the document. The only problem is that $content contains UTF-8 encoded special characters (ąęść etc.), which are destroyed by the process (I get iÄ™ Å‚ ż instead).

How should I approach this issue to get the correct result?

1 Answer

Specifying UTF-8 in the constructor doesn't make the underlying XML processing library (libxml) treat the input as UTF-8. The following workaround is really hacky, but it works reasonably well: prepend a meta tag that declares the charset before loading the HTML.

$encodingHint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$dom->loadHTML($encodingHint . $html);

https://bugs.php.net/bug.php?id=32547
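
For reference, here is a minimal sketch of how the workaround might be combined with the code from the question. The function name removeGalleryHolders is hypothetical, and two things go beyond the original snippets and are my own assumptions about what is wanted: libxml warnings are suppressed while loading, and the node list is copied before removal, because getElementsByTagName() returns a live list and removing nodes while iterating over it can skip elements.

```php
<?php
// Hypothetical helper; the name is not from the question or the answer.
function removeGalleryHolders(string $content): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');

    // Charset hint from the answer, so the input is decoded as UTF-8.
    $encodingHint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

    // Assumption: collect parser warnings (e.g. unknown tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($encodingHint . $content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array before removing anything from the tree.
    $divs = iterator_to_array($dom->getElementsByTagName('div'));
    foreach ($divs as $div) {
        if ($div->getAttribute('class') === 'simplegalleryholder' && $div->parentNode !== null) {
            $div->parentNode->removeChild($div);
        }
    }

    return $dom->saveHTML();
}

$content = removeGalleryHolders('<p>ąęść</p><div class="simplegalleryholder">gallery</div>');
```

Note that saveHTML() serializes the whole document, so the result includes the doctype, the html/body wrapper, and the injected meta tag, not just the original fragment.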

If you're viewing the output in a web browser, send a real HTTP header rather than relying on an http-equiv meta tag. The header is only for viewing; DOMDocument itself needs the meta tag in order to parse the input as UTF-8.

header('content-type: text/html; charset=utf-8');