I wrote a simple snippet that may work for you. It uses the DOMDocument class to parse the HTML string and read the top-level child nodes:
// Your HTML
$html = 'This <pre>is a <b>pen</b> and I like <i>it!</i></pre> Good <a>morning <pre>Mary</pre>!</a> Bye.';

$dom = new DOMDocument;
// Wrap the fragment in <body> so its top-level nodes become the body's children
$dom->loadHTML("<body>{$html}</body>");
$nodes = iterator_to_array($dom->getElementsByTagName('body')->item(0)->childNodes);

$nodesFinal = implode('', array_map(function ($node) {
    // Text nodes are returned unchanged
    if ($node->nodeName === '#text') {
        return $node->textContent;
    }
    // Elements are rebuilt from their text content only, which drops nested tags
    return sprintf('<%1$s>%2$s</%1$s>', $node->nodeName, $node->textContent);
}, $nodes));

echo $nodesFinal;
Output:
This <pre>is a pen and I like it!</pre> Good <a>morning Mary!</a> Bye.
Edit
The following version also keeps the attributes of each top-level tag and handles UTF-8 encoding in the HTML string:
// Your HTML
$html = '<a href="https://sample.com" target="_blank">Test simple <span>hyperlink.</span></a> This is a text. <div class="info class2">Simple div. <b>A value bold!</b>.</div> End with a some váúlé...';

$dom = new DOMDocument;
// The meta tag tells the parser to read the string as UTF-8
$dom->loadHTML("<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/><body>{$html}</body>");
$nodes = iterator_to_array($dom->getElementsByTagName('body')->item(0)->childNodes);

$nodesFinal = implode('', array_map(function ($node) {
    $textContent = $node->nodeValue;
    // Text nodes are returned unchanged
    if ($node->nodeName === '#text') {
        return $textContent;
    }
    // Rebuild each attribute as name="value"; the leading space is part of each
    // attribute so that tags without attributes get no stray space
    $attrs = implode('', array_map(function ($attr) {
        return sprintf(' %s="%s"', $attr->name, $attr->value);
    }, iterator_to_array($node->attributes)));

    return sprintf('<%1$s%3$s>%2$s</%1$s>', $node->nodeName, $textContent, $attrs);
}, $nodes));

echo $nodesFinal;
Output:
<a href="https://sample.com" target="_blank">Test simple hyperlink.</a> This is a text. <div class="info class2">Simple div. A value bold!.</div> End with a some váúlé...
I used the meta tag to set the encoding, and the attributes property of DOMNode to read the attributes of each tag.
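If you need this in more than one place, the same idea can be wrapped in a function. This is only a minimal sketch: the name flattenTopLevelTags is made up for illustration, and the htmlspecialchars() calls are an addition of mine (not in the code above) so that characters such as & or " in the text and attribute values are re-escaped in the output. Nodes that are neither text nor elements (for example comments) are skipped here, which is also an assumption about what you want.

```php
<?php
// Hypothetical helper, not part of the original snippets: keeps only the
// top-level tags of an HTML fragment and flattens nested markup to text.
function flattenTopLevelTags(string $html): string
{
    $dom = new DOMDocument();
    // The meta tag makes the parser read the fragment as UTF-8.
    $dom->loadHTML(
        "<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/><body>{$html}</body>"
    );
    $body = $dom->getElementsByTagName('body')->item(0);

    $out = '';
    foreach ($body->childNodes as $node) {
        // textContent returns decoded characters, so escape them again.
        $text = htmlspecialchars($node->textContent, ENT_NOQUOTES, 'UTF-8');

        if ($node instanceof DOMText) {
            $out .= $text;
        } elseif ($node instanceof DOMElement) {
            $attrs = '';
            foreach ($node->attributes as $attr) {
                $attrs .= sprintf(
                    ' %s="%s"',
                    $attr->name,
                    htmlspecialchars($attr->value, ENT_QUOTES, 'UTF-8')
                );
            }
            $out .= sprintf('<%1$s%2$s>%3$s</%1$s>', $node->nodeName, $attrs, $text);
        }
        // Other node types (comments, etc.) are intentionally dropped.
    }

    return $out;
}

echo flattenTopLevelTags('Hi <b class="x">bold <i>text</i></b> &amp; bye');
// prints: Hi <b class="x">bold text</b> &amp; bye
```

Note that void elements such as <br> would still come out with a closing tag, the same as in the snippets above.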