Is there a way in PHP to remove all nested HTML tags from a string while keeping the parent (top-level) tags?

Example:

Input:

This <pre>is a <b>pen</b> and I like <i>it!</i></pre> Good <a>morning <pre>Mary</pre>!</a> Bye.

Output:

This <pre>is a pen and I like it!</pre> Good <a>morning Mary!</a> Bye.

1 Answer

Here is a simple approach that may work for you. It uses the DOMDocument class to parse the HTML string and read the top-level child nodes of the body:

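// Your HTML
$html = 'This <pre>is a <b>pen</b> and I like <i>it!</i></pre> Good <a>morning <pre>Mary</pre>!</a> Bye.';

// Wrap the fragment in <body> so its top-level nodes are easy to find.
$dom = new DOMDocument;
$dom->loadHTML("<body>{$html}</body>");

// Direct children of <body> are the parent tags and the text between them.
$nodes = iterator_to_array($dom->getElementsByTagName('body')->item(0)->childNodes);

$nodesFinal = implode(
    array_map(function($node) {
        // Text between the parent tags is returned unchanged.
        if ($node->nodeName === '#text') {
            return $node->textContent;
        }
        // textContent keeps only the text of the element, dropping every nested tag.
        return sprintf('<%1$s>%2$s</%1$s>', $node->nodeName, $node->textContent);
    }, $nodes)
);

echo $nodesFinal;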

Output:

This <pre>is a pen and I like it!</pre> Good <a>morning Mary!</a> Bye.
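To see why this works, it can help to print the names of the direct children of the body. The snippet below is only an illustrative sketch on a shortened input; the variable name $names is not part of the code above.

```php
<?php
// Shortened sample input: one parent tag with a nested tag, surrounded by text.
$html = 'This <pre>is a <b>pen</b></pre> Bye.';

$dom = new DOMDocument();
$dom->loadHTML("<body>{$html}</body>");

// Collect the node name of each direct child of <body>.
$names = [];
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $node) {
    $names[] = $node->nodeName;
}

// Only the parent tag and the surrounding text show up; the nested <b> does not.
echo implode(', ', $names), PHP_EOL; // prints: #text, pre, #text
```

Because the nested tags never appear at this level, rebuilding each parent tag from its textContent is enough to drop them.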

Edit

The following version also keeps the attributes of the parent tags and handles UTF-8 text in the HTML string:

// Your HTML
$html = '<a href="https://sample.com" target="_blank">Test simple <span>hyperlink.</span></a> This is a text. <div class="info class2">Simple div. <b>A value bold!</b>.</div> End with a some váúlé...';

$dom = new DOMDocument;
// The meta tag tells the parser to read the string as UTF-8.
$dom->loadHTML("<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/><body>{$html}</body>");

$nodes = iterator_to_array($dom->getElementsByTagName('body')->item(0)->childNodes);

$nodesFinal = implode(
    array_map(function($node) {
        $textContent = $node->nodeValue;
        if ($node->nodeName === '#text') {
            return $textContent;
        }
        // Rebuild the attribute list of the parent tag, each entry with a leading space.
        $attr = implode('', array_map(function($attr) {
            return sprintf(' %s="%s"', $attr->name, $attr->value);
        }, iterator_to_array($node->attributes)));

        // A tag without attributes gets no stray space before the closing bracket.
        return sprintf('<%1$s%3$s>%2$s</%1$s>', $node->nodeName, $textContent, $attr);
    }, $nodes)
);

echo $nodesFinal;

Output:

<a href="https://sample.com" target="_blank">Test simple hyperlink.</a> This is a text. <div class="info class2">Simple div. A value bold!.</div> End with a some váúlé... 

The meta tag sets the encoding, and the attributes property of DOMNode provides the attributes of each parent tag.
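If you need this in more than one place, the same idea can be wrapped in a function. This is a minimal sketch, not a definitive implementation. The name flattenNestedTags is hypothetical, and two things go beyond the snippets above: the text and attribute values are escaped again with htmlspecialchars, and libxml warnings about unknown markup are suppressed with libxml_use_internal_errors. Void elements such as <br> are not special-cased here.

```php
<?php
// Hypothetical helper: keeps top-level tags (with attributes) and strips nested ones.
function flattenNestedTags(string $html): string
{
    $dom = new DOMDocument();

    // Assumption: parser warnings are not wanted, so collect them internally and discard.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML(
        "<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/><body>{$html}</body>"
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $node) {
        if ($node instanceof DOMText) {
            // Top-level text: re-escape it, since the parser decoded entities.
            $out .= htmlspecialchars($node->nodeValue, ENT_NOQUOTES);
        } elseif ($node instanceof DOMElement) {
            // Rebuild the attributes of the parent tag, escaping each value.
            $attrs = '';
            foreach ($node->attributes as $attr) {
                $attrs .= sprintf(' %s="%s"', $attr->name, htmlspecialchars($attr->value));
            }
            // textContent holds the element's text with all nested tags removed.
            $out .= sprintf(
                '<%1$s%2$s>%3$s</%1$s>',
                $node->nodeName,
                $attrs,
                htmlspecialchars($node->textContent, ENT_NOQUOTES)
            );
        }
        // Other node types (for example comments) are skipped.
    }

    return $out;
}

echo flattenNestedTags('<div class="info">Simple div. <b>A value bold!</b>.</div> End.'), PHP_EOL;
```

Escaping is a design choice: without it, a decoded character such as < or & in the text would be written back raw and could produce invalid markup.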
