I need to parse an HTML string and remove all the elements which contain only empty children.
Example:
<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P>
contains no information and must be replaced with <br/>.
I wrote a regex like this:
<\w+\b[^>]*>(<\w+\b[^>]*>\s*</\w*\s*>)*</\w*\s*>
but the problem is that it matches only two of the three nesting levels. In the above example, the <P> element (the outermost one) is not selected.
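Here is a minimal reproduction of the behaviour. Python's `re` module is only an assumption for illustration (the regex itself is unchanged), and `first_match` is a hypothetical helper name. It shows that the match starts at the FONT element rather than at P, apparently because the group in the pattern only allows children that are themselves directly empty.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r'<\w+\b[^>]*>(<\w+\b[^>]*>\s*</\w*\s*>)*</\w*\s*>')

# The example element from the question (split across lines only for readability).
html = ('<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" '
        'LETTERSPACING="0" KERNING="1"><B></B></FONT></P>')


def first_match(text):
    """Return the first substring matched by PATTERN, or None if nothing matches."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


# Prints the FONT element with its empty B child; the enclosing P is left out.
print(first_match(html))
```

At the P tag, the next child is FONT, whose content is another tag rather than whitespace followed by a closing tag, so the match fails there and only succeeds one level further in.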
Can you help me fix this regex?