I am using Python and want to remove all HTML tags from a string, except those enclosed in certain tags. In this example, I want to remove every HTML tag that isn't inside <header>...</header>, while keeping the <header> tags themselves.
For example:
<h1>Morning</h1>
<header>
<h1>Afternoon</h1>
<h2>Evening</h2>
</header>
<h2>Night</h2>
Desired result:
Morning
<header>
<h1>Afternoon</h1>
<h2>Evening</h2>
</header>
Night
I've spent hours on this with no luck. I know that the following will remove ALL tags:
re.sub('<.*?>', '', mystring)
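And this will remove the header tags together with everything inside them (as written it only matches when the whole block is on one line, because . does not match newlines unless re.DOTALL is passed):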
re.sub('<header>.*?</header>', '', mystring)
But how do I negate it, so that the first regex ignores what the second regex finds? Any help is greatly appreciated! Thank you! :)
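One possible way to combine the two patterns, staying with regular expressions, is to split the string on the header blocks and strip tags only from the pieces in between. This is a minimal sketch, not a robust HTML parser: the function name strip_tags_outside_header and the two pattern names are made up for illustration, and it assumes that header blocks are not nested and that no attribute value contains a ">" character.

```python
import re

# Hypothetical helper names; the patterns are adapted from the two in the question.
# re.DOTALL lets "." span newlines so a multi-line <header> block is matched.
HEADER_BLOCK = re.compile(r"(<header\b[^>]*>.*?</header>)", re.DOTALL | re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")

def strip_tags_outside_header(text):
    # Because the pattern has one capturing group, re.split keeps the matched
    # header blocks in the result: they sit at the odd indices.
    parts = HEADER_BLOCK.split(text)
    cleaned = [
        part if index % 2 else ANY_TAG.sub("", part)  # leave header blocks untouched
        for index, part in enumerate(parts)
    ]
    return "".join(cleaned)

sample = (
    "<h1>Morning</h1>\n"
    "<header>\n"
    "<h1>Afternoon</h1>\n"
    "<h2>Evening</h2>\n"
    "</header>\n"
    "<h2>Night</h2>"
)
print(strip_tags_outside_header(sample))
```

The idea is that nothing needs to be "negated": the header blocks are set aside first, and the tag-removing pattern only ever sees the text outside them. For real-world HTML an actual parser is likely to be more reliable than this approach.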
pip install beautifulsoup4, followed by running the code in my answer. :)