I have this string: '<span>TEST1</span> <span>TEST2</span> <a href="#">TEST3</a>'
I need to remove the HTML tags and keep only the text:
import re
# Raw string so that \s reaches the regex engine as written
p = re.compile(r'\s*<[^>]+>\s*')
test = p.sub('', '<span>TEST1</span> <span>TEST2</span> <a href="#">TEST3</a>')
print(test)
OUTPUT: TEST1TEST2TEST3
But this removes every HTML tag, including the whitespace around each one. How should I change the regex so that the <a> element is kept and the output looks like this?
OUTPUT: TEST1 TEST2 <a href="#">TEST3</a>
beautifulsoup?
<...>
You could change it to <(?!\/a>|a )[^>]+> (demo: regex101.com/r/YxTzLr/1)
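For reference, here is a minimal sketch of how the negative-lookahead pattern from the comment could be used in Python. The names `html`, `keep_anchor`, and `result` are made up for illustration. Note that this sketch also drops the `\s*` from the original pattern, on the assumption that the spaces between the elements should survive.

```python
import re

html = '<span>TEST1</span> <span>TEST2</span> <a href="#">TEST3</a>'

# Match any tag unless the text right after "<" is "/a>" or "a ".
# The negative lookahead (?!/a>|a ) is what skips the anchor tags.
# No \s* around the tag, so whitespace between elements is left alone.
keep_anchor = re.compile(r'<(?!/a>|a )[^>]+>')

result = keep_anchor.sub('', html)
print(result)  # TEST1 TEST2 <a href="#">TEST3</a>
```

One caveat: because the lookahead tests for "a " with a trailing space, a bare `<a>` with no attributes would still be removed. If that matters, a variant such as `<(?!/?a\b)[^>]+>` might be a better fit, though that is an untested assumption about your input and not something from the comment. For anything beyond simple, well-formed snippets, an HTML parser is likely to be more robust than a regex.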