In a string containing HTML I have several links that I want to replace with the pure href value:
from bs4 import BeautifulSoup

html = "<a href='www.google.com'>foo</a> some text <a href='www.bing.com'>bar</a> some <br> text"
soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all()
for tag in tags:
    if tag.has_attr('href'):
        # Replace the serialized tag in the original string with its href value
        html = html.replace(str(tag), tag['href'])
Unfortunately this creates some issues:
- The tags in the HTML use single quotes ('), but str(tag) produces a tag with double quotes (<a href="www.google.com">foo</a>), so replace() will not find the match.
- <br> gets serialized as <br/>, so again replace() will not find the match.
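To make the mismatch concrete, here is a small sketch that compares what BeautifulSoup serializes with what is actually in the source string. The shortened `html` string and the variable names `link` and `br` are only for illustration.

```python
from bs4 import BeautifulSoup

html = "<a href='www.google.com'>foo</a> some <br> text"
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a")
br = soup.find("br")

# str() re-serializes the parsed tag; it does not return the original source text
print(str(link))  # double-quoted attribute, as described above
print(str(br))    # self-closing form, as described above

# Neither serialized form occurs verbatim in the original string,
# so html.replace(str(tag), ...) silently does nothing
print(str(link) in html)
print(str(br) in html)
```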
So it seems that using Python's replace() method on the raw string will not give reliable results.
Is there a way to use BeautifulSoup's own methods to replace a tag with a string?
edit:
Added value for str(tag) = <a href="www.google.com">foo</a>
'href' and not "href".
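One possible approach, as a minimal sketch rather than a definitive answer: modify the parsed tree instead of the raw string, using `Tag.replace_with()`, and serialize the result afterwards. The variable name `result` is only for illustration, and this assumes it is acceptable for the rest of the markup to be re-serialized by BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = "<a href='www.google.com'>foo</a> some text <a href='www.bing.com'>bar</a> some <br> text"
soup = BeautifulSoup(html, "html.parser")

# href=True matches only tags that carry an href attribute
for tag in soup.find_all(href=True):
    # Swap the whole tag (including its link text) for the bare href string
    tag.replace_with(tag["href"])

# Serialize the modified tree back to a string
result = str(soup)
print(result)
```

Because the replacement happens on the tree, the quoting style of the original attributes no longer matters. Note that the output is the re-serialized document, so other tags such as <br> will still come back in BeautifulSoup's own form (<br/>).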