I have a string from which I need to remove everything between two marker substrings, including the markers themselves.

At the moment I have the following code, but I'm not sure why it doesn't work.

def removeYoutube(itemDescription):
    itemDescription = re.sub('<iframe>.*</iframe>','',desc,flags=re.DOTALL)
    return itemDescription

It doesn't remove the text between, and including, <iframe> and </iframe>.

Example Input (String):

"<div style="text-align: center;"><iframe allowfullscreen="frameborder=0" height="350" src="https://www.youtube.com/embed/EKaUJExxmEA" width="650"></iframe></div>"

Expected Output: <div style="text-align: center;"></div>

As you can see from the output it should remove all of the parts containing <iframe></iframe>.

  • In general you get better answers if you provide sample input and the expected output, as it reduces ambiguity. Commented Feb 14, 2021 at 14:10
  • There is no pattern <iframe> in the input. Only <iframe . Commented Feb 14, 2021 at 15:32
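
To illustrate the point in the comment above: the pattern '<iframe>.*</iframe>' only matches an opening tag with no attributes, while the sample input has <iframe followed by several attributes. The function in the question also passes desc to re.sub rather than its own parameter itemDescription. A minimal sketch of a regex that does match this particular sample is below; the name remove_youtube is just a renamed version of the question's function, and the pattern assumes well-formed, non-nested iframe tags.

```python
import re


def remove_youtube(item_description):
    # <iframe\b[^>]*> matches the opening tag together with any attributes.
    # .*? is non-greedy, so each match stops at the first closing </iframe>.
    pattern = r"<iframe\b[^>]*>.*?</iframe>"
    return re.sub(pattern, "", item_description, flags=re.DOTALL | re.IGNORECASE)


sample = (
    '<div style="text-align: center;">'
    '<iframe allowfullscreen="frameborder=0" height="350" '
    'src="https://www.youtube.com/embed/EKaUJExxmEA" width="650"></iframe>'
    "</div>"
)

print(remove_youtube(sample))  # <div style="text-align: center;"></div>
```

This works for the sample, but it can still break on messier markup, which is why a real HTML parser is usually the safer choice.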

1 Answer

Use BeautifulSoup rather than regex, as regex is a poor choice for parsing HTML. Here's why.

Here's how:

from bs4 import BeautifulSoup

sample = """
<div style="text-align: center;"><iframe allowfullscreen="frameborder=0" height="350" src="https://www.youtube.com/embed/EKaUJExxmEA" width="650"></iframe></div>
"""

# Parse the fragment with Python's built-in HTML parser
s = BeautifulSoup(sample, "html.parser")

# Find every <iframe> tag and detach it from the tree
for tag in s.find_all("iframe"):
    tag.extract()
print(s)

Output:

<div style="text-align: center;"></div>
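
If you want this as a drop-in replacement for the function in the question, it could be wrapped roughly as follows. This is only a sketch: remove_iframes is a made-up name, and it uses decompose() instead of extract(), since the removed tag is not needed afterwards.

```python
from bs4 import BeautifulSoup


def remove_iframes(html):
    # Parse the fragment with Python's built-in HTML parser.
    soup = BeautifulSoup(html, "html.parser")
    # Remove every <iframe> element, including its contents, from the tree.
    for tag in soup.find_all("iframe"):
        tag.decompose()
    # Serialise the modified tree back to a string.
    return str(soup)


sample = (
    '<div style="text-align: center;">'
    '<iframe allowfullscreen="frameborder=0" height="350" '
    'src="https://www.youtube.com/embed/EKaUJExxmEA" width="650"></iframe>'
    "</div>"
)

print(remove_iframes(sample))  # <div style="text-align: center;"></div>
```

The function returns a string, so it can be used wherever the original removeYoutube was called.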

1 Comment

Thanks for the answer. I don't know why I didn't think of that, and thanks for linking the page explaining why. I'll be using this more than regex in the future. Much appreciated :)
