I store HTML text as strings in Python and need to compare them.

str1 = '<br> Example1'
str2 = '<br/>     Example1'

A plain str1 == str2 returns False, but as HTML they are equal. At the same time,

str1 = '<br> Example1'
str2 = '<p> Example1'

are not equal as HTML. The same goes for str2 = '<b> Example </b>', where str1 != str2.

Is there a way to do this in Python? I know the test case has self.assertInEmail, which does HTML comparison, but I don't want to use test functions in my production code.

  • Split by whitespace and join again on the empty string? ''.join(str1.split()) == ''.join(str2.split()). Though this will quickly go wrong. If you want to fully compare HTML, you're probably better off using a library like BeautifulSoup. Commented Jan 8, 2016 at 7:53
  • I'd like to mention that those two strings are not equal even in HTML. The spaces are kept by the browser and are accessible from JavaScript (example: pastebin.com/iP9rpEv0 ), so an additional space character can actually affect what's going on. So, as @Evert said, you probably want to use BeautifulSoup (or any other HTML parser) and analyze whether the two DOMs are close enough to be treated as equal. Commented Jan 8, 2016 at 8:06
  • I think the question has been solved in "how to using python to diff two html files". Commented Jan 8, 2016 at 8:39
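
The parser-based approach suggested in the comments could be sketched roughly as follows. This is only a minimal illustration using the standard library's html.parser rather than BeautifulSoup, and the names _TokenCollector, html_tokens and html_equal are hypothetical helpers, not part of any library. It assumes that runs of whitespace are insignificant and that '<br/>' should match '<br>'; as noted above, browsers do keep whitespace, so that assumption will not hold for every use case.

```python
from html.parser import HTMLParser


class _TokenCollector(HTMLParser):
    """Hypothetical helper: collects a normalized token stream from HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; sort attributes by name so that
        # attribute order does not affect the comparison.
        ordered = tuple(sorted(attrs, key=lambda kv: kv[0]))
        self.tokens.append(("start", tag, ordered))

    def handle_startendtag(self, tag, attrs):
        # Assumption: a self-closing '<br/>' is the same as '<br>'.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Assumption: collapse whitespace runs, drop whitespace-only text.
        text = " ".join(data.split())
        if text:
            self.tokens.append(("text", text))


def html_tokens(fragment):
    """Return the normalized token list for an HTML fragment."""
    parser = _TokenCollector()
    parser.feed(fragment)
    parser.close()  # flushes any text still buffered by the parser
    return parser.tokens


def html_equal(a, b):
    """Compare two HTML fragments by their normalized token streams."""
    return html_tokens(a) == html_tokens(b)


if __name__ == "__main__":
    print(html_equal('<br> Example1', '<br/>     Example1'))
    print(html_equal('<br> Example1', '<p> Example1'))
    print(html_equal('<br> Example1', '<b> Example </b>'))
```

Comparing token streams instead of raw strings keeps the tag structure in the comparison, which the plain split-and-join trick loses. It does not repair malformed markup or treat whitespace inside elements such as <pre> specially, so a full HTML parser is likely the safer choice for arbitrary documents.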
