Is it possible for a RegEx to clean up whitespace in HTML?

For example:

<p><b>foo</b> <i>bar</i></p>
<p>foo</p> <p>bar</p>

On the first line, the space between the closing b tag and the opening i tag is significant (although it could be written as &nbsp;). On the second line, however, the space between the two paragraphs is whitespace I want to remove, since it shouldn't carry any semantic value.

Perhaps this would be better solved with DOM traversal?

2 Answers

A tool like HTML Tidy seems like a better bet for what you're looking for than re-creating all the potentially complex rules yourself (for example, the first space in your example being significant but not the second).

Otherwise, I agree: DOM traversal would be a much better approach than regular expressions, especially if your HTML is already XHTML-compliant and can easily be traversed as XML.
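As a rough illustration of the DOM-traversal approach, here is a minimal JavaScript sketch. It assumes that a whitespace-only text node is insignificant only when both of its neighbors are "block boundaries" (a block-level element, or the edge of a block-level parent), and it approximates "block-level" with a hard-coded, hypothetical list of tag names instead of the element's computed CSS display. The function name stripInterBlockWhitespace is made up for this example. It relies only on standard DOM properties (childNodes, nodeType, nodeName, nodeValue, removeChild) and does not descend into <pre>, <textarea>, <script> or <style>, whose whitespace must be kept.

```javascript
// Hypothetical approximation of "block-level": a fixed list of tag names.
// Real rendering depends on CSS (getComputedStyle(el).display), which this ignores.
const BLOCK_TAGS = new Set([
  "ADDRESS", "ARTICLE", "ASIDE", "BLOCKQUOTE", "BODY", "DD", "DIV", "DL", "DT",
  "FIGURE", "FOOTER", "FORM", "H1", "H2", "H3", "H4", "H5", "H6", "HEADER",
  "HR", "LI", "MAIN", "NAV", "OL", "P", "PRE", "SECTION", "TABLE", "TBODY",
  "TD", "TH", "THEAD", "TR", "UL",
]);

// Elements whose contents are never touched, because their whitespace matters.
const PRESERVE_TAGS = new Set(["PRE", "TEXTAREA", "SCRIPT", "STYLE"]);

const ELEMENT_NODE = 1; // same values as Node.ELEMENT_NODE and Node.TEXT_NODE
const TEXT_NODE = 3;

// HTML whitespace only; JavaScript's \s would also match U+00A0 (a literal nbsp).
const HTML_WS_ONLY = /^[ \t\n\r\f]*$/;

function isBlock(node) {
  return node.nodeType === ELEMENT_NODE && BLOCK_TAGS.has(node.nodeName.toUpperCase());
}

// Removes whitespace-only text nodes whose previous and next siblings are both
// block boundaries: a block element, or the edge of a block-level parent.
function stripInterBlockWhitespace(parent) {
  const kids = Array.from(parent.childNodes); // static snapshot, safe while removing
  const parentIsBlock = isBlock(parent);
  kids.forEach((child, i) => {
    if (child.nodeType === TEXT_NODE && HTML_WS_ONLY.test(child.nodeValue)) {
      const prev = kids[i - 1];
      const next = kids[i + 1];
      const prevIsBoundary = prev ? isBlock(prev) : parentIsBlock;
      const nextIsBoundary = next ? isBlock(next) : parentIsBlock;
      if (prevIsBoundary && nextIsBoundary) parent.removeChild(child);
    } else if (
      child.nodeType === ELEMENT_NODE &&
      !PRESERVE_TAGS.has(child.nodeName.toUpperCase())
    ) {
      stripInterBlockWhitespace(child); // recurse into non-preserved elements
    }
  });
}
```

In a browser you would call it on the editable element before reading innerHTML, e.g. stripInterBlockWhitespace(document.querySelector("[contenteditable]")). With the question's markup, the space between </b> and <i> survives (its neighbors are inline), while the space between </p> and <p> is removed. Calling root.normalize() first merges adjacent text nodes, so a whitespace run split across several nodes is seen as one.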

3 Comments

I had a quick hunt around for a JavaScript implementation of HTML Tidy but had no luck, so DOM traversal it is. I need this to run as fast as possible, so hopefully IE won't cause too many issues.
Why would you bother doing this in JavaScript? Who will it benefit? All of the HTML will already have been transmitted to the client and rendered by the browser, so you won't be saving anything in terms of bandwidth or browser rendering.
A little extra background, then: I'm fiddling around with a contenteditable div and the diff-match-patch library. Depending on each browser's implementation of contenteditable, I get different amounts of whitespace, which results in never-ending diffs as I go back and forth between browsers. The easiest workaround I can think of is to strip the whitespace entirely.

First, I have to quote ;) "asking regexes to parse arbitrary HTML is like asking Paris Hilton to write an operating system". Now, back to business: you could try regexes aimed at specific tags (although I doubt this is a reliable method):

sed -e 's/<p>[[:space:]]*</<p></g'

That removes the whitespace in <p>(whitespace)<(whatever_tag).
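For the case in the question (whitespace between </p> and <p>), a slightly broader regex can be sketched in JavaScript. It assumes reasonably well-formed markup, treats a hypothetical, non-exhaustive list of tag names as block-level, and removes a run of HTML whitespace only when a block tag (opening or closing) ends immediately before it and another block tag starts immediately after it. The names BLOCK_TAGS and stripBlockWhitespace are made up for this example. Like any regex over HTML, it can still misfire inside comments, <script> bodies, or attribute values that contain '>'.

```javascript
// Hypothetical, non-exhaustive list of block-level tag names.
// <pre> is deliberately omitted so that e.g. "<pre>   </pre>" is never emptied.
const BLOCK_TAGS =
  "address|article|aside|blockquote|dd|div|dl|dt|h[1-6]|hr|li|ol|p|section|table|tbody|td|th|thead|tr|ul";

const INTER_BLOCK_WS = new RegExp(
  // group 1: a closing block tag like </p>, or an opening one like <div class="x">
  `(</(?:${BLOCK_TAGS})\\s*>|<(?:${BLOCK_TAGS})(?=[\\s/>])[^>]*>)` +
    // the whitespace to drop (HTML whitespace only, so a literal U+00A0 survives)
    "[ \\t\\n\\r\\f]+" +
    // only if another block tag, opening or closing, comes right after it
    `(?=</?(?:${BLOCK_TAGS})[\\s/>])`,
  "gi"
);

// Keeps the tag captured in group 1 and drops the whitespace after it.
function stripBlockWhitespace(html) {
  return html.replace(INTER_BLOCK_WS, "$1");
}
```

Applied to the two lines from the question, this keeps the space between </b> and <i> (neither is in the list) and removes the newline and the space between the paragraphs, yielding <p><b>foo</b> <i>bar</i></p><p>foo</p><p>bar</p>.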

Otherwise, I also agree that DOM traversal is the way to go.
