13

I want to parse some HTML in order to find the values of some attributes/tags etc.

What HTML parsers do you recommend? Any pros and cons?

3 Answers

12

NekoHTML, TagSoup, and JTidy will allow you to parse HTML and then process with XML tools, like XPath.
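To illustrate the second half of that pipeline, here is a minimal sketch of the XPath step using only the JDK's built-in XML APIs. It assumes the HTML has already been cleaned into well-formed markup by one of the tools above; the class name `XPathOverCleanedHtml`, the method `selectValues`, and the sample markup are made up for illustration. Depending on the tool and its settings, element names may come out upper-cased or namespaced, so the expression may need adjusting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: runs an XPath expression over markup that is
// ALREADY well-formed (e.g. the output of a tidying parser).
class XPathOverCleanedHtml {

    static List<String> selectValues(String wellFormedMarkup, String expression) {
        try {
            // Build a DOM tree from the cleaned markup.
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc =
                builder.parse(new InputSource(new StringReader(wellFormedMarkup)));

            // Evaluate the expression and collect the text of each matched node.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Parsing or XPath errors are surfaced as an unchecked exception.
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String cleaned = "<html><body>"
            + "<a href=\"/one\">1</a>"
            + "<img src=\"a.png\" alt=\"A\"/>"
            + "<a href=\"/two\">2</a>"
            + "</body></html>";
        System.out.println(selectValues(cleaned, "//a/@href")); // prints [/one, /two]
    }
}
```

The same method works for element text as well as attributes, for example `//a` instead of `//a/@href`.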

1 Comment

XPath is the way to go for HTML parsing; it also helps with badly formed HTML, where regex fails.
7

I have tried HTML Parser, which is dead simple.

3 Comments

I have used HTML Parser on a project and it worked exactly as expected.
But there aren't many tutorials available...
I've noticed a lot of JavaScript snippets (and element attributes) creeping into my supposedly "text node" extractions. There have also been some cases where malformed HTML caused the whole parsing operation to fail. So I'm looking to replace the htmlparser library in my own project with something a little better.
1

Do you need to do a full parse of the HTML? If you're just looking for specific values within the contents (a specific tag/param), then a simple regular expression might be enough, and could very well be faster.
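As a rough sketch of what that could look like, the following pulls the `href` values out of anchor tags with `java.util.regex`. The class name `AttributeRegex`, the method `extractHrefs`, and the sample input are invented for illustration. It assumes quoted attribute values and reasonably regular markup; unquoted values, a `>` inside another attribute, or commented-out tags would trip it up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts href values from <a> tags without a full parse.
class AttributeRegex {

    // Group 1 is the opening quote (single or double); group 2 is the value.
    // The backreference \1 requires the closing quote to match the opening one.
    private static final Pattern HREF = Pattern.compile(
        "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extractHrefs(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<p><a class=\"x\" href=\"/one\">1</a> <A HREF='/two'>2</A></p>";
        System.out.println(extractHrefs(html)); // prints [/one, /two]
    }
}
```

Compiling the pattern once and reusing it keeps repeated lookups cheap, which is where the speed advantage over a full parse would come from.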
