What HTML parsing libraries do you recommend in Java [closed]

Question

I want to parse some HTML in order to find the values of some attributes/tags etc.

What HTML parsers do you recommend? Any pros and cons?

jelovirt · Accepted Answer · 2008-08-25 19:22:20Z

12

NekoHTML, TagSoup, and JTidy will allow you to parse HTML and then process with XML tools, like XPath.
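Once the HTML is well-formed, the XPath half of that workflow needs nothing beyond the JDK. The sketch below assumes the page has already been cleaned into well-formed XHTML by one of those libraries. That cleaning step is not shown, and no NekoHTML, TagSoup, or JTidy API is used here. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: assumes its input is already well-formed XHTML,
// e.g. the output of an HTML cleaner (not shown here).
class XPathOverCleanedHtml {

    // Returns the href value of every <a> element, in document order.
    static List<String> linkTargets(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace-unaware, so bare element names work in the XPath below.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hrefs = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);

            List<String> targets = new ArrayList<>();
            for (int i = 0; i < hrefs.getLength(); i++) {
                targets.add(hrefs.item(i).getNodeValue());
            }
            return targets;
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not parse or query the XHTML input", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><a href=\"/one\">1</a>"
                + "<p><a href=\"/two\">2</a></p></body></html>";
        System.out.println(linkTargets(xhtml)); // prints [/one, /two]
    }
}
```

Parsing namespace-unaware keeps the expression as plain `//a/@href`. If you parse namespace-aware and the cleaned output declares the XHTML namespace, the XPath needs a `NamespaceContext` and prefixed element names instead.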

answered Aug 25, 2008 at 19:22

jelovirt

5,902 · 8 gold badges · 41 silver badges · 49 bronze badges


1 Comment

Sumit Ghosh Over a year ago

XPath is the way to go for HTML parsing; it also helps with badly formed HTML, where regex fails.

pek · Accepted Answer · 2008-08-25 18:55:11Z

7

I have tried HTML Parser, which is dead simple.

answered Aug 25, 2008 at 18:55

pek

18.1k · 28 gold badges · 89 silver badges · 100 bronze badges

3 Comments

Craig Angus Over a year ago

I have used HTML Parser on a project, and it worked exactly as expected.

Lily Over a year ago

But there aren't many tutorials available...

benjismith Over a year ago

I've noticed a lot of JavaScript snippets (and element attributes) creeping into my supposedly "text node" extractions. There have also been some cases where malformed HTML caused the whole parsing operation to fail. So I'm looking to replace the htmlparser library in my own project with something a little better.
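One way to keep script code out of text extraction is to track whether the parser is currently inside a `<script>` or `<style>` element and drop any text seen there. The sketch below shows that technique with the JDK's own lenient Swing parser (`javax.swing.text.html.parser.ParserDelegator`), swapped in so the example needs no extra jar; it is not the htmlparser library's API. That parser only knows the HTML 3.2 DTD, so treat this as an illustration of the state-tracking idea rather than a recommended replacement. `VisibleTextExtractor` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback that collects text outside <script> and <style>.
class VisibleTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();
    // Nesting depth of <script>/<style>; their contents are never visible text.
    private int hiddenDepth = 0;

    private static boolean isHidden(HTML.Tag tag) {
        return tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE;
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (isHidden(tag)) hiddenDepth++;
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (isHidden(tag) && hiddenDepth > 0) hiddenDepth--;
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Only keep text chunks that are not inside a hidden element.
        if (hiddenDepth == 0) text.append(data).append(' ');
    }

    static String extract(String html) {
        VisibleTextExtractor callback = new VisibleTextExtractor();
        try {
            // ignoreCharSet = true: the input is already a decoded Java String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.text.toString().trim();
    }

    public static void main(String[] args) {
        // Unclosed <p> tags are tolerated by this lenient parser.
        String html = "<html><head><script>alert('hi')</script></head>"
                + "<body><p>Hello <b>world</b><p>Unclosed paragraph</body></html>";
        System.out.println(extract(html));
    }
}
```

Because the filtering is driven by the `<script>`/`<style>` start and end tags, script bodies stay out of the result regardless of which callback the parser uses to deliver them.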

Herms · Accepted Answer · 2008-08-25 18:56:36Z

1

Do you need to do a full parse of the HTML? If you're just looking for specific values within the contents (a specific tag/param), then a simple regular expression might be enough, and could very well be faster.
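For the "just grab one attribute" case, a regular expression can be as small as the sketch below. It is a deliberately naive, hypothetical helper. It handles only quoted values (`name="…"` or `name='…'`), matches case-insensitively, and does not understand comments or scripts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls quoted values of one attribute out of raw HTML text.
class RegexAttributeGrab {

    static List<String> attributeValues(String html, String attribute) {
        // \b<name>\s*=\s* then an opening quote (group 1), the value (group 2),
        // and the same quote character again via the \1 backreference.
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(attribute) + "\\s*=\\s*([\"'])(.*?)\\1",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/a\">x</a><img src='b.png'><A HREF='/c'>y</A>";
        System.out.println(attributeValues(html, "href")); // prints [/a, /c]
    }
}
```

Unquoted values such as `href=/a` are missed, and a link inside an HTML comment is still matched. Those are the cases where a real parser earns its keep.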

answered Aug 25, 2008 at 18:56

Herms

39.2k · 13 gold badges · 80 silver badges · 105 bronze badges
