
I have somewhat complex, messy HTML code. Is there a good HTML parser that lets me work with the HTML as Java objects?

For example, I want to access this code:

<html>
  <body>
   <div id='foo'>
     <p id='bar'></p>
   </div>
  </body>
</html>

in a DOM-like way:

[File/Code].getElementById('foo').appendText('bla');
[File/Code].getElement(Element.DIV).getElement(Element.P).getValue();
//etc...

Does anybody have an idea?

Or is there a DOM for Java? (What I found so far doesn't help :()

Greetings

  • Take a look at JSoup. Commented Aug 4, 2013 at 19:13
  • Yes, jsoup is what I was looking for :) Commented Aug 6, 2013 at 18:09

1 Answer


Just give jsoup (http://jsoup.org/) a try. It can handle very broken HTML.
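To illustrate what "very broken" means in practice, here is a small sketch, assuming jsoup is on the classpath. The class name BrokenHtmlDemo and the input string are made up for illustration; the calls used (Jsoup.parse, getElementById, children, text, body, html) are part of jsoup's Document/Element API. The input has no html/body tags, an unquoted attribute, and unclosed div and p tags, yet jsoup builds a usable tree instead of throwing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BrokenHtmlDemo
{
    // Malformed on purpose (hypothetical input): no <html>/<body>, an unquoted
    // attribute value, and neither the <div> nor the <p> elements are closed.
    static final String DIRTY = "<div id=foo><p id='bar'>first<p>second";

    public static void main(String[] args)
    {
        // jsoup repairs the markup into a well-formed tree,
        // adding the missing <html>, <head> and <body> elements.
        Document document = Jsoup.parse(DIRTY);

        Element foo = document.getElementById("foo");
        // The second <p> implicitly closes the first, so the div has two children.
        System.out.println(foo.children().size());                  // 2
        System.out.println(document.getElementById("bar").text()); // first

        // Print the normalized markup of the repaired body.
        System.out.println(document.body().html());
    }
}
```

The same repair happens for whole files or pages, so the question's "dirty" HTML can be navigated with the same methods as clean markup.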

Example:

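import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExample
{
    public static void main(String[] args)
    {
        // Parse the HTML string into a DOM-like Document object
        Document document = Jsoup.parse("<html>" +
                "  <body>" +
                "   <div id='foo'>" +
                "     <p id='bar'>TEST</p>" +
                "   </div>" +
                "  </body>" +
                "</html>");

        System.out.println("Add blah to the Element with ID: foo");
        Element foo = document.getElementById("foo");
        foo.appendText("blah"); // appends a text node at the end of the div

        System.out.println(document.html());

        System.out.println("Get the content of a div having a p:");
        // Walk every <div>, then every <p> nested inside it
        for (Element div : document.getElementsByTag("div"))
        {
            for (Element p : div.getElementsByTag("p"))
            {
                System.out.println(p.text());
            }
        }
    }
}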
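The question's getElement(DIV).getElement(P) chain also maps fairly directly onto jsoup's CSS-style selectors via select(...). A minimal sketch, assuming the same jsoup dependency; SelectorDemo and firstParagraphInDiv are hypothetical names. Note that "div > p" matches only p elements that are direct children of a div, whereas the nested loop above also finds deeper descendants.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectorDemo
{
    // Returns the text of the first <p> that is a direct child of a <div>,
    // or null if there is none (hypothetical helper for illustration).
    static String firstParagraphInDiv(Document document)
    {
        Elements ps = document.select("div > p");
        Element first = ps.first(); // null when nothing matched
        return first == null ? null : first.text();
    }

    public static void main(String[] args)
    {
        Document document = Jsoup.parse("<div id='foo'><p id='bar'>TEST</p></div>");

        System.out.println(firstParagraphInDiv(document)); // TEST

        // Selectors can also address elements by id; text(String) replaces the content.
        document.select("#bar").first().text("changed");
        System.out.println(document.getElementById("bar").text()); // changed
    }
}
```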

Maven

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.7.2</version>
</dependency>

2 Comments

@criztovyl_needs_help do you need more information to accept this answer?
I'm accepting this answer without more information; I'm using jsoup now :)
