I know you may think this question is stupid, but I need to use HtmlUnit. However, it returns a page either as XML or as text.

I don't know how to get the pure HTML (the same source code that a browser shows).

I need this because I have to pass the HTML to some modules that are already written. Any ideas?

  • mr. Vai asks if you can "provide fullcode which extracts webpage using HTMLUNIT" Commented Feb 17, 2013 at 18:33
  • I have the same problem. Can you help me? stackoverflow.com/questions/20781322/… Commented Dec 26, 2013 at 10:52

1 Answer

You can use the following piece of code to achieve your goal:

WebClient webClient = new WebClient();
Page page = webClient.getPage("http://example.com");
// The WebResponse holds the body as the server sent it, not HtmlUnit's DOM serialization.
WebResponse response = page.getWebResponse();
String content = response.getContentAsString();

See the Javadoc for the WebResponse#getContentAsString() method.
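
For anyone who wants a complete program, here is a minimal sketch of the same approach wrapped in a class. It assumes HtmlUnit 2.x is on the classpath (package `com.gargoylesoftware.htmlunit`; in HtmlUnit 3.x the package is `org.htmlunit`). The class name `RawHtmlFetcher` and the method name `fetchRawHtml` are made up for illustration and are not part of HtmlUnit.

```java
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebResponse;

import java.io.IOException;

// Hypothetical helper class; only WebClient, Page and WebResponse come from HtmlUnit.
public class RawHtmlFetcher {

    // Returns the response body as a string, as delivered by the server.
    static String fetchRawHtml(String url) throws IOException {
        // Assumption: a HtmlUnit 2.x release in which WebClient is AutoCloseable,
        // so try-with-resources closes it when the method returns.
        try (WebClient webClient = new WebClient()) {
            Page page = webClient.getPage(url);
            WebResponse response = page.getWebResponse();
            return response.getContentAsString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Same URL as in the snippet above.
        System.out.println(fetchRawHtml("http://example.com"));
    }
}
```

Typing the result as `Page` rather than `HtmlPage` keeps the method usable for non-HTML responses too, since only the raw response is read.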

3 Comments

Thanks! :) I found it just before seeing your comment!
But there is a problem: it doesn't show the text inside <noscript> tags!
Add this: webClient.getOptions().setJavaScriptEnabled(true);
