
I need to copy all the HTML code of a page.

This is what I do:

URL url = new URL(testurl);
URLConnection connection = url.openConnection();
connection.connect();

String htmlText = "";
Scanner in = new Scanner(connection.getInputStream());
while (in.hasNextLine()) {
    htmlText = htmlText + in.nextLine();
}
in.close();

But if the page is large, it takes a lot of time.

Is there a faster method?

  • Did you try the Jsoup library? Commented Apr 29, 2014 at 14:33
  • How do I keep the HTML code? Jsoup parses just the text. Commented Apr 29, 2014 at 14:47

2 Answers


Have you tried a different method of reading the page, like a BufferedReader? See Reading the content of web page or Reading entire html file to String.

I'm just thinking Scanner may be a little slow.
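For illustration, here is a minimal sketch of that approach using a BufferedReader and a StringBuilder (the repeated String concatenation in your loop is usually the real bottleneck); the class and method names are just placeholders:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class PageDownloader {
    public static String download(String testurl) throws IOException {
        URL url = new URL(testurl);
        URLConnection connection = url.openConnection();
        connection.connect();

        // Accumulate into a StringBuilder instead of concatenating Strings
        StringBuilder htmlText = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                htmlText.append(line).append('\n');
            }
        }
        return htmlText.toString();
    }
}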

Tim


1 Comment

Thanks, the first link is helpful!

Try using JSoup (http://jsoup.org) to download and parse the HTML from the URL.

You can get the HTML as a Document and read either the full markup or the text of each element.

new AsyncTask<Void, Integer, String>() {
    @Override
    protected String doInBackground(Void... params) {
        try {
            // Download and parse the page off the UI thread
            // (needs org.jsoup.Jsoup, org.jsoup.nodes.Document, java.io.IOException)
            Document doc = Jsoup.connect("http://yoururl.com").get();
            // outerHtml() returns the full markup of the downloaded page
            return doc.outerHtml();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    @Override
    protected void onPostExecute(String html) {
        // Runs on the UI thread once the download has finished;
        // use the HTML here (e.g. display it or extract elements)
    }
}.execute();
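doc.outerHtml() keeps the complete markup of the page, which also addresses the comment above: use doc.body().text() (or getElementsByTag(...).text()) only if you want the visible text rather than the HTML.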
