I am trying to scrape data from a web page in a Java servlet, but the page is served as a gzip-compressed file. When I open a URLConnection to it, I get the compressed file instead of the HTML.

Can anyone help me with this? I will be visiting thousands of pages like this one, parsing the table data with DOM, and loading it into a database so that I can query for certain words and display the results. I am concerned that downloading and unzipping every file would make the process too slow.

Is there a way to do this without downloading the file? Any suggestions would be greatly appreciated. Thanks.

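// Needs java.io.*, java.net.URL, java.net.URLConnection, java.util.zip.GZIPInputStream
String temp = "";
try {
    URL url = new URL("example.html.gz");
    URLConnection conn = url.openConnection();
    conn.setAllowUserInteraction(false);

    // Not sure how to wrap the response for decompression:
    //FileInputStream instream = new FileInputStream(???What do I enter???);
    //GZIPInputStream ginstream = new GZIPInputStream(instream);

    // As written, this reads the raw (still compressed) bytes as text.
    InputStream urlStream = conn.getInputStream();
    BufferedReader buffer = new BufferedReader(new InputStreamReader(urlStream));

    String t = buffer.readLine();
    while (t != null) {
        temp = temp + t;
        t = buffer.readLine();
    }
} catch (IOException e) {
    e.printStackTrace();
}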
1 Answer

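You do not need a FileInputStream. Try wrapping the connection's input stream directly:

GZIPInputStream ginstream = new GZIPInputStream(conn.getInputStream());

Then pass ginstream to the InputStreamReader in place of urlStream. The rest is the same as your code.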
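To make the suggestion concrete, here is a minimal sketch of reading a gzip-compressed page as text while it streams in, without saving it to disk first. The class name GzipPageReader, the method names readGzip and fetch, and the choice of UTF-8 are assumptions for illustration and are not from the question; adjust the charset to whatever the pages actually use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GzipPageReader {

    // Decompresses a gzip stream on the fly and returns its text content.
    // Line terminators are normalized to '\n'.
    public static String readGzip(InputStream compressed, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    // Opens the URL and decompresses the response as it arrives.
    // UTF-8 is an assumption; the real pages may use another encoding.
    public static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        conn.setAllowUserInteraction(false);
        try (InputStream in = conn.getInputStream()) {
            return readGzip(in, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the page address as the first argument.
        if (args.length > 0) {
            System.out.println(fetch(args[0]));
        }
    }
}
```

The compressed bytes still have to cross the network, but decompression happens in memory as the data is read, so no temporary file is needed. A StringBuilder is used instead of repeated String concatenation, which matters when processing thousands of pages.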

1 Comment

The constructor FileInputStream(InputStream) is undefined.
