I am using commons-httpclient 3.1 to read the source of an HTML page. It works fine for every page except those served with gzip content encoding, for which I get an incomplete page source.

For this page, Firefox shows the content encoding as gzip.

The details are below.

Response header:

status code: HTTP/1.1 200 OK
Date = Wed, 20 Jul 2011 11:29:38 GMT
Content-Type = text/html; charset=UTF-8
X-Powered-By = JSF/1.2
Set-Cookie = JSESSIONID=Zqq2Tm8V74L1LJdBzB5gQzwcLQFx1khXNvcnZjNFsQtYw41J7JQH!750321853; path=/; HttpOnly
Transfer-Encoding = chunked
Content-Length = -1

My code to read the response:

    HttpClient httpclient = new HttpClient();
    httpclient.getParams().setParameter("http.connection.timeout",
            new Integer(50000000));
    httpclient.getParams().setParameter("http.socket.timeout",
            new Integer(50000000));

    // Create a method instance.
    GetMethod method = new GetMethod(url);

    // Provide a custom retry handler if necessary.
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
            new DefaultHttpMethodRetryHandler(3, false));
    BufferedReader reader = null;

    // Execute the method.
    int statusCode = httpclient.executeMethod(method);

    if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + method.getStatusLine());
        strHtmlContent = null;
    } else {
        // Read the response body line by line.
        InputStream is = method.getResponseBodyAsStream();
        reader = new BufferedReader(new InputStreamReader(is, "ISO8859_8"));
        String line = null;
        StringBuffer sbResponseBody = new StringBuffer();
        while ((line = reader.readLine()) != null) {
            sbResponseBody.append(line).append("\n");
        }
        strHtmlContent = sbResponseBody.toString();
    }

2 Answers

Upgrade to httpclient 4.1. It should support compression seamlessly.

3 Comments

Thanks for your reply. I tried httpclient 4.1 and I get a "Not in GZIP format" exception.
Curious. The header section you posted in the question does not, in fact, specify gzip encoding. Are you sure it really is?
While trying, I got the response below:

    ----------------------------------------
    Response is gzip encoded
    ----------------------------------------
    Date = Fri, 22 Jul 2011 07:58:44 GMT
    Content-Encoding = gzip
    Content-Length = 5856
    Content-Type = text/html; charset=UTF-8
    X-Powered-By = JSF/1.2
    Set-Cookie = JSESSIONID=9D2hTptKQ1PqKsMvHcYLyFTVlQ6fTNWK3VtcQcVmBHqFb9fSbvYL!750321853; path=/; HttpOnly
    content length =-1
    content encoding=null
    Fatal transport error: Not in GZIP format
    java.io.IOException: Not in GZIP format

I just ran into this issue and solved it as follows:

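    // Needs java.io.*, java.net.*, java.nio.charset.StandardCharsets
    // and java.util.zip.GZIPInputStream.
    URL url = new URL("http://www.megadevs.com");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();

    // Decompress the body, then decode the bytes as text. UTF-8 is an
    // assumption here; use the charset from the Content-Type header.
    StringBuilder page = new StringBuilder();
    try (Reader reader = new InputStreamReader(
            new GZIPInputStream(conn.getInputStream()), StandardCharsets.UTF_8)) {
        int value;
        while ((value = reader.read()) != -1) {
            page.append((char) value);
        }
    }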

Hope this helps.
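
One caveat: wrapping the stream in a GZIPInputStream unconditionally throws "Not in GZIP format" whenever the body is not actually compressed. A minimal sketch of a more defensive variant is below. It only unwraps gzip when the Content-Encoding value says so. The class name `GzipBody` and its methods are hypothetical helpers made up for illustration, not part of HttpClient or the JDK, and the charset is passed in by the caller rather than detected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of any library.
final class GzipBody {

    // Reads a response body into a String, unwrapping gzip only when the
    // Content-Encoding header value (may be null) is "gzip".
    static String decodeBody(InputStream raw, String contentEncoding, Charset charset)
            throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)
                : raw;
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Demo helper: gzip-compresses a byte array in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>hello</body></html>";
        byte[] plain = html.getBytes(StandardCharsets.UTF_8);

        // Compressed body plus a gzip header value: gets decompressed.
        System.out.println(decodeBody(
                new ByteArrayInputStream(gzip(plain)), "gzip", StandardCharsets.UTF_8));

        // Plain body with no Content-Encoding: passed through unchanged.
        System.out.println(decodeBody(
                new ByteArrayInputStream(plain), null, StandardCharsets.UTF_8));
    }
}
```

With HttpURLConnection, the header value would come from `conn.getContentEncoding()`, and the request would probably need `conn.setRequestProperty("Accept-Encoding", "gzip")` before connecting for the server to compress the body at all.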
