sock = new Socket("www.google.com", 80);
out  = new BufferedOutputStream(sock.getOutputStream());
in   = new BufferedInputStream(sock.getInputStream());

When I try to print the content of "in" like below:

 BufferedInputStream bin = new BufferedInputStream(in);
 int b;
 while ((b = bin.read()) != -1)
 {
     System.err.print((char) b); // This prints out content that is unreadable.
                                 // Isn't it supposed to print out HTML tags?
 }
  • Please show a short but complete example. You haven't shown how you're sending Google the request. If you specify that you can handle gzipped data, for example, you'd need to decompress the output first. Commented Jul 5, 2009 at 8:40
  • (Also note that your current code is effectively assuming ISO-Latin-1.) Commented Jul 5, 2009 at 8:45
  • Hi, after I open the new Socket I send "get index.html" to "out", then try to read "in" with the code above. I didn't specify that I can handle gzip; how do I find out whether the response is gzipped? Commented Jul 5, 2009 at 9:31
  • If the content is gzipped, it will be stated in the response headers (which it won't be here). HTTP 0.9 syntax doesn't tend to work any more. You need something like "GET /index.html HTTP/1.0\r\n\r\n" or better "GET /index.html HTTP/1.1\r\nHost: www.google.com\r\n\r\n" (IIRC). Commented Jul 5, 2009 at 11:33
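
To make the request format from the last comment concrete, here is a minimal sketch of a raw-socket GET. The class and method names (`RawHttpGet`, `buildGetRequest`, `fetch`) are illustrative and not from the original post, and the `Connection: close` header is an added assumption so that reading until end-of-stream terminates. ISO-8859-1 is used only to mirror the byte-to-char cast in the question; a real client would take the charset from the Content-Type header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpGet {

    // Builds a minimal HTTP/1.1 GET request. "Connection: close" (an
    // assumption added here) asks the server to close the socket after
    // the response, so reading until end-of-stream terminates.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Sends the request over a plain socket and returns the raw response
    // (status line, headers, and body) as text.
    static String fetch(String host, String path) throws IOException {
        try (Socket sock = new Socket(host, 80)) {
            OutputStream out = sock.getOutputStream();
            out.write(buildGetRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // ISO-8859-1 maps each byte to exactly one char, like the
            // (char) cast in the question; it is not the page's real charset.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }
}
```

Calling `RawHttpGet.fetch("www.google.com", "/index.html")` should return the status line and headers followed by the body. With HTTP/1.1 the server may use chunked transfer encoding, in which case chunk-size lines appear between pieces of the body, which is one more reason to prefer an HTTP library.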

3 Answers

20

If you want to print the content of a web page, you need to speak the HTTP protocol. You do not have to implement it yourself; the best way is to use an existing implementation such as the Java API's HttpURLConnection or Apache's HttpClient.

Here is an example of how to do it with HttpURLConnection:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// URL has no (protocol, host) constructor, so pass the full address
URL url = new URL("http://www.google.com/");
HttpURLConnection urlc = (HttpURLConnection) url.openConnection();
urlc.setAllowUserInteraction(false);
urlc.setDoInput(true);
urlc.setDoOutput(false);
urlc.setUseCaches(true);
urlc.setRequestMethod("GET");
urlc.connect();
// check that you have received a status code 200 to indicate OK
// get the encoding from the Content-Type header
BufferedReader in = new BufferedReader(new InputStreamReader(urlc.getInputStream()));
String line = null;
while ((line = in.readLine()) != null) {
  System.out.println(line);
}

// close sockets, handle errors, etc.

As noted in the comments above, you can save traffic by adding the Accept-Encoding header to the request and checking the Content-Encoding header of the response.
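
As a rough sketch of that header handling with HttpURLConnection: the request asks for gzip, and the response stream is wrapped in a `GZIPInputStream` only when Content-Encoding says so. The names `GzipAwareFetch`, `decode`, and `fetch` are made up for illustration, and UTF-8 is an assumption here; the actual charset should come from the Content-Type header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;

class GzipAwareFetch {

    // Wraps the body in a GZIPInputStream only when the server reports
    // "Content-Encoding: gzip"; otherwise returns the stream unchanged.
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body;
    }

    // Fetches a page, advertising gzip support and decompressing if needed.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream in = decode(conn.getInputStream(), conn.getContentEncoding());
             BufferedReader reader = new BufferedReader(
                     // UTF-8 is assumed; read the real charset from Content-Type
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } finally {
            conn.disconnect();
        }
    }
}
```

Keeping `decode` separate from the network call makes the decompression step easy to try against an in-memory stream.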

Here is an HttpClient example, taken from here:

    // Create an instance of HttpClient.
    HttpClient client = new HttpClient();

    // Create a method instance (url is a String holding the page address).
    GetMethod method = new GetMethod(url);

    // Provide a custom retry handler if necessary
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
            new DefaultHttpMethodRetryHandler(3, false));

    try {
      // Execute the method.
      int statusCode = client.executeMethod(method);

      if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + method.getStatusLine());
      }

      // Read the response body.
      byte[] responseBody = method.getResponseBody();

      // Deal with the response.
      // Use caution: ensure correct character encoding and is not binary data
      System.out.println(new String(responseBody));

    } catch (HttpException e) {
      System.err.println("Fatal protocol violation: " + e.getMessage());
      e.printStackTrace();
    } catch (IOException e) {
      System.err.println("Fatal transport error: " + e.getMessage());
      e.printStackTrace();
    } finally {
      // Release the connection.
      method.releaseConnection();
    }  

2 Comments

+1 for the HttpClient in particular. As soon as you want to do anything beyond a simple GET, it's invaluable
HttpURLConnection doesn't handle gzipped content. I learned that the hard way.
16

It is very easy to create a String from an InputStream using the Java 8 Stream API:

new BufferedReader(new InputStreamReader(in)).lines().collect(Collectors.joining("\n"))

In IntelliJ I can even set this as a debug expression.

I guess it works similarly in Eclipse.
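
For reference, the one-liner in this answer can be wrapped in a small helper. This is only a sketch: the class name `StreamToString` and method `readAll` are hypothetical, and the explicit charset parameter is an addition, since the one-liner as written falls back to the platform default charset.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.stream.Collectors;

class StreamToString {

    // Reads the whole stream as text, one line at a time, and joins the
    // lines with "\n". Line terminators are normalized and a trailing
    // newline is dropped, so this is not a byte-exact copy of the input.
    static String readAll(InputStream in, Charset charset) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        return reader.lines().collect(Collectors.joining("\n"));
    }
}
```

For example, `StreamToString.readAll(in, StandardCharsets.UTF_8)` turns a response body into a single String, provided UTF-8 really is the encoding the server used.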


1

If you want to fetch the content of a web page, you should take a look at Apache HttpClient instead of coding this yourself, except for learning purposes or some other really good reason.
