I am trying to write a program in Java that can download a file from a URL. I want to do this without using a URLConnection; instead I am using plain TCP sockets. I have succeeded in sending the GET request and reading the server's response, but I can't work out how to save just the file from the response, without the HTTP header.

import java.net.*;
import java.io.*;

public class DownloadClient {
    public static void main(String[] args) {
        try {
            if (args.length != 3) {
                System.out.println(
                    "Use: java DownloadClient <host> <port> <filename/path>"
                );
            } else {
                // Sorting out arguments from the args array
                String host;
                int port; 
                String filename;
                if (args[0].charAt(args[0].length()-1) == '/') {
                    host = args[0].substring(0,args[0].length()-1);
                } else {
                    host = args[0];
                }
                port = Integer.parseInt(args[1]);
                if (args[2].charAt(0) == '/') {
                    filename = args[2];
                } else {
                    filename = "/"+args[2];
                }

                // Use the cleaned-up host and the parsed port from above
                Socket con = new Socket(host, port);

                // GET request
                BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(con.getOutputStream(), "UTF8")
                );
                out.write("GET "+filename+" HTTP/1.1\r\n");
                out.write("Host: "+host+"\r\n");
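                out.write("User-Agent: Java DownloadClient\r\n");
                // Ask the server to close the connection after the response,
                // so the read loop below reaches end-of-stream
                out.write("Connection: close\r\n\r\n");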
                out.flush();

                InputStream in = con.getInputStream();
                OutputStream outputFile = new FileOutputStream(
                    filename.substring(filename.lastIndexOf('/')+1)
                );
                byte[] buffer = new byte[1024];
                int bytesRead = 0;

                // read() returns -1 at end of stream; the buffer can be reused
                while ((bytesRead = in.read(buffer)) != -1) {
                    outputFile.write(buffer, 0, bytesRead);
                }

                outputFile.close();
                in.close();
                con.close();
            }
        } catch (IOException e) {
            System.err.println(e); 
        }
    }
}

I guess that I should somehow look for \r\n\r\n, since it marks the empty line just before the content begins. So far this program creates a file that contains the entire HTTP response.

1 Answer

The recommended way to do this is NOT to talk to a web server using a plain Socket. Use one of the existing client-side HTTP stacks, e.g. the standard HttpURLConnection stack or the Apache HttpClient stack.
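
For comparison, here is a minimal sketch of the library route using the standard HttpURLConnection. The class name UrlDownload, the download helper and the command-line arguments are made up for illustration, and it assumes a plain 200 response is the only outcome worth saving.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical example class, not part of the original question.
final class UrlDownload {

    // Copies the body returned for the URL into target; returns the byte count.
    static long download(URL url, Path target) throws IOException {
        URLConnection conn = url.openConnection();
        if (conn instanceof HttpURLConnection) {
            HttpURLConnection http = (HttpURLConnection) conn;
            int status = http.getResponseCode();
            // Assumption: anything other than 200 is treated as a failure.
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
        }
        // The library has already consumed the headers,
        // so this stream yields only the body.
        try (InputStream body = conn.getInputStream()) {
            return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java UrlDownload <url> <output-file>
        URL url = URI.create(args[0]).toURL();
        long bytes = download(url, Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes");
    }
}
```

The point of the sketch is that the header/body split never shows up in your own code; the HTTP stack handles it.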

If you insist on talking using a plain socket, then it is up to you to process / deal with the "Header" lines in any response ... and everything else ... in accordance with the HTTP specification.

"I guess that I should somehow look for \r\n\r\n as it indicates the empty line just before the content begins."

Yup ...

And you also potentially need to deal with the server sending a compressed response, a response using an unexpected character set, a 3xx redirect, and so on.
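
To give an idea of what dealing with those cases involves, here is a rough sketch that parses the header block (everything before the blank line) so the client can at least detect a redirect or a compressed body before writing anything to disk. The ResponseHead class and its method names are hypothetical, and it assumes the header bytes have already been decoded to a String (for example as ISO-8859-1) and that the status line is well formed.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; it only detects the cases, it does not handle them.
final class ResponseHead {
    final int status;
    // Header names are stored in lower case because HTTP header names are case-insensitive.
    final Map<String, String> headers = new LinkedHashMap<>();

    ResponseHead(String headerBlock) {
        String[] lines = headerBlock.split("\r\n");
        // The status line looks like "HTTP/1.1 200 OK".
        String[] parts = lines[0].split(" ", 3);
        status = Integer.parseInt(parts[1]);
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                continue; // skip anything that is not "Name: value"
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(name, lines[i].substring(colon + 1).trim());
        }
    }

    // A 3xx status means the file lives elsewhere (see the Location header).
    boolean isRedirect() {
        return status >= 300 && status < 400;
    }

    // A Content-Encoding other than "identity" means the body is not the raw file.
    boolean isCompressed() {
        String enc = headers.get("content-encoding");
        return enc != null && !enc.equalsIgnoreCase("identity");
    }

    public static void main(String[] args) {
        ResponseHead head = new ResponseHead(
            "HTTP/1.1 302 Found\r\nLocation: /elsewhere\r\n"
            + "Content-Type: text/html; charset=ISO-8859-1\r\n");
        System.out.println(head.status + " redirect=" + head.isRedirect()
            + " type=" + head.headers.get("content-type"));
    }
}
```

A real client would then have to act on these (follow the Location header, decompress the body, and so on), which is exactly the work the existing HTTP stacks already do.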

2 Comments

Well, I am taking a course in web technologies, so this is really just an experiment. But how exactly can I look for \r\n\r\n? I am fairly new to programming; I only began this summer.
You need to examine the bytes (either as you read them or before you write them to the file) and look for the sequence of bytes that means "\r\n\r\n". It is just programming ...
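
One possible way to examine the bytes as they are read is a small counter that tracks how much of \r\n\r\n has been seen so far. This is only a sketch: the HeaderSkipper class and its method names are invented here, and it assumes the body simply runs until the server closes the connection (no chunked encoding, no compression, Content-Length ignored).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration.
final class HeaderSkipper {

    // Reads and discards bytes up to and including the first \r\n\r\n.
    // Returns false if the stream ends before the sequence is found.
    static boolean skipHeaders(InputStream in) throws IOException {
        int matched = 0; // how many bytes of "\r\n\r\n" have been seen in a row
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                matched = (matched == 2) ? 3 : 1;
            } else if (b == '\n') {
                matched = (matched == 1 || matched == 3) ? matched + 1 : 0;
            } else {
                matched = 0;
            }
            if (matched == 4) {
                return true;
            }
        }
        return false;
    }

    // Skips the header, then copies everything that follows to out.
    static void saveBody(InputStream in, OutputStream out) throws IOException {
        if (!skipHeaders(in)) {
            throw new IOException("Stream ended before the blank line after the header");
        }
        byte[] buffer = new byte[1024];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        // A canned response stands in for the socket's input stream.
        byte[] response = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
            .getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        saveBody(new ByteArrayInputStream(response), body);
        System.out.println(body.toString("ISO-8859-1")); // prints "hello"
    }
}
```

In the program from the question, saveBody would take the socket's input stream (ideally wrapped once in a BufferedInputStream, since skipHeaders reads one byte at a time) and the FileOutputStream.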
