115

I can't understand why Java's HttpURLConnection does not follow an HTTP redirect from an HTTP to an HTTPS URL. I use the following code to get the page at https://httpstat.us/:

import java.net.URL;
import java.net.HttpURLConnection;
import java.io.InputStream;

public class Tester {

    public static void main(String argv[]) throws Exception{
        InputStream is = null;

        try {
            String httpUrl = "http://httpstat.us/301";
            URL resourceUrl = new URL(httpUrl);
            HttpURLConnection conn = (HttpURLConnection)resourceUrl.openConnection();
            conn.setConnectTimeout(15000);
            conn.setReadTimeout(15000);
            conn.connect();
            is = conn.getInputStream();
            System.out.println("Original URL: "+httpUrl);
            System.out.println("Connected to: "+conn.getURL());
            System.out.println("HTTP response code received: "+conn.getResponseCode());
            System.out.println("HTTP response message received: "+conn.getResponseMessage());
        } finally {
            if (is != null) is.close();
        }
    }
}

The output of this program is:

Original URL: http://httpstat.us/301
Connected to: http://httpstat.us/301
HTTP response code received: 301
HTTP response message received: Moved Permanently

A request to http://httpstat.us/301 returns the following (shortened) response (which seems absolutely right!):

HTTP/1.1 301 Moved Permanently
Cache-Control: private
Content-Length: 21
Content-Type: text/plain; charset=utf-8
Location: https://httpstat.us

Unfortunately, Java's HttpURLConnection does not follow the redirect!

Note that if you change the original URL to HTTPS (https://httpstat.us/301), Java will follow the redirect as expected!?

    Hi, I edited your question for clarity and to point out that the redirect to HTTPS in particular is the problem. Also, I changed the bit.ly domain to a different one, as the use of bit.ly is blacklisted in questions. Hope you don't mind, feel free to re-edit. Commented Sep 26, 2019 at 14:20

5 Answers

136

Redirects are followed only if they use the same protocol. (See the followRedirect() method in the source.) There is no way to disable this check.

Even though we know it mirrors HTTP, from the HTTP protocol point of view, HTTPS is just some other, completely different, unknown protocol. It would be unsafe to follow the redirect without user approval.

For example, suppose the application is set up to perform client authentication automatically. The user expects to be surfing anonymously because he's using HTTP. But if his client follows HTTPS without asking, his identity is revealed to the server.
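
To make the same-protocol rule concrete, here is a minimal sketch of that kind of check. It is not the JDK's source; the class and method names (RedirectProtocolCheck, sameProtocol) are hypothetical and only illustrate comparing the protocol of the request URL with the protocol of the Location target.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of the JDK: illustrates the same-protocol
// test that decides whether a redirect would be followed automatically.
final class RedirectProtocolCheck {

    // Returns true when the Location target uses the same protocol as the
    // request URL. A relative Location is resolved against the request URL.
    static boolean sameProtocol(String requestUrl, String location) {
        try {
            URL from = new URL(requestUrl);
            URL to = new URL(from, location);
            return from.getProtocol().equalsIgnoreCase(to.getProtocol());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The redirect from the question: http -> https, so not followed
        System.out.println(sameProtocol("http://httpstat.us/301", "https://httpstat.us")); // false
        // A relative redirect stays on http, so it would be followed
        System.out.println(sameProtocol("http://httpstat.us/301", "/200"));                // true
    }
}
```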


9 Comments

Thanks. I've just found confirmation: bugs.sun.com/bugdatabase/view_bug.do?bug_id=4620571 . Namely: "After discussion among Java Networking engineers, it is felt that we shouldn't automatically follow redirect from one protocol to another, for instance, from http to https and vise versa, doing so may have serious security consequences. Thus the fix is to return the server responses for redirect. Check response code and Location header field value for redirect information. It's the application's responsibility to follow the redirect."
But does it follow redirect from http to http or https to https? Even that would be wrong. Isn't it?
@JoshuaDavis Yes, it only applies to redirects to the same protocol. An HttpURLConnection won't automatically follow redirects to a different protocol, even if the redirect flag is set.
Java Networking engineers could offer a setFollowTransProtocol(true) option, because if we need it we will program it anyway. FYI, web browsers, curl, wget and many more follow redirects from HTTP to HTTPS and vice versa.
Nobody sets up auto-login on HTTPS and then expects HTTP to be "anonymous". That's nonsensical. It's perfectly safe and normal to follow redirects from HTTP to HTTPS (not the other way around). This is just a typically bad Java API.
68

HttpURLConnection by design won't automatically redirect from HTTP to HTTPS (or vice versa). Following the redirect may have serious security consequences. SSL (hence HTTPS) creates a session that is unique to the user. This session can be reused for multiple requests. Thus, the server can track all of the requests made from a single person. This is a weak form of identity and is exploitable. Also, the SSL handshake can ask for the client's certificate. If sent to the server, then the client's identity is given to the server.

As erickson points out, suppose the application is set up to perform client authentication automatically. The user expects to be surfing anonymously because he's using HTTP. But if his client follows HTTPS without asking, his identity is revealed to the server.

The programmer has to take extra steps to ensure that credentials, client certificates or SSL session id will not be sent before redirecting from HTTP to HTTPS. The default is to send these. If the redirection hurts the user, do not follow the redirection. This is why automatic redirect is not supported.

With that understood, here's the code which will follow the redirects.

  URL resourceUrl, base, next;
  Map<String, Integer> visited;
  HttpURLConnection conn;
  String location;
  int times;

  ...
  visited = new HashMap<>();

  while (true)
  {
     // Count visits per URL so that a redirect cycle cannot loop forever
     times = visited.compute(url, (key, count) -> count == null ? 1 : count + 1);

     if (times > 3)
        throw new IOException("Stuck in redirect loop");

     resourceUrl = new URL(url);
     conn        = (HttpURLConnection) resourceUrl.openConnection();

     conn.setConnectTimeout(15000);
     conn.setReadTimeout(15000);
     conn.setInstanceFollowRedirects(false);   // Make the logic below easier to detect redirections
     conn.setRequestProperty("User-Agent", "Mozilla/5.0...");

     switch (conn.getResponseCode())
     {
        case HttpURLConnection.HTTP_MOVED_PERM:
        case HttpURLConnection.HTTP_MOVED_TEMP:
           location = conn.getHeaderField("Location");
           location = URLDecoder.decode(location, "UTF-8");
           base     = new URL(url);
           next     = new URL(base, location);  // Deal with relative URLs
           url      = next.toExternalForm();
           continue;
     }

     break;
  }

  is = conn.getInputStream();   // Body of the final, non-redirect response
  ...
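
For readers who want something that compiles on its own, the loop above could be wrapped roughly as follows. This is a sketch under assumptions, not the answer's exact code: the names RedirectFollower, isRedirect, resolve and open are hypothetical, the hop limit replaces the visited map, the URLDecoder step is left out, and 303, 307 and 308 are treated as redirects in addition to the 301 and 302 handled above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical wrapper around the manual redirect loop; all names are assumptions.
final class RedirectFollower {

    // True for status codes that normally carry a Location header to follow.
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307 || status == 308;               // no constants in HttpURLConnection
    }

    // Resolves a possibly relative Location value against the current URL.
    static String resolve(String currentUrl, String location) {
        try {
            return new URL(new URL(currentUrl), location).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Opens the URL with GET, following redirects by hand (including HTTP -> HTTPS).
    static InputStream open(String url, int maxRedirects) throws IOException {
        for (int hop = 0; hop <= maxRedirects; hop++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(15000);
            conn.setReadTimeout(15000);
            conn.setInstanceFollowRedirects(false);  // redirects are handled below
            if (!isRedirect(conn.getResponseCode())) {
                return conn.getInputStream();        // final response body
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header");
            }
            url = resolve(url, location);
        }
        throw new IOException("Too many redirects");
    }
}
```

A call such as RedirectFollower.open("http://httpstat.us/301", 5) would then return the stream of the final response. As discussed above, the caller still decides whether credentials or client certificates should be sent once the target switches to HTTPS.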

6 Comments

This is the only solution that works for more than one redirect. Thank you!
This works beautifully for multiple redirects (HTTPS API -> HTTP -> HTTP image)! Perfect simple solution.
@Nathan - thanks for the details, but I still don't buy it. For instance, it's under the control of the client whether any credentials or client certs are sent. If it hurts, don't do it (in this case, do not follow the redirect).
The only part I don't understand is location = URLDecoder.decode(location... This decodes a working encoded relative part (with space=+ in my case) into a non-working one. After I removed it, it was OK for me.
@Niek I am not sure why you do not need it but I do.
8

As mentioned by some of you above, setFollowRedirects and setInstanceFollowRedirects only work automatically when the redirected protocol is the same, i.e. from http to http and from https to https.

setFollowRedirects is a static method at class level and sets the default for all instances of the URL connection, whereas setInstanceFollowRedirects applies only to a given instance. This way we can have different behavior for different instances.
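
A small sketch of the difference between the static setFollowRedirects and the per-instance setInstanceFollowRedirects. The class name FollowRedirectFlags and the helper newConnection are made up for illustration. No request is sent, because openConnection() only creates the connection object.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical demo class; only the HttpURLConnection calls are real API.
final class FollowRedirectFlags {

    // Creates an unconnected HttpURLConnection (no network I/O happens here).
    static HttpURLConnection newConnection(String url) {
        try {
            return (HttpURLConnection) new URL(url).openConnection();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean previous = HttpURLConnection.getFollowRedirects();
        try {
            HttpURLConnection.setFollowRedirects(false);   // class-wide default for new instances
            HttpURLConnection a = newConnection("http://httpstat.us/301");
            HttpURLConnection b = newConnection("http://httpstat.us/301");
            b.setInstanceFollowRedirects(true);            // override for b only

            System.out.println(a.getInstanceFollowRedirects()); // false
            System.out.println(b.getInstanceFollowRedirects()); // true
        } finally {
            HttpURLConnection.setFollowRedirects(previous); // restore the global default
        }
    }
}
```

Neither flag changes the cross-protocol behavior: even with both set to true, a redirect from http to https is still not followed automatically.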

I found a very good example here http://www.mkyong.com/java/java-httpurlconnection-follow-redirect-example/


6

Another option can be to use Apache HttpComponents Client:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
</dependency>

Sample code:

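import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("https://media-hearth.cursecdn.com/avatars/330/498/212.png");
CloseableHttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream is = entity.getContent();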


-5

HttpURLConnection is not responsible for interpreting the response. It is performing as expected: it grabs the content of the requested URL. It is up to you, the user of the functionality, to interpret the response. It cannot read the developer's intentions without being told.

2 Comments

Why does it have setInstanceFollowRedirects in that case? ))
My guess is that it was a suggested feature added later, which makes sense. My comment was more about the fact that the class is designed to go and grab web content and bring it back; people may want to get non-HTTP-200 responses.
