I am writing a web crawler tool in Java. When I type a website name, how can I make it connect to that site over HTTP or HTTPS without my having to specify the protocol?

try {
   Jsoup.connect("google.com").get();
} catch (IOException ex) {
   Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex);
}

But I get the error:

java.lang.IllegalArgumentException: Malformed URL: google.com

What can I do? Are there any classes or libraries that do this?
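The usual workaround is to add a scheme yourself when the input does not already have one, so that Jsoup receives an absolute URL. Below is a minimal sketch of that approach. It assumes `http` as the default scheme, and `withDefaultScheme` is a hypothetical helper name, not a JDK or Jsoup API:

```java
import java.net.URI;
import java.net.URISyntaxException;

class UrlNormalizer {

    /**
     * Returns the input unchanged if it already starts with a scheme such as
     * "http://" or "https://"; otherwise prefixes the given default scheme.
     */
    static String withDefaultScheme(String input, String defaultScheme) {
        String trimmed = input.trim();
        // A scheme is a letter followed by letters, digits, '+', '.' or '-', then "://"
        if (trimmed.matches("(?i)^[a-z][a-z0-9+.-]*://.*")) {
            return trimmed;
        }
        return defaultScheme + "://" + trimmed;
    }

    public static void main(String[] args) throws URISyntaxException {
        String url = withDefaultScheme("google.com", "http");
        // new URI(...) throws URISyntaxException if the result is still malformed
        URI uri = new URI(url);
        System.out.println(uri); // prints http://google.com
    }
}
```

Whether `http` or `https` is the better default depends on the sites being crawled. Inputs that already carry a scheme pass through untouched.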

What I'm trying to do: I have a list of 165 courses, each with 65 to 71 HTML pages that contain links throughout. I am writing a Java program to test whether each link is broken.
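For the link-checking use case, a HEAD request is often enough to tell whether a link resolves without downloading the whole page. Below is a minimal sketch that uses only `java.net.HttpURLConnection` (not Jsoup). The class and method names are hypothetical. It also assumes that any status of 400 or above, or any connection failure, counts as "broken":

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class LinkChecker {

    /** Assumption: 4xx and 5xx responses mean the link is broken. */
    static boolean isBrokenStatus(int statusCode) {
        return statusCode >= 400;
    }

    /**
     * Sends a HEAD request and reports whether the link looks broken.
     * Malformed URLs and network failures (unknown host, refused, timeout) also count as broken.
     */
    static boolean isBroken(String url) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");          // headers only, no body download
            conn.setConnectTimeout(5_000);
            conn.setReadTimeout(5_000);
            conn.setInstanceFollowRedirects(true);  // follows redirects, but not across http/https
            return isBrokenStatus(conn.getResponseCode());
        } catch (IOException | IllegalArgumentException e) {
            return true;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

Some servers answer HEAD with 405 Method Not Allowed, so a GET fallback may be worth adding. `HttpURLConnection` does not follow a redirect from http to https. Such a redirect therefore shows up here as a 3xx status, which this sketch treats as not broken.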

  • No, there is no such class in the JDK and I doubt any library allows that. You should be clear about what you want. HTTP and HTTPS are very different. Commented Mar 27, 2014 at 0:09
  • Just prefix http:// or https:// before the URL? Commented Mar 27, 2014 at 0:13
  • I don't know your use case, but try using http; it should be OK. Most sites implement URL redirection. I agree with the above comment, though. Commented Mar 27, 2014 at 0:15

1 Answer

You can write your own simple method to try both protocols, like:

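import java.io.IOException;
import org.jsoup.Jsoup;

static boolean usesHttps(final String urlWithoutProtocol) throws IOException {
    try {
        // Try plain HTTP first; get() throws an IOException on connection or HTTP errors.
        // Jsoup follows redirects by default, so an http-to-https redirect still returns false.
        Jsoup.connect("http://" + urlWithoutProtocol).get();
        return false;
    } catch (final IOException e) {
        // HTTP failed, so fall back to HTTPS; if this also fails, the exception propagates
        Jsoup.connect("https://" + urlWithoutProtocol).get();
        return true;
    }
}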

Then, your original code can be:

try {
    boolean shouldUseHttps = usesHttps("google.com");
} catch (final IOException ex) {
    Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex);
}

Note: call usesHttps() only once per URL, to figure out which protocol to use. After that, connect with Jsoup.connect() directly using that protocol. This is more efficient, because usesHttps() downloads the page once or twice on every call.
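The detect-once idea can be sketched as a per-host cache. This is a minimal sketch. It assumes that the protocol is decided per host, and `SchemeCache` is a hypothetical class. The probe is passed in as a `Predicate<String>` so that it can wrap `usesHttps()` or any other check. Because `usesHttps()` throws a checked `IOException`, the lambda would need its own try/catch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Remembers which scheme each host uses so the probe runs at most once per host. */
class SchemeCache {

    private final Map<String, String> schemeByHost = new ConcurrentHashMap<>();
    // Hypothetical probe: returns true if the host should be reached over https
    private final Predicate<String> usesHttps;

    SchemeCache(Predicate<String> usesHttps) {
        this.usesHttps = usesHttps;
    }

    /** Builds a full URL; path is expected to start with "/" (e.g. "/index.html"). */
    String urlFor(String host, String path) {
        // computeIfAbsent calls the probe only when the host is not cached yet
        String scheme = schemeByHost.computeIfAbsent(
                host, h -> usesHttps.test(h) ? "https" : "http");
        return scheme + "://" + host + path;
    }
}
```

Each link on a course page can then be built with `urlFor(host, path)` and passed to `Jsoup.connect()`. Only the first link seen for a given host pays the cost of the probe.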

1 Comment

I would not suggest such a method for everyday use, since raising an exception is a costly operation. It is better to detect the protocol once and, from then on, connect using the detected protocol.