I'm writing some code to find the absolute URLs on a single webpage:

http://explore.bfi.org.uk/4ce2b69ea7ef3

So far I get all the links on that page and print their absolute URLs.

Here is part of the code:

    Elements hyperLinks = htmlDoc.select("a[href]");

    for (Element link : hyperLinks)
    {
        System.out.println(link.attr("abs:href"));
    }

This prints a lot of URLs just like the one above. However, it also seems to skip a few URLs, and the ones it skips are the ones I actually need.

This is one of the a[href] elements it's not turning into an absolute URL:

<div class="title"><a href="/4ce2b69ea7ef3">Royal Review</a><br /></div>

It prints this line if I just print "link", but when I use "abs:href" it prints a blank line.

I am new to Java and appreciate any feedback!

1 Answer

Try selecting with "a" instead of "a[href]", following this example:

    Document doc = Jsoup.connect("http://jsoup.org").get();

    Element link = doc.select("a").first();
    String relHref = link.attr("href"); // == "/"
    String absHref = link.attr("abs:href"); // "http://jsoup.org/"

So in your case:

    Elements hyperLinks = htmlDoc.select("a");

    for (Element link : hyperLinks)
    {
        System.out.println(link.attr("abs:href"));
    }
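
If "abs:href" still comes back blank, one possible cause is that jsoup has no base URI to resolve the link against, which can happen when the HTML is parsed from a string rather than fetched with Jsoup.connect(...).get(). In that case, passing the page URL as the second argument to Jsoup.parse(html, baseUri) may help. The sketch below is not jsoup code; it only illustrates what resolving a relative href against a page URL means, using the standard java.net.URI class. The class name AbsoluteUrlSketch and the helper toAbsolute are hypothetical names made up for this illustration.

```java
import java.net.URI;

class AbsoluteUrlSketch {

    // Hypothetical helper: resolves an href against the URL of the page it was found on.
    // An href that is already absolute is returned unchanged.
    // URI.create throws IllegalArgumentException for malformed input (e.g. unescaped spaces).
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "http://explore.bfi.org.uk/4ce2b69ea7ef3";

        // Relative link taken from the question
        System.out.println(toAbsolute(pageUrl, "/4ce2b69ea7ef3"));

        // Already-absolute link
        System.out.println(toAbsolute(pageUrl, "http://jsoup.org/"));
    }
}
```

With the page URL from the question as the base, the href "/4ce2b69ea7ef3" resolves to "http://explore.bfi.org.uk/4ce2b69ea7ef3".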

2 Comments

Wow, thanks, this is very useful to me! However, I'm getting "/4ce2b699a9880". Could I turn this into an absolute URL?
I've managed to do it now using the source you gave me! Thank you!
