I am working on a project where I have to scrape the site https://lite.ip2location.com. The landing page shows a number of divs, one per country. Clicking one of them takes the browser to a page on the same site with a table for that country. The table has a thead and a tbody. I need the rows in the tbody, but for some reason I only get the content of the thead tag.

This is my code:

    public static void main(String[] args) {
        final String url = "https://lite.ip2location.com/ip-address-ranges-by-country";
        try {
            final Document document = Jsoup.connect(url).get();
            for (Element element : document.select("div.card-columns div")) {
                Elements link = element.select("a");
                String redirectUrl = "https://lite.ip2location.com" + link.attr("href");
                final Document redirectDoc = Jsoup.connect(redirectUrl).get();
                Element table = redirectDoc.select("table").get(0);
                for (Element row : table.select("tbody tr")) {
                    System.out.println(row.text());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }   
  • Why do you want to scrape the pages? You can sign up for the free DB1 LITE database which consists of all the information in one CSV file. Commented May 8, 2022 at 23:53
  • @MichaelC. It's for a school project; we have to web-scrape using jsoup. Commented May 9, 2022 at 6:40

1 Answer

Jsoup only parses the HTML returned from a URL; it does not execute JavaScript or fetch additional resources such as JS or CSS files.

On a page with the IP address ranges for a country, the data is delivered as JSON that the browser loads asynchronously, and the table is then populated from that data.

The call final Document redirectDoc = Jsoup.connect(redirectUrl).get(); therefore returns an HTML page that contains only a template for the table, like this:

<div class="row my-5" style="min-height:500px;">
    <div class="col table-responsive">
        <table id="ip-address" class="table table-striped table-hover">
            <thead>
                <tr>
                    <th width="30%" class="no-sort">Begin IP Address</th>
                    <th width="30%" class="no-sort">End IP Address</th>
                    <th width="40%" class="text-right no-sort">Total Count</th>
                </tr>
            </thead>
            <tbody>
            </tbody>
        </table>
    </div>
</div>

This is exactly the fragment your code parses, which is why the tbody has no rows.

Here is one possible way to get the ranges.

The data with the IP address ranges for Zimbabwe, for example, is located at https://cdn-lite.ip2location.com/datasets/ZW.json . The file name matches the country's ISO 3166-1 alpha-2 code (ZW for Zimbabwe).

The codes for the available countries can be extracted from the https://lite.ip2location.com/ip-address-ranges-by-country page. There, each country's <p class="card-text"> tag contains a span tag that draws the flag.

The span's second class ends with the country code (flag-icon-ba):

<div class="card" style="min-height:72px;">
    <div class="card-body" style="padding:.85rem;">
        <p class="card-text"><span class="flag-icon flag-icon-ba"></span> <a href="/bosnia-and-herzegovina-ip-address-ranges">Bosnia and Herzegovina</a></p>
    </div>
</div>

Here the code is ba, that is, BA for Bosnia and Herzegovina.
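
The mapping from a flag class to a dataset URL can be sketched in plain Java as follows. This is only a minimal sketch: the class name DatasetUrls and its two methods are hypothetical helpers, and it assumes that every country follows the same flag-icon-xx and datasets/XX.json pattern seen for BA and ZW above.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of Jsoup or of the site's API.
final class DatasetUrls {
    private static final String FLAG_PREFIX = "flag-icon-";
    // Base URL taken from the examples above; assumed to be the same for every country.
    private static final String DATASET_BASE = "https://cdn-lite.ip2location.com/datasets/";

    /** Extracts the upper-case country code from a class attribute such as "flag-icon flag-icon-ba". */
    static Optional<String> countryCode(String classAttribute) {
        for (String cssClass : classAttribute.trim().split("\\s+")) {
            // "flag-icon" alone does not match; only "flag-icon-<code>" does.
            if (cssClass.startsWith(FLAG_PREFIX) && cssClass.length() > FLAG_PREFIX.length()) {
                String code = cssClass.substring(FLAG_PREFIX.length());
                return Optional.of(code.toUpperCase(Locale.ROOT));
            }
        }
        return Optional.empty();
    }

    /** Builds the JSON dataset URL for a country code, following the observed pattern. */
    static String datasetUrl(String countryCode) {
        return DATASET_BASE + countryCode + ".json";
    }

    public static void main(String[] args) {
        countryCode("flag-icon flag-icon-ba")
                .map(DatasetUrls::datasetUrl)
                .ifPresent(System.out::println);
    }
}
```

With Jsoup, the class attribute could come from something like span.className() on each element matched by document.select("p.card-text span.flag-icon"); that selector is an assumption based on the markup shown above.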

Once you have the URL of the JSON data for a particular country, you can fetch it with Jsoup:

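String data = Jsoup
        .connect("https://cdn-lite.ip2location.com/datasets/BA.json")
        .ignoreContentType(true) // the response is JSON, not HTML
        .execute()
        .body(); // raw response body, without running it through the HTML parser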