I found a website where I can look up vehicle inspections in Denmark. I need to extract some information from the page and loop through a series of license plates. Let's take this car as an example: http://selvbetjening.trafikstyrelsen.dk/Sider/resultater.aspx?Reg=as87640

The table on the left shows some basic information about the vehicle. On the right is a list of the inspections for this specific car. I need a script that checks whether the car has any inspections and then grabs the link to each inspection report. The link is stored in the onclick attribute of each inspection, so I would like to extract that onclick text. Let's take the first inspection from the example.

The first inspection link would be: location.href="/Sider/synsrapport.aspx?Inspection=18014439&Vin=VF7X1REVF72378327"

Or, even better, the inspection ID and Vin parameters could be extracted from the URL directly:

Inspection ID: 18014439

Vin: VF7X1REVF72378327
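
As an illustration of pulling those two values out of a single onclick value, here is a minimal Java sketch using a regular expression. The class and method names (`OnclickParser`, `InspectionRef`, `parse`) are hypothetical, and the pattern assumes the inspection ID is numeric and the Vin is alphanumeric, as in the example above; it also tolerates `&amp;` in case the value is read from raw HTML.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OnclickParser {
    // Hypothetical holder for the two values of interest.
    static final class InspectionRef {
        final String inspectionId;
        final String vin;

        InspectionRef(String inspectionId, String vin) {
            this.inspectionId = inspectionId;
            this.vin = vin;
        }
    }

    // Assumption: IDs are digits and Vins are alphanumeric, as in the example link.
    // "(?:amp;)?" also accepts the HTML-escaped form "&amp;Vin=".
    private static final Pattern LINK =
            Pattern.compile("Inspection=(\\d+)&(?:amp;)?Vin=([A-Za-z0-9]+)");

    // Returns the ID/Vin pair, or an empty Optional if the text holds no inspection link.
    static Optional<InspectionRef> parse(String onclick) {
        if (onclick == null) {
            return Optional.empty();
        }
        Matcher m = LINK.matcher(onclick);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new InspectionRef(m.group(1), m.group(2)));
    }
}
```

Calling `OnclickParser.parse(...)` on the example onclick text should yield an `InspectionRef` whose fields hold the inspection ID and Vin listed above, while text without a link gives an empty `Optional`.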

Here is an example of a car that doesn't have any inspections yet, if you want to see what that looks like: http://selvbetjening.trafikstyrelsen.dk/Sider/resultater.aspx?Reg=as87400

Current solution plan:

  1. Download the HTML source code as a String in VB.net

  2. Search the string and extract the specific parts.

  3. Store it in a StringBuilder and upload this to my SQL server
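
The three steps above could be sketched roughly as follows. This is written in Java rather than VB.net, and it is only a sketch under assumptions: the class and method names (`InspectionScraper`, `urlFor`, `download`, `extractRows`, `scrapeAll`) are hypothetical, it assumes Java 11 or later for `java.net.http`, and it assumes the inspection links are present in the raw HTML returned by the server rather than generated by JavaScript. The SQL upload is left out; the collected text is simply returned.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InspectionScraper {
    static final String BASE =
            "http://selvbetjening.trafikstyrelsen.dk/Sider/resultater.aspx?Reg=";

    // Assumption: numeric IDs and alphanumeric Vins; "&amp;" covers raw HTML escaping.
    private static final Pattern LINK =
            Pattern.compile("Inspection=(\\d+)&(?:amp;)?Vin=([A-Za-z0-9]+)");

    // Builds the result page URL for one license plate.
    static String urlFor(String plate) {
        return BASE + URLEncoder.encode(plate, StandardCharsets.UTF_8);
    }

    // Step 1: download the HTML source as a String.
    static String download(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Step 2: search the string and emit one "plate,inspectionId,vin" line per link.
    // A page without inspection links yields an empty string.
    static String extractRows(String plate, String html) {
        StringBuilder rows = new StringBuilder();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            rows.append(plate).append(',')
                .append(m.group(1)).append(',')
                .append(m.group(2)).append('\n');
        }
        return rows.toString();
    }

    // Step 3 (partially): collect the rows for every plate in one StringBuilder.
    static String scrapeAll(List<String> plates) throws IOException, InterruptedException {
        StringBuilder all = new StringBuilder();
        for (String plate : plates) {
            all.append(extractRows(plate, download(urlFor(plate))));
        }
        return all.toString();
    }
}
```

Keeping the download and the extraction in separate methods means `extractRows` can be checked against a saved copy of a page without hitting the site, and an empty result doubles as the "no inspections" check.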

Is this the most efficient way, or do you know of any libraries designed specifically for extracting elements from a website in VB.net? Thanks!

1 Answer

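You could use the Java libraries HtmlUnit or Jsoup to scrape the page. Here's an example using HtmlUnit that reads the onclick attribute of each row in the inspections table and splits out the inspection ID and Vin: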

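    // Imports assume HtmlUnit 2.x (package com.gargoylesoftware.htmlunit).
    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlTable;
    import com.gargoylesoftware.htmlunit.html.HtmlTableRow;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.logging.Level;
    import org.apache.commons.logging.LogFactory;

    // Silence HtmlUnit's verbose logging.
    LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
    java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

    // Tolerate script errors and non-2xx responses instead of throwing.
    WebClient client = new WebClient(BrowserVersion.CHROME);
    client.getOptions().setJavaScriptEnabled(true);
    client.getOptions().setThrowExceptionOnScriptError(false);
    client.getOptions().setThrowExceptionOnFailingStatusCode(false);

    HtmlPage page = client.getPage("http://selvbetjening.trafikstyrelsen.dk/Sider/resultater.aspx?Reg=as87640");
    // getElementById returns null if the table is absent, which may be the case for a car without inspections.
    HtmlTable inspectionsTable = (HtmlTable) page.getElementById("tblInspections");

    // Maps inspection ID to Vin.
    Map<String, String> inspections = new HashMap<>();
    if (inspectionsTable != null) {
        for (HtmlTableRow row : inspectionsTable.getRows()) {
            // Expected form: location.href="/Sider/synsrapport.aspx?Inspection=<id>&Vin=<vin>"
            String[] splitRow = row.getAttribute("onclick").split("=");

            // Rows without a matching onclick (such as a header row) are skipped.
            if (splitRow.length >= 4) {
                String id = splitRow[2].split("&")[0];
                String vin = splitRow[3].replace("\"", "");

                inspections.put(id, vin);
                System.out.println(id + " " + vin);
            }
        }
    }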