
I am new to Jsoup and I am trying to get the URL from an onclick attribute. The attribute calls a function named ga with five parameters, so it looks like this: ga('send', 'event', 'tunein', 'playjp', 'http://link that i want to get'); and I want to grab the http URL.

I tried attr("onclick"), but it doesn't work at all. Is there a way to get this value somehow?

Answer

Are you sure you are on the right node?

node.attr("onclick") should work.

Can you post the link to the page you are trying to scrape, and show how you reach the node?

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public void jsoupParse() throws IOException {
    Document doc = Jsoup.connect("https://www.internet-radio.com/station/dougeasyhits/").get();
    // The play button is the first <i> element inside div.jp-controls
    Element image = doc.select("div.jp-controls").select("i").get(0);
    String onclick = image.attr("onclick");
    System.out.println(onclick);
}

Output:

ga('send', 'event', 'tunein', 'playjp', 'http://airspectrum.cdnstream1.com:8114/1648_128.m3u');

Now all you need to do is manipulate the string with the split method to extract the URL:

Document doc = Jsoup.connect("https://www.internet-radio.com/station/dougeasyhits/").get();
Element image = doc.select("div.jp-controls").select("i").get(0); // the first <i> is the play button
String onclick = image.attr("onclick");
String[] parts = onclick.split("'"); // split the string into an array, using the single quote as separator
String url = parts[9]; // the URL is the 10th element (index 9) of the array
System.out.println(onclick);
System.out.println(url);

Output:

ga('send', 'event', 'tunein', 'playjp', 'http://airspectrum.cdnstream1.com:8114/1648_128.m3u');
http://airspectrum.cdnstream1.com:8114/1648_128.m3u
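Indexing into parts[9] works for this exact attribute, but it silently breaks if the page ever adds or removes an argument before the URL. A slightly more defensive sketch, assuming the URL is always a single-quoted string starting with http:// or https://, searches for it with a regular expression instead of counting separators. OnclickUrlExtractor and extractUrl are hypothetical names used only for illustration:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OnclickUrlExtractor {
    // Assumption: the URL is a single-quoted argument beginning with http:// or https://.
    // Group 1 captures everything between the quotes.
    private static final Pattern QUOTED_URL = Pattern.compile("'(https?://[^']+)'");

    private OnclickUrlExtractor() {}

    /** Returns the first single-quoted http(s) URL in the onclick value, or empty if none is found. */
    static Optional<String> extractUrl(String onclick) {
        Matcher m = QUOTED_URL.matcher(onclick);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String onclick = "ga('send', 'event', 'tunein', 'playjp', "
                + "'http://airspectrum.cdnstream1.com:8114/1648_128.m3u');";
        System.out.println(extractUrl(onclick).orElse("no URL found"));
    }
}
```

Returning an Optional makes the "attribute has no URL" case explicit instead of throwing an ArrayIndexOutOfBoundsException.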

In case it is confusing, this is how the onclick attribute was split:

parts[0] : "ga("
parts[1] : "send"
parts[2] : ", "
parts[3] : "event"
parts[4] : ", "
parts[5] : "tunein"
parts[6] : ", "
parts[7] : "playjp"
parts[8] : ", "
parts[9] : "http://airspectrum.cdnstream1.com:8114/1648_128.m3u"
parts[10] : ");"
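The listing above can be reproduced with a few lines of plain Java. The sketch below also avoids the hard-coded index, assuming (as in this page's markup) that the URL is the last quoted argument and the value ends with "');". Under that assumption the final element is the ");" tail, so the URL is always the second-to-last element. OnclickSplitDemo and its methods are hypothetical names:

```java
final class OnclickSplitDemo {
    private OnclickSplitDemo() {}

    /** Splits the onclick value on single quotes, exactly as in the answer. */
    static String[] splitOnQuotes(String onclick) {
        return onclick.split("'");
    }

    /**
     * Returns the last quoted argument. Assumes the value ends with "');", so the
     * last array element is the ");" tail and the one before it is the final argument.
     */
    static String lastQuotedArgument(String onclick) {
        String[] parts = splitOnQuotes(onclick);
        if (parts.length < 3) {
            throw new IllegalArgumentException("No quoted argument found: " + onclick);
        }
        return parts[parts.length - 2];
    }

    public static void main(String[] args) {
        String onclick = "ga('send', 'event', 'tunein', 'playjp', "
                + "'http://airspectrum.cdnstream1.com:8114/1648_128.m3u');";
        String[] parts = splitOnQuotes(onclick);
        // Print each element with its index, in the same format as the listing above
        for (int i = 0; i < parts.length; i++) {
            System.out.println("parts[" + i + "] : \"" + parts[i] + "\"");
        }
        System.out.println(lastQuotedArgument(onclick));
    }
}
```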

Comments

Here it is: https://www.internet-radio.com/station/dougeasyhits/ (inspect the play button).
Instead of .select("div.jp-controls").select("i") use .select("div.jp-controls > i"). Also use .selectFirst(...) instead of .select(...).get(0). Looks nicer.
Thank you for your response. I have managed to get this output, but my problem is how to use the split method to extract the URL. Thanks a lot!
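Applying the suggestion from the second comment, the lookup could be sketched as follows, using a single child selector with selectFirst(...) instead of chaining select(...) and calling get(0). It assumes a jsoup release that provides Element.selectFirst, and that the play button is still a direct <i> child of div.jp-controls; PlayButtonOnclick is a hypothetical helper name:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class PlayButtonOnclick {
    private PlayButtonOnclick() {}

    /**
     * Returns the onclick value of the first <i> that is a direct child of div.jp-controls,
     * or an empty string if no such element exists.
     */
    static String playButtonOnclick(Document doc) {
        Element playButton = doc.selectFirst("div.jp-controls > i");
        // selectFirst returns null when nothing matches, so guard before calling attr()
        return playButton == null ? "" : playButton.attr("onclick");
    }

    public static void main(String[] args) throws java.io.IOException {
        Document doc = Jsoup.connect("https://www.internet-radio.com/station/dougeasyhits/").get();
        System.out.println(playButtonOnclick(doc));
    }
}
```

Unlike get(0), which throws when the selection is empty, the null check lets the caller decide what to do if the page layout changes.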
