
I'm currently using the simple_html_dom parser for my project, and I'm using cURL in PHP to scrape a website. I want to get a value from the script tag in the head of the HTML I receive, but I'm really confused about how to do that. If I run the code below:

$data_dom = new simple_html_dom();
$data_dom->load($html);

foreach($data_dom->find('script') as $script){
    echo $script->plaintext."<br>";
}

The result is empty: when I inspect it, only the br tags appear. I want to get everything inside the script tags. Here is the head content:

<head>
Here is the script I want to get:
.....
<script type="text/javascript">
    var keysearch = {"departureLabel":"Surabaya (SUB : Juanda) Jawa Timur Indonesia","arrivalLabel":"Palangkaraya (PKY : Tjilik Riwut | Panarung) Kalimantan Tengah Indonesia","adultNum":"1","childNum":"0","infantNum":"0","departure":"SUB","arrival":"PKY","departDate":"20181115","roundTrip":0,"cabinType":-1,"departureCode":"ID-Surabaya-SUB","arrivalCode":"ID-Palangkaraya-PKY"};

    (function(window, _gtm, keysearch){

        if (window.gtmInstance){
            var departureExp = keysearch.departureCode.split("-");
            var arrivalExp = keysearch.arrivalCode.split("-");

            gtmInstance.setFlightData({
                'ITEM_TYPE': 'flight',
                'FLY_OUTB_CODE': departureExp[2],
                'FLY_OUTB_CITY': departureExp[1],
                'FLY_OUTB_COUNTRYCODE': departureExp[0],
                'FLY_OUTB_DATE': keysearch.departDate,

                'FLY_INB_CODE': arrivalExp[2],
                'FLY_INB_CITY': arrivalExp[1],
                'FLY_INB_COUNTRYCODE': arrivalExp[0],
                'FLY_INB_DATE': keysearch.returnDate,
                'FLY_NBPAX_ADL': keysearch.adultNum,
                'FLY_NBPAX_CHL': keysearch.childNum,
                'FLY_NBPAX_INF': keysearch.infantNum,
            });

            gtmInstance.pushFlightSearchEvent();
        }
    }(window, gtmInstance, keysearch));


    var key = "rkey=10fe7b6fd1f7fa1ef0f4fa538f917811dbc7f4628a791ba69962f2ed305fb72d061b67737afd843aaaeeee946f1442bb";
    var staticRoot = 'http://sta.nusatrip.net';

    $(function() {
        $("#currencySelector").nusaCurrencyOptions({
            selected: getCookie("curCode"),
        });                        
    });
</script>   
</head>

I want to get the value of the key variable, which I will use to fetch data from the website. Thanks!

  • And what exactly does not work with the DOM approach? Commented Oct 8, 2018 at 7:34
  • I just don't know how to get the script tag's value. It seems to be hidden. Commented Oct 8, 2018 at 7:36
  • $data_dom->find('head'): if you want data in <script>, then you should try to find a script rather than a head! Commented Oct 8, 2018 at 7:38
  • I have tried it, but it just gives an empty result. Commented Oct 8, 2018 at 7:42
  • Which DOM parser are you using? Commented Oct 8, 2018 at 7:46

1 Answer


Depending on what the rest of the markup looks like, you may be able to just use DOMDocument and XPath, then parse out the value of the var with preg_match. This example will echo the key.

<?php

$html = <<<END
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
    <script type="text/javascript">
        var keysearch = {"departureLabel":"Surabaya (SUB : Juanda) Jawa Timur Indonesia","arrivalLabel":"Palangkaraya (PKY : Tjilik Riwut | Panarung) Kalimantan Tengah Indonesia","adultNum":"1","childNum":"0","infantNum":"0","departure":"SUB","arrival":"PKY","departDate":"20181115","roundTrip":0,"cabinType":-1,"departureCode":"ID-Surabaya-SUB","arrivalCode":"ID-Palangkaraya-PKY"};

        (function(window, _gtm, keysearch){

            if (window.gtmInstance){
                var departureExp = keysearch.departureCode.split("-");
                var arrivalExp = keysearch.arrivalCode.split("-");

                gtmInstance.setFlightData({
                    'ITEM_TYPE': 'flight',
                    'FLY_OUTB_CODE': departureExp[2],
                    'FLY_OUTB_CITY': departureExp[1],
                    'FLY_OUTB_COUNTRYCODE': departureExp[0],
                    'FLY_OUTB_DATE': keysearch.departDate,

                    'FLY_INB_CODE': arrivalExp[2],
                    'FLY_INB_CITY': arrivalExp[1],
                    'FLY_INB_COUNTRYCODE': arrivalExp[0],
                    'FLY_INB_DATE': keysearch.returnDate,
                    'FLY_NBPAX_ADL': keysearch.adultNum,
                    'FLY_NBPAX_CHL': keysearch.childNum,
                    'FLY_NBPAX_INF': keysearch.infantNum,
                });

                gtmInstance.pushFlightSearchEvent();
            }
        }(window, gtmInstance, keysearch));


        var key = "rkey=10fe7b6fd1f7fa1ef0f4fa538f917811dbc7f4628a791ba69962f2ed305fb72d061b67737afd843aaaeeee946f1442bb";
        var staticRoot = 'http://sta.nusatrip.net';

        $(function() {
            $("#currencySelector").nusaCurrencyOptions({
                selected: getCookie("curCode"),
            });                        
        });
    </script>   
</head>
<body>foo</body>
</html>
END;


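// Real pages often contain markup libxml warns about; collect those warnings instead of printing them
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);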

$xpath = new DOMXPath($dom);
$result = $xpath->query('//script');

foreach($result as $currScriptTag)
{
    $currScriptContent = $currScriptTag->nodeValue;

    // [^"]* stops at the closing quote, so the capture cannot run past the string
    $matchFound = preg_match('/var\s+key\s*=\s*"([^"]*)"/', $currScriptContent, $matches);

    if($matchFound)
    {
        /*
         * $matches[0] will contain the full match, e.g. var key = "..."
         * $matches[1] just contains the value of the var
         */
        $key = $matches[1];

        echo $key.PHP_EOL;
    }
}
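The same script tag also holds keysearch, which is written as a JSON object literal, so it can be pulled out with a regex and handed to json_decode. The sketch below assumes the object sits on a single line and is valid JSON, as it is in the page above; $scriptText is a shortened stand-in for $currScriptTag->nodeValue, and $rkey is a hypothetical variable name. Since the key value looks like a query-string pair (rkey=...), parse_str can split off just the token if that is all you need.

```php
<?php
// Shortened stand-in for $currScriptTag->nodeValue from the loop above
// (only the two relevant lines of the real script are reproduced).
$scriptText = <<<'JS'
var keysearch = {"adultNum":"1","departure":"SUB","arrival":"PKY","departDate":"20181115","roundTrip":0};
var key = "rkey=10fe7b6fd1f7fa1ef0f4fa538f917811dbc7f4628a791ba69962f2ed305fb72d061b67737afd843aaaeeee946f1442bb";
JS;

$keysearch = null;
$rkey = null;

// Capture the object literal assigned to keysearch, up to the first "};".
// Assumption: it is on one line and is valid JSON.
if (preg_match('/var\s+keysearch\s*=\s*(\{.*?\});/', $scriptText, $m)) {
    $keysearch = json_decode($m[1], true); // associative array, or null on invalid JSON
}

// Capture the quoted value of key; [^"]* stops at the closing quote.
if (preg_match('/var\s+key\s*=\s*"([^"]*)"/', $scriptText, $m)) {
    // The value is "rkey=<token>", so parse_str() splits it like a query string.
    parse_str($m[1], $params);
    $rkey = $params['rkey'] ?? null;
}

echo $keysearch['departure'], ' -> ', $keysearch['arrival'], ' on ', $keysearch['departDate'], PHP_EOL;
echo $rkey, PHP_EOL;
```

If the site expects the whole rkey=... pair (for example appended to a URL), use $m[1] directly instead of the split token.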

3 Comments

WOW, it works. Thanks a lot! I still don't know much about XPath. Do you have any suggestions on where I should learn it? Thanks @Rob Ruchte
XPath is very mature and widely used, so there is a lot of material out there, but I use this often, it's one of the most concise references I've found: msdn.microsoft.com/en-us/library/ms256471(v=vs.110).aspx
Thanks again, you really helped me.
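As a small taste of XPath applied to this problem, the query from the answer can be narrowed with a predicate so that only the script containing the variable is returned, instead of looping over every script tag. This is a sketch assuming the DOM extension is enabled; the external app.js script tag is a hypothetical addition to the sample markup, there only to show that the filter skips it. The regex is still needed afterwards, because XPath selects whole nodes, not pieces of a JavaScript string.

```php
<?php
// Trimmed-down copy of the page from the question, plus a hypothetical
// external <script> that the XPath predicate should skip.
$html = <<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head>
    <script src="/js/app.js"></script>
    <script type="text/javascript">
        var key = "rkey=10fe7b6fd1f7fa1ef0f4fa538f917811dbc7f4628a791ba69962f2ed305fb72d061b67737afd843aaaeeee946f1442bb";
        var staticRoot = 'http://sta.nusatrip.net';
    </script>
</head>
<body>foo</body>
</html>
HTML;

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// //script              -> every <script> element in the document
// [contains(., "...")]  -> ...whose text content contains the given substring
$nodes = $xpath->query('//script[contains(., "var key =")]');

$key = null;
if ($nodes->length > 0
    && preg_match('/var\s+key\s*=\s*"([^"]*)"/', $nodes->item(0)->textContent, $m)) {
    $key = $m[1];
}

echo $nodes->length, ' matching script(s)', PHP_EOL;
echo $key, PHP_EOL;
```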
