How to emulate a JavaScript-generated download request with PHP/cURL

Question

I am trying to use PHP/cURL to download a file from a public website for an open data project. How can I emulate the download request with PHP/cURL to obtain the file?

Can you please help me with this, or at least with how I should phrase the question?

The site uses JavaScript to generate the download action. The download request is made via a POST request (so no URL is visible).

The site is: http://cri.nbb.be/bc9/web/catalog?lang=N&companyNr=0403233750 The file I am trying to download is the latest XBRL document related to the entity.

The headers of the download request are as follows:

Host: cri.nbb.be
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://cri.nbb.be/bc9/web/catalog?execution=e1s1
Cookie: be.nbb.selected.language=nl; JSESSIONID=00003DzVLI5-4k_otlBnJ3ylzKQ:-1; TS01f1bcac=011cb8a973def2718973d95f3988ed8392a49007ea289ef41640f86d275cfbbcc3df12bec9ffca6ced4717c1f1904a1785807d461dd198bf5951a9c35c905e55eeb738ad098adfe9ea3eef44ea3732108f528c6c5d; BIGipServerprd-bc9=270313664.46162.0000
Connection: keep-alive

I can obtain the source page that generates the download request (the HTML with the JavaScript) with the following code:

$filename = "0403233750.html";
$url = "http://cri.nbb.be/bc9/web/catalog?lang=N&companyNr=0403233750";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$output = curl_exec($ch);
// Stop if the page reports an expired session or a problem
if (preg_match('/expired/', $output)) {
    return "stop";
}
if (preg_match('/problem/', $output)) {
    return "stop";
}
// Stop if the page does not mention XML at all
if (!preg_match('/xml/', $output)) {
    return "stop";
}
file_put_contents($filename, $output);
curl_close($ch);

But once I have the JavaScript, I don't know what I need to use to generate the download request in PHP/cURL.

What do you want to do? Use JavaScript to download the file, or simulate the JavaScript code that downloads the file using PHP/cURL?
– Nidhin David, Commented Dec 28, 2015 at 6:30
Thanks for your quick answer. I want to simulate the JavaScript using PHP/cURL.
– user2586868, Commented Dec 28, 2015 at 6:58

Sabuj Hassan · Accepted Answer · 2015-12-28 07:40:43Z

1

When mimicking a request, you can set the request headers directly with the CURLOPT_HTTPHEADER option. In most cases, though, not all of the request headers matter.

$ch = curl_init($url);  // $url as defined in the question
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_COOKIEFILE, "/var/tmp/cookie.txt");  // always use the full path
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// Sends Accept-Encoding and decodes the response; a raw Accept-Encoding
// header alone would leave a gzip-compressed body undecoded
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.5',
    'Referer: http://cri.nbb.be/bc9/web/catalog?execution=e1s1',
    'Cookie: be.nbb.selected.language=nl; JSESSIONID=...whatever you have...'
));
$output = curl_exec($ch);
curl_close($ch);

There are also dedicated cURL options for several request headers. For example, the user agent string can be set with CURLOPT_USERAGENT, the Referer header with CURLOPT_REFERER, and so on. More options are described at this link: http://php.net/manual/en/function.curl-setopt.php
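As a rough sketch of the dedicated options mentioned above, the same request can be described without hand-writing most header strings. The helper names `buildBrowserLikeOptions` and `fetchWithOptions` are hypothetical and not part of the answer; the header values are the ones from the question. Setting CURLOPT_COOKIEJAR alongside CURLOPT_COOKIEFILE is an assumption about what is wanted here, namely that cookies returned by the server are saved for later requests.

```php
<?php
// Hypothetical helper (not from the answer): builds cURL options that mimic
// the browser request, using dedicated options where they exist.
function buildBrowserLikeOptions(string $referer, string $cookieFile): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 30,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
        CURLOPT_REFERER        => $referer,
        // Empty string: advertise every encoding cURL supports and decode the response.
        CURLOPT_ENCODING       => '',
        // Read cookies from this file and write received cookies back to it.
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
        // Headers without a dedicated option still go through CURLOPT_HTTPHEADER.
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
    ];
}

// Hypothetical helper: applies the options to a new handle and returns the body.
function fetchWithOptions(string $url, array $options): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}
```

A call would then look like `fetchWithOptions($url, buildBrowserLikeOptions('http://cri.nbb.be/bc9/web/catalog?execution=e1s1', '/var/tmp/cookie.txt'))`, with `$url` as in the question.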


3 Comments

user2586868 Over a year ago

Thanks. I still need to understand how the header is generated by the JavaScript on the target website. I tried stepping through the request with Firebug, but I didn't understand what was happening. Is there a more appropriate or easier way to understand what the browser does in order to emulate it?

Sabuj Hassan Over a year ago

As far as I understand, JavaScript runs in the browser, and in most cases the browser sets default values for any headers the JavaScript does not set itself, e.g. the Accept, Accept-Language, and Accept-Encoding headers. The User-Agent header is fixed for your browser, and the Referer is set based on the page on which the JavaScript is running. For cookies, the browser sends back whatever the server returned to it. I suggest going through the basics of HTTP requests; you can find an introduction at tutorialspoint.com/http/http_requests.htm

user2586868 Over a year ago

Thanks. So if I understand, the download request is probably triggered by a POST request. The header is therefore not sufficient, I also need the POST content.
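If the download is indeed triggered by a POST, one possible shape for the emulation is sketched below: load the catalog page first so the server issues its session cookies, then send the POST on the same handle. This is only a sketch under assumptions. The helper names are hypothetical, and the POST URL and form field names are not known from this thread; they would have to be copied from the request shown in the browser's network panel (for example in Firebug).

```php
<?php
// Hypothetical helper: options for the POST that triggers the download.
// $fields holds the form fields observed in the browser's network panel.
function buildDownloadPostOptions(string $referer, string $cookieFile, array $fields): array
{
    return [
        CURLOPT_POST           => true,
        // URL-encode the fields as application/x-www-form-urlencoded.
        CURLOPT_POSTFIELDS     => http_build_query($fields, '', '&'),
        CURLOPT_REFERER        => $referer,
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 30,
    ];
}

// Hypothetical helper: GET the page to obtain session cookies, then POST.
function downloadViaPost(string $pageUrl, string $postUrl, array $fields, string $cookieFile): string
{
    $ch = curl_init($pageUrl);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
    ]);
    if (curl_exec($ch) === false) {
        throw new RuntimeException('GET failed: ' . curl_error($ch));
    }

    // The page URL after redirects serves as the Referer of the POST.
    $referer = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

    // Reusing the handle keeps the cookies received by the GET in memory.
    curl_setopt_array($ch, buildDownloadPostOptions($referer, $cookieFile, $fields));
    curl_setopt($ch, CURLOPT_URL, $postUrl);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('POST failed: ' . curl_error($ch));
    }
    return $body;
}
```

The returned body could then be written to disk with `file_put_contents`, as in the question's code. If the server answers the POST with a redirect, cURL follows it because of CURLOPT_FOLLOWLOCATION.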
