
I am trying to use PHP/cURL to download a file from a public website for an open data project. How can I emulate the download request with PHP/cURL to obtain the file?

Can you please help me with this, or at least with how I should phrase the question?

The site uses JavaScript to trigger the download. The download request is sent as a POST request, so no URL is visible.

The site is http://cri.nbb.be/bc9/web/catalog?lang=N&companyNr=0403233750 and the file I am trying to download is the latest XBRL document related to the entity.

The headers of the download request are the following:

Host: cri.nbb.be
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://cri.nbb.be/bc9/web/catalog?execution=e1s1
Cookie: be.nbb.selected.language=nl; JSESSIONID=00003DzVLI5-4k_otlBnJ3ylzKQ:-1; TS01f1bcac=011cb8a973def2718973d95f3988ed8392a49007ea289ef41640f86d275cfbbcc3df12bec9ffca6ced4717c1f1904a1785807d461dd198bf5951a9c35c905e55eeb738ad098adfe9ea3eef44ea3732108f528c6c5d; BIGipServerprd-bc9=270313664.46162.0000
Connection: keep-alive
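A minimal sketch, assuming you copy a header block like the one above from the browser's network panel: the hypothetical helper `headers_from_browser()` turns it into an array for CURLOPT_HTTPHEADER. It skips the headers that are better left to cURL in this setup (an assumption): Host (derived from the URL), Connection, Accept-Encoding (so cURL can decompress the response itself), and Cookie (a copied JSESSIONID belongs to the browser's session and will expire).

```php
<?php
// Hypothetical helper: convert a copied browser header block into
// "Name: value" lines for CURLOPT_HTTPHEADER, dropping headers that
// cURL should manage itself in this setup.
function headers_from_browser(string $raw): array
{
    $skip = ['host', 'connection', 'accept-encoding', 'cookie'];
    $headers = [];
    foreach (preg_split('/\R/', trim($raw)) as $line) {
        $pos = strpos($line, ':'); // first colon separates name from value
        if ($pos === false) {
            continue; // not a "Name: value" line
        }
        $name = trim(substr($line, 0, $pos));
        if (in_array(strtolower($name), $skip, true)) {
            continue;
        }
        $headers[] = $name . ': ' . trim(substr($line, $pos + 1));
    }
    return $headers;
}

$raw = <<<TXT
Host: cri.nbb.be
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://cri.nbb.be/bc9/web/catalog?execution=e1s1
Connection: keep-alive
TXT;

print_r(headers_from_browser($raw));
```

The result can be passed directly to `curl_setopt($ch, CURLOPT_HTTPHEADER, ...)`.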

I can obtain the source file that generates the download request (the HTML with the JavaScript) with the following code:

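$filename = "0403233750.html";
$url = "http://cri.nbb.be/bc9/web/catalog?lang=N&companyNr=0403233750";
$cookieFile = __DIR__ . "/cookie.txt"; // absolute path; relative paths depend on the working directory

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send cookies read from this file
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // save received cookies (e.g. JSESSIONID) when the handle is released
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$output = curl_exec($ch);
curl_close($ch); // close before any early return

// Stop if the request failed, the page reports an expired session or a problem,
// or the page does not mention XML at all
if ($output === false
    || preg_match('/expired|problem/', $output)
    || !preg_match('/xml/', $output)) {
    return "stop";
}
file_put_contents($filename, $output);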

But once I have the JavaScript, I don't know how to reproduce the download request in PHP/cURL.
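Once the HTML is saved, a useful first step is to find out which form the JavaScript submits. The sketch below is a hypothetical helper that assumes the download is an ordinary form POST whose hidden fields are present in the page; it reads the form's action URL and its hidden `<input>` values with PHP's DOMDocument. Values that the JavaScript adds at click time (and the name of the clicked button) will not appear here and have to be read from the request the browser actually sends. The form and field names in the example are made up.

```php
<?php
// Hypothetical helper: return the action URL and hidden input values of the
// first <form> in an HTML page.
function form_fields(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; suppress warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return ['action' => null, 'fields' => []];
    }
    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        // Only hidden inputs: buttons and JS-set values must be added by hand
        if ($name !== '' && strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return ['action' => $form->getAttribute('action'), 'fields' => $fields];
}

// Made-up example; run it on the saved 0403233750.html instead.
$html = '<html><body><form action="/bc9/web/catalog?execution=e1s1" method="post">'
      . '<input type="hidden" name="token" value="abc">'
      . '<input type="submit" name="download" value="XBRL">'
      . '</form></body></html>';
print_r(form_fields($html));
```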

  • What do you want to do? Use JavaScript to download the file, or simulate the JavaScript code that downloads the file using PHP/cURL? Commented Dec 28, 2015 at 6:30
  • Thanks for your quick answer. I want to simulate the JavaScript using PHP/curl Commented Dec 28, 2015 at 6:58

1 Answer


When mimicking a request, you can set those request headers directly with the CURLOPT_HTTPHEADER option. In most cases, though, not all of the request headers matter.

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_COOKIEFILE, "/var/tmp/cookie.txt");  // always use a full path
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, '');  // sends Accept-Encoding and decompresses the gzip/deflate response
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.5',
    // no raw 'Accept-Encoding' line: cURL would then return the compressed body undecoded
    'Referer: http://cri.nbb.be/bc9/web/catalog?execution=e1s1',
    'Cookie: be.nbb.selected.language=nl; JSESSIONID=...whatever you have...'
));
$output = curl_exec($ch);
curl_close($ch);

There are also dedicated cURL options for several request headers: for example, the user agent string can be set with CURLOPT_USERAGENT, the referer with CURLOPT_REFERER, and so on. The full list of options is at http://php.net/manual/en/function.curl-setopt.php
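For comparison, a sketch of the same request configured with the dedicated options through `curl_setopt_array()`. It also uses CURLOPT_ENCODING instead of a raw Accept-Encoding header, so that cURL decompresses the response, and a cookie jar, so the session cookie is stored by cURL rather than pasted by hand. The cookie file path is an assumption.

```php
<?php
// Same request as above, using dedicated cURL options where they exist.
$url = "http://cri.nbb.be/bc9/web/catalog?lang=N&companyNr=0403233750";
$cookieFile = sys_get_temp_dir() . '/nbb_cookies.txt'; // assumed location

$options = [
    CURLOPT_CONNECTTIMEOUT => 30,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
    CURLOPT_REFERER        => 'http://cri.nbb.be/bc9/web/catalog?execution=e1s1',
    CURLOPT_ENCODING       => '',          // advertise supported encodings and decompress the response
    CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from this file
    CURLOPT_COOKIEJAR      => $cookieFile, // write received cookies back to it
    CURLOPT_HTTPHEADER     => [            // headers without a dedicated option
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
    ],
];

$ch = curl_init($url);
$ok = curl_setopt_array($ch, $options); // true if every option was accepted
// $output = curl_exec($ch);            // performs the actual request
```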


3 Comments

Thanks. I still need to understand how the headers are generated by the JavaScript on the target website. I tried to step through the request with Firebug, but didn't understand what was happening. Is there a more appropriate or easier way to understand what the browser does, so that I can emulate it?
As far as I understand, JavaScript runs in the browser, and in most cases the browser sets default values for the headers the JavaScript doesn't set itself, e.g. Accept, Accept-Language, and Accept-Encoding. The User-Agent header is fixed for your browser, and the Referer is set to the page on which the JavaScript runs. For cookies, the browser sends back whatever the server previously returned. I believe you need to go through the basics of HTTP requests; you can find them at tutorialspoint.com/http/http_requests.htm
Thanks. So if I understand correctly, the download is probably triggered by a POST request, so the headers alone are not sufficient; I also need the POST body.
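Following that conclusion, a sketch of replaying the download as a POST request, assuming the same cookie file was used for the earlier GET of the catalog page, so the JSESSIONID cookie the server set there is sent back as a browser would. The helper `download_options()`, the field names, the form action URL, and the output filename are hypothetical; the real names and values must be copied from the request body shown in the browser's network panel.

```php
<?php
// Hypothetical helper: cURL options for replaying the download as a form POST.
function download_options(string $cookieFile, string $referer, array $fields): array
{
    return [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // application/x-www-form-urlencoded body
        CURLOPT_REFERER        => $referer,
        CURLOPT_COOKIEFILE     => $cookieFile, // reuse the session cookie from the earlier GET
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '',
    ];
}

// Hypothetical field names: replace them with what the network panel shows.
$opts = download_options(
    sys_get_temp_dir() . '/nbb_cookies.txt',
    'http://cri.nbb.be/bc9/web/catalog?execution=e1s1',
    ['execution' => 'e1s1', 'download' => 'xbrl']
);

// Usage (network I/O, so left commented out here):
// $ch = curl_init('http://cri.nbb.be/bc9/web/catalog?execution=e1s1'); // assumed form action
// curl_setopt_array($ch, $opts);
// $body = curl_exec($ch);
// if ($body !== false) { file_put_contents('0403233750.xbrl', $body); }
```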
