PHP file_get_contents/curl - getting different result than browser

Question

I'm trying to get content of this page: http://www.nytimes.com/2014/01/26/us/politics/rand-pauls-mixed-inheritance.html?hp&_r=0

I tried both file_get_contents and curl, but every attempt returns the NYTimes login page and I have no idea why.

I have already tried the answers to these questions: "file_get_contents()/curl getting unexpected page", "PHP file_get_contents() behaves differently to browser", and "file_get_content get the wrong web".

Is there any solution? Thanks

EDIT:

    //this is the curl code I use
    $cookieJar = dirname(__FILE__) . '/cookie.txt';
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);
    curl_setopt($ch, CURLOPT_URL, $link);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data    = curl_exec($ch);
    curl_close($ch);

On the server you are running this code on, does "curl nytimes.com/2014/01/26/us/politics/…" output the right information?
– brian, Commented Jan 29, 2014 at 19:15
They could be blocking access by domain (to prevent scraping) in their server settings, such as .htaccess.
– dev7, Commented Jan 29, 2014 at 19:15
nytimes is definitely blocking scrapers. You'll have to tinker with the cURL flags to get it to appear as if it's a browser. I'm not a cURL pro; I wish I could help more. Best of luck :)
– brian, Commented Jan 29, 2014 at 19:21
@enigma I passed curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
– simPod, Commented Jan 29, 2014 at 19:23

Abdalla Mohamed Aly Ibrahim · Accepted Answer · 2014-01-29 20:31:52Z

3

First, try saving the cookies in the same directory as the script, so set the cookie path like this:
$cookie = "cookie.txt";
This code works for me and returns the page:

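<?php
// Fetch a URL with cURL, keeping cookies between requests and following redirects.
function curl_get_contents($url)
{
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  // Return the response as a string instead of printing it.
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  // Read cookies from, and write them back to, the same file.
  curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
  curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
  // Follow any HTTP redirects (Location headers).
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}
$get_page = curl_get_contents("http://www.nytimes.com/2014/01/26/us/politics/rand-pauls-mixed-inheritance.html?hp&_r=1");
echo $get_page;
?>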

5 Comments

simPod Over a year ago

Thanks, this works! The guy below you was faster, though your answer is better - more complex and I got it working just because of it.

Abdalla Mohamed Aly Ibrahim Over a year ago

Glad to hear that. It's not complex; I just wrapped it in a function to make it reusable.

simPod Over a year ago

Yes, but it includes the full cURL settings, so I could copy and paste it to see that it works. My old cURL settings had CONNECTTIMEOUT set, which made it malfunction.

Abdalla Mohamed Aly Ibrahim Over a year ago

You can add any other settings you want to this function.

simPod Over a year ago

I know, it was just that timeout that prevented it from working. I didn't realize that until you posted your answer.
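
A failure like the one described in these comments is easier to spot when the cURL error is reported instead of discarded. The following is a minimal sketch, not code from the thread: `fetch_with_diagnostics()` and its default timeout of 10 seconds are assumptions chosen for illustration. It always passes a defined integer to CURLOPT_CONNECTTIMEOUT (the question's snippet uses `$timeout` without showing where it is set) and returns the error and redirect details alongside the body.

```php
<?php
// Sketch only: fetch_with_diagnostics() is an illustrative name, not from the thread.
function fetch_with_diagnostics(string $url, int $connectTimeout = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => __DIR__ . '/cookie.txt',
        CURLOPT_COOKIEJAR      => __DIR__ . '/cookie.txt',
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds; always a defined integer
    ]);

    $body = curl_exec($ch); // false on failure

    $result = [
        'ok'        => $body !== false,
        'body'      => $body === false ? null : $body,
        'error'     => curl_error($ch),  // empty string when nothing went wrong
        'errno'     => curl_errno($ch),  // 0 on success, 28 on a timeout
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
    ];
    curl_close($ch);
    return $result;
}
```

If `ok` is false, `error` and `errno` say why. If `ok` is true but the body is still a login page, `final_url` and `redirects` show where the request ended up.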

dljve · Accepted Answer · 2014-01-29 19:29:45Z

1

I think you need cURL to allow cookies to be saved. Try adding these lines to the cURL setup. For me this worked:

$cookie = dirname(__FILE__) . "/cookie.txt";
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
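
cURL does not raise a PHP error when it cannot write the cookie jar, so a wrong or read-only path can look exactly like "cookies are not working". As a hedged sketch (the helper name `cookie_jar_path()` is hypothetical and not part of this answer), the path can be checked before it is handed to CURLOPT_COOKIEJAR:

```php
<?php
// Hypothetical helper (not from the answer): build a cookie jar path inside
// $dir and fail loudly if the file could not be written there.
function cookie_jar_path(string $dir, string $name = 'cookie.txt'): string
{
    $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;

    // An existing jar must itself be writable; otherwise the directory must be.
    $writable = file_exists($path) ? is_writable($path) : is_writable($dir);
    if (!$writable) {
        throw new RuntimeException("Cookie jar is not writable: $path");
    }
    return $path;
}

// Usage: $cookie = cookie_jar_path(__DIR__);
```

Using DIRECTORY_SEPARATOR keeps the path valid on both Windows and Unix-like systems.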

1 Comment

simPod Over a year ago

ok my bad, had CURLOPT_CONNECTTIMEOUT set... It works. Thanks.

ishenkoyv · Accepted Answer · 2014-01-29 20:49:08Z

0

Use the Live HTTP Headers Firefox plugin to check what happens while the page loads. There may be redirects, cookies being set, and so on. Then try to reproduce that behaviour with PHP cURL (note: set the User-Agent and the other client headers to match the browser).
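
The same inspection can be approximated from PHP itself. When CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION are both enabled, cURL returns the headers of every response in the redirect chain ahead of the body. The sketch below assumes that setup; the helper name `summarise_header_chain()` is made up for illustration. It reduces the raw header text to one entry per hop, so the redirects and cookies can be compared with what the browser plugin shows.

```php
<?php
// Illustrative helper (hypothetical name): summarise each response in a raw
// header blob as returned by cURL with CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION.
function summarise_header_chain(string $rawHeaders): array
{
    $hops = [];
    // Each response's header block is separated from the next by a blank line.
    foreach (preg_split('/\r?\n\r?\n/', trim($rawHeaders)) as $block) {
        $lines = preg_split('/\r?\n/', $block);
        // Status line looks like "HTTP/1.1 303 See Other" or "HTTP/2 200".
        if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $lines[0], $m)) {
            continue;
        }
        $hop = ['status' => (int) $m[1], 'location' => null, 'cookies' => []];
        foreach (array_slice($lines, 1) as $line) {
            if (stripos($line, 'Location:') === 0) {
                $hop['location'] = trim(substr($line, 9));
            } elseif (stripos($line, 'Set-Cookie:') === 0) {
                // Keep only the "name=value" part, dropping attributes such as Path.
                $hop['cookies'][] = trim(explode(';', substr($line, 11))[0]);
            }
        }
        $hops[] = $hop;
    }
    return $hops;
}

// Usage with cURL (needs network access, so it is left commented out):
// curl_setopt($ch, CURLOPT_HEADER, true);
// $response = curl_exec($ch);
// $headers  = substr($response, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
// print_r(summarise_header_chain($headers));
```

A chain that ends on a login URL, or a hop that sets a cookie which is missing from the next request, points to the behaviour that still has to be reproduced.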
