I've got a PHP function that checks a URL to make sure that (a) the server responds at all, and (b) the response isn't a 404.
It works just fine on every domain/URL I've tested, with the exception of bostonglobe.com, where it's returning a 404 for valid URLs. I'm guessing it has something to do with their paywall, but my function works fine on nytimes.com and other newspaper sites.
Here's an example URL that returns a 404:
What am I doing wrong?
function check_url($url){
    $userAgent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)';
    $curl = curl_init($url);
    // CURLOPT_NOBODY makes cURL send a HEAD request (headers only, no body)
    curl_setopt($curl, CURLOPT_NOBODY, true);
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    // Cookie options must be set on $curl (the handle created above), not an undefined $ch
    curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookies.txt');
    curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookies.txt');
    $result = curl_exec($curl);
    if ($result === false) {
        // There was no response
        $message = "No information found for that URL";
    } else {
        // What was the response?
        $statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        if ($statusCode == 404) {
            $message = "No information found for that URL";
        } else {
            $message = "Good";
        }
    }
    curl_close($curl);
    return $message;
}
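
To narrow this down, here's a rough debugging sketch (not part of the function above) that compares the status code from a HEAD request with the one from a plain GET. The reasoning is that CURLOPT_NOBODY makes cURL send HEAD, and some servers answer HEAD differently from GET; whether that applies to this site is only a guess. The helper names classify_status and fetch_status, the 10-second timeout, and following redirects are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: maps a status code to the same messages check_url() uses.
function classify_status(int $statusCode): string
{
    // cURL reports 0 when no HTTP response was received at all.
    if ($statusCode === 0 || $statusCode === 404) {
        return "No information found for that URL";
    }
    return "Good";
}

// Hypothetical helper: returns only the HTTP status code for $url,
// using a HEAD request when $headOnly is true and a GET otherwise.
function fetch_status(string $url, bool $headOnly): int
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_NOBODY, $headOnly);     // true => HEAD request
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // keep any body out of the output
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);  // assumption: follow redirects
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);           // assumption: 10-second limit
    curl_exec($curl);
    $statusCode = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    return $statusCode;
}
```

Calling fetch_status($url, true) and fetch_status($url, false) on the same URL and passing each result to classify_status() should show whether the 404 only appears for HEAD requests.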