I'm trying to scrape this link: https://www.bu.edu/link/bin/uiscgi_studentlink/1293403322?College=SMG&Dept=AC&Course=222&Section=C1&Subject=ACCT &MtgDay=&MtgTime=&ModuleName=univschr.pl&KeySem=20114&ViewSem=Spring+2011&SearchOptionCd=C&SearchOptionDesc=Class+Subject&MainCampusInd=. (It works fine if you access it in the browser.)
So I fetch it with cURL, using this code:
function curl_classes($url) {
    $ch = curl_init();
    $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    // Save cookies to, and send them from, the same file.
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    echo "NOW I'M REALLY GOING TO: " . $url;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 50);
    // Note: this overrides the Googlebot user agent set above.
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    $html = curl_exec($ch);
    if (!$html) {
        // Read the error details before closing the handle.
        echo "<br />cURL error number: " . curl_errno($ch);
        echo "<br />cURL error: " . curl_error($ch);
        curl_close($ch);
        exit;
    }
    curl_close($ch);
    echo htmlspecialchars($html);
}
EDIT
Okay, new problem. My cookie-storing code doesn't seem to be working. I'm able to scrape this link as desired: bu[DOT]edu/link/bin/uiscgi_studentlink/1293357973?ModuleName=univschr.pl&SearchOptionDesc=Class+Subject&SearchOptionCd=C&KeySem=20114&ViewSem=Spring+2011&Subject=ACCT&MtgDay=&MtgTime=
But when I try to scrape the link at the top of this post I get: "Sorry you need cookies enabled..."
What am I doing wrong in my cookie storing code?