PHP Curl: bring HTML with redirect

Question

I'm writing a crawler in PHP that reads a page's HTML and stores it in a variable. The code works well when the site doesn't redirect. If I crawl Google, for example, I get the following:

CURL Result

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.com.br/?gfe_rd=cr&amp;ei=A14yVviJCuyp8wfmyIfIBg">here
</A>.
</BODY></HTML>

PHP method

private function parseHTML($url){
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, array('X-Apple-Tz: 0', 'X-Apple-Store-Front: 143444,12'));
    // curl_exec() prints the response unless CURLOPT_RETURNTRANSFER is set, so buffer the output.
    ob_start();
    curl_exec($curl);
    curl_close($curl);
    $html = ob_get_contents();
    ob_end_clean();
    return $html;
}

How can I follow the redirect to the destination page, fetch its HTML, and return it?

When you get that 302 page content, is the HTTP status header also set to 302?
– Mike Brant, Commented Oct 29, 2015 at 19:47

Ali · Accepted Answer · 2015-10-29 19:42:02Z

2

If the server redirects your request, setting the CURLOPT_FOLLOWLOCATION option should do the trick, possibly together with the CURLOPT_MAXREDIRS option to limit the number of redirects that are followed. See the documentation for PHP's curl_setopt function.

For example:

curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_MAXREDIRS, 5);
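
Applied to a whole fetch routine, those two options might be combined as in the sketch below. The helper names `redirectOptions` and `fetchHtml` are hypothetical, `CURLOPT_RETURNTRANSFER` is swapped in for the output buffering used in the question, and the question's custom headers are left out for brevity.

```php
<?php
// Hypothetical helper: builds the cURL options for a request that follows redirects.
function redirectOptions(string $url, int $maxRedirects = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body from curl_exec() instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers sent with redirect responses
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up after this many redirects
    ];
}

// Hypothetical helper: returns the HTML of the final page, or null if the request fails.
function fetchHtml(string $url, int $maxRedirects = 5): ?string
{
    $curl = curl_init();
    curl_setopt_array($curl, redirectOptions($url, $maxRedirects));
    $html = curl_exec($curl);
    // The handle is released when $curl goes out of scope.

    return $html === false ? null : $html;
}
```

A call such as `fetchHtml('http://www.google.com')` would then return the body of the page reached after the redirect, provided the server sends a real HTTP redirect.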

However, if the server in your example is not actually redirecting your curl request and is only returning a page that tells the user where the document has moved, I'm afraid your application will have to read that content itself and follow the link accordingly.
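
If the response really is just an HTML page pointing at the new location, reading the target out of the body could be sketched as follows. The function name `extractMovedLink` is hypothetical, and the regular expression is a deliberately simple assumption that only handles a quoted `href` on the first anchor; a DOM parser would be more robust for arbitrary pages.

```php
<?php
// Hypothetical helper: returns the target of the first link in an HTML body, or null.
function extractMovedLink(string $html): ?string
{
    // Capture the quoted href value of the first <a> tag, case-insensitively.
    if (preg_match('/<a\s[^>]*href\s*=\s*(["\'])(.*?)\1/is', $html, $m) !== 1) {
        return null;
    }

    // Attribute values are HTML-escaped (for example &amp;), so decode them.
    return html_entity_decode($m[2], ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
```

The returned URL could then be passed to `parseHTML` again to fetch the destination page.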


2 Comments

Mike Brant Over a year ago

Nothing says that a 302 header is not sent along with that content, in which case the OP could use the curl options that you rightly suggest. They would need to look at the response headers to see whether they are truly getting a 302. It is very common for a web server to serve custom error content along with the appropriate response header; you see this especially with 404 responses.

Ali Over a year ago

You are right, @MikeBrant, thanks for the input. In that case, we could also take advantage of the CURLOPT_POSTREDIR option to identify whether it is a 302.
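
One way to check what the comments describe, namely whether the response really carries a redirect status, is to ask cURL after the request with `curl_getinfo()`. The sketch below assumes that approach; `isRedirectStatus` and `inspectResponse` are hypothetical names. Note that `CURLOPT_POSTREDIR` controls whether a POST stays a POST when a redirect is followed, so the status check here relies on `CURLINFO_HTTP_CODE` instead.

```php
<?php
// Hypothetical helper: true for status codes that announce a redirect via a Location header.
function isRedirectStatus(int $code): bool
{
    return in_array($code, [301, 302, 303, 307, 308], true);
}

// Hypothetical helper: requests $url without following redirects and reports
// the status code and the redirect target announced by the server, if any.
function inspectResponse(string $url): array
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
    curl_exec($curl);

    $status = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
    // CURLINFO_REDIRECT_URL holds the Location target when redirects are not followed.
    $target = curl_getinfo($curl, CURLINFO_REDIRECT_URL);

    return [
        'status'     => $status,
        'isRedirect' => isRedirectStatus($status),
        'target'     => $target ?: null,
    ];
}
```

If `isRedirect` comes back true, the server is sending a genuine redirect and `CURLOPT_FOLLOWLOCATION` should be enough; if the status is 200, the "moved" page is ordinary content.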
