I'm working on crawling information from a website: http://www.fatwallet.com
The site uses many redirecting URLs. For instance, http://www.fatwallet.com/ticket/store/A4C?s=storepage
redirects to http://www.a4c.com/?siteID=.7WaaTN6umc-s1Ih0x_Q67n6r7gInoh6Ug
I would like to use PHP to find the final URL that such a link redirects to.
I have used "curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true)". As I understand it, this makes curl follow redirects automatically (up to 5 times, I believe; the limit is controlled by CURLOPT_MAXREDIRS).
However, the problem is that the page I get is not the final page but an intermediate one.
curl_exec returns:
HTTP/1.1 302 Moved Temporarily
Server: Apache
Location: www.fatwallet.com/interstitial/signin
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 20
Content-Type: text/html
Date: Mon, 13 Apr 2015 12:03:19 GMT
Connection: keep-alive
Set-Cookie: JSESSIONID=A9E28337052B56ADAC8451854A276210; Path=/; HttpOnly

HTTP/1.1 302 Moved Temporarily
Server: Apache
Location: www.fatwallet.com/interstitial/signin
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 20
Content-Type: text/html
Date: Mon, 13 Apr 2015 12:03:19 GMT
Connection: keep-alive

HTTP/1.1 200 OK
Server: Apache
Cache-Control: no-cache,no-store,max-age=0
Expires: Wed, 31 Dec 1969 23:59:59 GMT
X-UA-Compatible: IE=edge,chrome=1
Vary: User-Agent,Accept-Encoding
Content-Language: en
Content-Encoding: gzip
Content-Type: text/html;charset=UTF-8
Content-Length: 16949
Date: Mon, 13 Apr 2015 12:03:20 GMT
Connection: keep-alive
Set-Cookie: list_styles=grid; Expires=Sat, 01-May-2083 15:17:27 GMT; Path=/
Set-Cookie: non_mem=f86c0692-826f-40f2-9fa1-1e2f9a957af8; Expires=Sat, 01-May-2083 15:17:27 GMT; Path=/
............
It seems that the third response has the status "HTTP/1.1 200 OK", but it is not the final page. If you open http://www.fatwallet.com/ticket/store/A4C?s=storepage in a browser, you will see what I mean. Also, I cannot find the final URL anywhere in the page that is returned.
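To make the chain easier to read, here is a small sketch (in Python, since that is an option for me) that splits a header dump like the one above into one entry per response. The function name split_hops and the shortened sample are only for illustration, and it assumes the headers arrive one per line, as curl emits them when CURLOPT_HEADER is enabled.

```python
# Shortened, hypothetical sample in the shape curl returns with CURLOPT_HEADER:
# one header block per response, blocks separated by a blank line.
SAMPLE_DUMP = """HTTP/1.1 302 Moved Temporarily
Server: Apache
Location: www.fatwallet.com/interstitial/signin

HTTP/1.1 302 Moved Temporarily
Server: Apache
Location: www.fatwallet.com/interstitial/signin

HTTP/1.1 200 OK
Server: Apache
Content-Type: text/html;charset=UTF-8
"""


def split_hops(raw_headers):
    """Return one dict per HTTP response found in a concatenated header dump."""
    hops = []
    for line in raw_headers.splitlines():
        line = line.strip()
        if line.startswith("HTTP/"):
            # Status line, e.g. "HTTP/1.1 302 Moved Temporarily".
            parts = line.split(" ", 2)
            if len(parts) >= 2 and parts[1].isdigit():
                hops.append({"status": int(parts[1]), "location": None})
        elif hops and line.lower().startswith("location:"):
            # Attach the redirect target to the response it belongs to.
            hops[-1]["location"] = line.split(":", 1)[1].strip()
    return hops


for hop in split_hops(SAMPLE_DUMP):
    print(hop["status"], hop["location"])
```

For the sample, the last entry has status 200 and no Location header, which matches what I see: at that point the server has stopped issuing HTTP redirects, so curl has nothing further to follow.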
So my question is: is it possible to make curl keep following redirects even after it receives HTTP/1.1 200 OK?
Is there another way to solve this (for example, by using Snoopy or Python)?
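For the Python route, this is roughly what I would try first: a minimal sketch using only the standard library's urllib.request, where the class and function names (RecordingRedirectHandler, trace_redirects) are made up for illustration. As far as I can tell it follows only HTTP-level 3xx redirects, just like curl, so if the last hop is performed in the browser (JavaScript or a meta refresh) I would expect it to stop at the same 200 OK page.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember every hop along the way."""

    def __init__(self):
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # newurl has already been resolved against the current request URL.
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def trace_redirects(url, timeout=10):
    """Return (hops, final_url, final_status) for the given start URL."""
    handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return handler.hops, resp.geturl(), resp.status


# Example call (needs network access, so it is left commented out):
# hops, final_url, status = trace_redirects(
#     "http://www.fatwallet.com/ticket/store/A4C?s=storepage")
```

Here final_url is the URL of the last response the server actually sent, so comparing it with the address shown in a browser would tell whether the remaining redirect happens on the client side.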
Thanks, everyone!