0

I am trying to scrape Google search results using PHP.

I tried using @file_get_contents('http://www.google.com/search?hl=en&q=test'), but it does not work. It only works with http://www.google.com.

I tried using curl instead. Here's my function:

function my_fetch($url, $user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)')
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$googleContent = my_fetch("http://www.google.com/search?hl=en&q=test");
echo $googleContent;

The result is

302 Moved
The document has moved here.

With a link to here: http://www.google.com/sorry/?continue=http://www.google.com/search%3Fhl%3Den%26q%3Dtest

Is there any way to crawl the search results using PHP without having to learn the API?

  • I think learning the API is a lot more viable. Commented Oct 24, 2011 at 4:04
  • Scraping Google search results is against their TOS. Use the Custom Search API instead. Commented Oct 24, 2011 at 4:06
  • The current search API is covered at code.google.com/apis/customsearch/v1/reference.html and needs an API key. As search companies ultimately want to make money, they don't make search result pages easy to scrape. Commented Oct 24, 2011 at 4:08
  • Now it shows the correct result; try it. Commented Oct 24, 2011 at 5:35
  • The Search API is not useful for getting accurate rankings, and the amount of data it returns is heavily restricted; even the quite expensive commercial tier is of little use for larger amounts of data. Regarding the TOS: you do not accept the TOS merely by accessing Google, and you can reject it in a written statement if you accepted it before (for example, when using a Google account). Not that it plays much of a role; if you do not cause trouble, Google will not hunt you down for scraping them. There is an open-source PHP project at scraping.compunect.com that scrapes Google reliably. I guess my answer comes too late :) Commented May 9, 2014 at 2:00
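
For anyone weighing the API route suggested in the comments, here is a rough sketch of what a Custom Search request might look like in PHP. The endpoint and the `key`, `cx`, and `q` parameters reflect my understanding of the Custom Search JSON API and should be checked against the reference linked above; the function names `build_custom_search_url` and `extract_search_items` are made up for illustration.

```php
<?php
// Hypothetical helper: builds a Custom Search JSON API request URL.
// $apiKey and $engineId are placeholders you obtain from Google.
function build_custom_search_url(string $apiKey, string $engineId, string $query): string
{
    $params = [
        'key' => $apiKey,
        'cx'  => $engineId,
        'q'   => $query,
    ];
    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query($params, '', '&');
}

// Hypothetical helper: pulls title/link pairs out of a JSON response body.
// Assumes results are listed under an "items" key; returns [] otherwise.
function extract_search_items(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['items']) || !is_array($data['items'])) {
        return [];
    }
    $results = [];
    foreach ($data['items'] as $item) {
        $results[] = [
            'title' => $item['title'] ?? '',
            'link'  => $item['link'] ?? '',
        ];
    }
    return $results;
}
```

The response body could be fetched with the same curl code as in the question and passed to `extract_search_items`. Note that this returns API results, which, as pointed out above, may differ from what the regular results page shows.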

2 Answers

6

Your problem is that Google responds with a redirect. You need to add this to your cURL options:

CURLOPT_FOLLOWLOCATION => true
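
As a minimal sketch, the function from the question could be adapted along these lines. The `=>` form fits `curl_setopt_array`; with plain `curl_setopt` the equivalent is `curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true)`. The names `build_fetch_options` and `my_fetch_following`, and the redirect cap of 5, are assumptions for illustration and not part of the original code.

```php
<?php
// Hypothetical helper: collects the cURL options in one array.
function build_fetch_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_REFERER        => 'http://www.google.com/',
        CURLOPT_FOLLOWLOCATION => true, // follow 3xx redirects
        CURLOPT_MAXREDIRS      => 5,    // assumed cap to avoid redirect loops
    ];
}

// Same idea as my_fetch() from the question, but follows redirects.
// Returns the response body as a string, or false if the request fails.
function my_fetch_following(
    string $url,
    string $userAgent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)'
) {
    $ch = curl_init();
    curl_setopt_array($ch, build_fetch_options($url, $userAgent));
    $result = curl_exec($ch);
    // The handle is released when $ch goes out of scope.
    return $result;
}

// Usage (performs a real HTTP request):
// echo my_fetch_following('http://www.google.com/search?hl=en&q=test');
```

Keep in mind that the redirect in the question points to a /sorry/ page, so following it will probably return that page instead of the search results.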

0

What are you trying to scrape? There are numerous ways of getting SERPs without breaking Google's TOS.

I've used RSS feeds from search engines in the past. I think you can add a date filter so you don't end up with the same results each time.

1 Comment

I am trying to scrape the ratings. I don't know any way of getting the rating data using the API or RSS feeds.
