PHP file_get_contents error, wouldn't populate from an array?

Question

I've been trying to write a simple PHP script to pull data from an ISBN database site, and for some reason I've had nothing but issues with file_get_contents. I've since managed to get something working, but I'd still like to know why this version didn't work.

The code below never populated $page, so the preg_match calls that follow had nothing to match. If anyone knows what the hell was stopping it, that would be great.

$links = array ('
    http://www.isbndb.com/book/2009_cfa_exam_level_2_schweser_practice_exams_volume_2','
    http://www.isbndb.com/book/uniform_investment_adviser_law_exam_series_65','
    http://www.isbndb.com/book/waterworks_a02','
    http://www.isbndb.com/book/winning_the_toughest_customer_the_essential_guide_to_selling','
    http://www.isbndb.com/book/yale_daily_news_guide_to_fellowships_and_grants'

    ); // array of URLs

foreach ($links as $link)
{
    $page = file_get_contents($link);
    #print $page;

    preg_match("@<h1 itemprop='name'>(.*?)</h1>@is",$page,$title);
    preg_match("@<a itemprop='publisher' href='http://isbndb.com/publisher/(.*?)'>(.*?)</a>@is",$page,$publisher);
    preg_match("@<span>ISBN10: <span itemprop='isbn'>(.*?)</span>@is",$page,$isbn10);
    preg_match("@<span>ISBN13: <span itemprop='isbn'>(.*?)</span>@is",$page,$isbn13);
    echo '<tr>
        <td>'.$title[1].'</td>
        <td>'.$publisher[2].'</td>
        <td>'.$isbn10[1].'</td>
        <td>'.$isbn13[1].'</td>
        </tr>';
    #exit();
}

There's a newline before each of your URLs; could that be causing the issue?
– Sean, Commented Sep 12, 2014 at 13:48
Never parse HTML with regex: stackoverflow.com/questions/1732348/…
– Bogdan Burym, Commented Sep 12, 2014 at 13:50
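
To illustrate the comment about regex, here is a minimal sketch of extracting the same four fields with PHP's DOM extension instead of preg_match. The sample markup is hypothetical: it is reconstructed from the patterns in the question's regexes, not taken from the live site, so the real page may be structured differently. extractBookFields is a made-up helper name.

```php
<?php
// Hypothetical markup modelled on the question's regex patterns (assumption).
$html = <<<HTML
<html><body>
<h1 itemprop='name'>Sample Title</h1>
<a itemprop='publisher' href='http://isbndb.com/publisher/sample_press'>Sample Press</a>
<span>ISBN10: <span itemprop='isbn'>0000000000</span></span>
<span>ISBN13: <span itemprop='isbn'>0000000000000</span></span>
</body></html>
HTML;

// Hypothetical helper: returns the four fields, or null where nothing matched.
function extractBookFields(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so imperfect real-world HTML stays quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Text of the first node matching the query, or null if there is none.
    $first = function (string $query) use ($xpath): ?string {
        $nodes = $xpath->query($query);
        if ($nodes === false || $nodes->length === 0) {
            return null;
        }
        return trim($nodes->item(0)->textContent);
    };

    return [
        'title'     => $first("//h1[@itemprop='name']"),
        'publisher' => $first("//a[@itemprop='publisher']"),
        'isbn10'    => $first("//span[starts-with(normalize-space(.), 'ISBN10:')]/span[@itemprop='isbn']"),
        'isbn13'    => $first("//span[starts-with(normalize-space(.), 'ISBN13:')]/span[@itemprop='isbn']"),
    ];
}

$fields = extractBookFields($html);
```

Unlike $title[1] in the question, a missing element shows up here as null rather than an undefined-index notice, which makes an empty or failed download easier to spot.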

Kleskowy · Accepted Answer · 2014-09-12 13:57:02Z

My guess is that your URLs are wrong (not direct). The proper ones omit the www. part: if you request any of them and inspect the returned headers, you'll see that you're redirected (HTTP 301) to another URL.
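
One way to check for such a redirect is to look at the header lines that get_headers() returns. The sketch below only parses a list of that shape. summarizeRedirects is a hypothetical helper, and the sample lines are illustrative, not captured from the real site.

```php
<?php
// Hypothetical helper: turns a flat header list (the shape get_headers($url)
// returns) into one entry per response, with its status code and any
// Location header.
function summarizeRedirects(array $headers): array
{
    $hops = [];
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A status line starts a new response in the chain.
            $hops[] = ['status' => (int) $m[1], 'location' => null];
        } elseif ($hops && stripos($line, 'Location:') === 0) {
            // Attach the redirect target to the most recent response.
            $hops[count($hops) - 1]['location'] = trim(substr($line, strlen('Location:')));
        }
    }
    return $hops;
}

// Illustrative header lines only (assumption, not real server output).
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://example.com/book/some_title',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
$hops = summarizeRedirects($sample);

// Against a live URL this would be something like:
// $hops = summarizeRedirects(get_headers(trim($link)) ?: []);
```

If the first entry has status 301 and a location, the URL you requested is not the final one.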

The best way to handle this, in my opinion, is to use cURL and set the CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS options with curl_setopt.

Of course, you should also trim your URLs beforehand, just to be sure that isn't the problem.
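
A minimal sketch of the trimming step, using two of the question's URLs written the way the question writes them, with a newline and indentation inside the quotes. With that leading whitespace the string no longer begins with http://, so, as far as I can tell, file_get_contents treats it as a local path rather than a URL. The $valid array is a name introduced here for illustration.

```php
<?php
// Each string starts with a newline plus spaces, exactly as in the question.
$links = array('
    http://www.isbndb.com/book/waterworks_a02', '
    http://www.isbndb.com/book/yale_daily_news_guide_to_fellowships_and_grants'
);

// Strip surrounding whitespace from every entry and drop any that end up empty.
$links = array_values(array_filter(array_map('trim', $links), 'strlen'));

// Keep only the entries that parse as URLs (hypothetical extra sanity check).
$valid = [];
foreach ($links as $link) {
    if (filter_var($link, FILTER_VALIDATE_URL) !== false) {
        $valid[] = $link;
    }
}
```

After this, every entry in $valid begins directly with the scheme and can be passed to file_get_contents or cURL.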

A cURL example:

$curl = curl_init();
foreach ($links as $link) {
    // Strip stray whitespace/newlines so cURL receives a clean URL
    curl_setopt($curl, CURLOPT_URL, trim($link));
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects (e.g. 301)
    curl_setopt($curl, CURLOPT_MAXREDIRS, 5);         // max 5 redirects

    $result = curl_exec($curl);
    if (! $result) {
        continue; // $result is false (request failed) or empty: skip this URL
    }

    // do what you need to do here, e.g. run the preg_match calls on $result
}
curl_close($curl);
