Scrape div contents using PHP and cURL

Question

I'm new to cURL. I have been trying to scrape the contents of this Amazon link (i.e., the image, book title, author, and price of the 20 books) into an HTML page. So far, all I have managed is to print the page using the code below:

<?php
function curl($url) {
    $options = Array(
        CURLOPT_RETURNTRANSFER => TRUE,
        CURLOPT_FOLLOWLOCATION => TRUE,
        CURLOPT_AUTOREFERER => TRUE,
        CURLOPT_CONNECTTIMEOUT => 120,
        CURLOPT_TIMEOUT => 120,
        CURLOPT_MAXREDIRS => 10,
        CURLOPT_URL => $url,
    );

    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031";
$results_page = curl($url);
echo $results_page;
?>

I have tried using regex and failed. I have been at this for six hours straight and have run out of ideas, so I'm hoping to find a solution here. Thanks in advance. :)

UPDATE: I found a really helpful site for beginners like me (it doesn't use cURL, though).

Is there a reason you're not using the API for this? It'd be much easier.
– Anthony Sterling, Commented Jul 25, 2013 at 12:33
Use DOMDocument, XPath, phpQuery, or simple_html_dom for starters. Please, not regexp.
– DevZer0, Commented Jul 25, 2013 at 12:35
@DevZer0 Regex is only my last option. I'm completely lost after trying for so long. No offense, but a solution would help me more. Thanks, DevZer0.
– John, Commented Jul 25, 2013 at 12:38

Laurel · Accepted Answer · 2016-04-23 07:04:37Z

1

You really should be using Amazon's Product Advertising API (AWSECommerceService), but here's a way to leverage Yahoo's YQL service instead:

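<?php
// Build the YQL request: fetch the Amazon page and keep only the
// <div class="zg_itemImmersion"> elements (one per book).
$query = sprintf(
    'http://query.yahooapis.com/v1/public/yql?q=%s',
    urlencode('SELECT * FROM html WHERE url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031" AND xpath=\'//div[@class="zg_itemImmersion"]\'')
);

// The third argument (true) tells SimpleXMLElement that $query is a URL to load.
$xml = new SimpleXMLElement($query, null, true);

// Each result <div> is one product; the title link sits in its second
// child <div>, inside that element's second <div>.
foreach ($xml->results->div as $product) {
    vprintf("%s\n", array(
        $product->div[1]->div[1]->a,
    ));
}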

/*
    Engineering Thermodynamics
    A Textbook of Fluids Mechanics
    The Design of Everyday Things
    A Forest History of India
    Computer Networking
    The Story of Microsoft
    Private Empire: ExxonMobil and Americ...
    Project Management Metrics, KPIs, and...
    Design and Analysis of Experiments: I...
    IES - 2013: General English
    Foundation of Software Testing: ISTQB...
    Faster: 100 Ways to Improve your Digi...
    A Textbook of Fluid Mechanics and Hyd...
    Software Engineering for Embedded Sys...
    Communication Skills for Engineers
    Making Things Move DIY Mechanisms for...
    Virtual Instrumentation Using Labview
    Geometric Dimensioning and Tolerancin...
    Power System Protection & Switchgear...
    Computer Networks
*/
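
The same XPath expression can also be applied locally with DOMDocument and DOMXPath, as suggested in the comments on the question, so the HTML returned by the question's curl() function is parsed without a third-party service. The following is only a minimal sketch under stated assumptions: extractTitles() is a hypothetical helper, the markup in $sample is an invented stand-in for the real page, and picking the first link with non-empty text inside each item is a guess at where the title lives. Only the class name zg_itemImmersion comes from the query above.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of the
// first non-empty link inside each <div class="zg_itemImmersion">.
function extractTitles(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings silently
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = [];

    foreach ($xpath->query('//div[@class="zg_itemImmersion"]') as $item) {
        // Assumption: the title is the first anchor with visible text.
        // Anchors that only wrap an image have an empty string value.
        $links = $xpath->query('.//a[normalize-space()]', $item);
        if ($links->length > 0) {
            $titles[] = trim($links->item(0)->textContent);
        }
    }

    return $titles;
}

// Invented sample markup standing in for the fetched page.
$sample = '<html><body>'
    . '<div class="zg_itemImmersion">'
    . '<div><a href="/a"><img src="a.jpg" alt=""></a></div>'
    . '<div class="zg_title"><a href="/a">Engineering Thermodynamics</a></div>'
    . '</div>'
    . '<div class="zg_itemImmersion">'
    . '<div class="zg_title"><a href="/b">Computer Networks</a></div>'
    . '</div>'
    . '</body></html>';

echo implode("\n", extractTitles($sample)), "\n";
```

To use it on the live page, pass the string returned by curl($url) from the question to extractTitles(). The selectors will probably need adjusting to whatever markup the page actually serves.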

edited Apr 23, 2016 at 7:04 by Laurel

answered Jul 25, 2013 at 12:57 by Anthony Sterling

I can't thank you enough. All I could do is mark the answer as useful and +1. Thanks again.
– John, Commented over a year ago
