
I'm new to cURL. I have been trying to scrape the contents of this Amazon link (i.e., the image, book title, author and price of the 20 books) into an HTML page. So far, all I've managed is to print the page using the code below:

<?php
function curl($url) {
    $options = Array(
        CURLOPT_RETURNTRANSFER => TRUE,
        CURLOPT_FOLLOWLOCATION => TRUE,
        CURLOPT_AUTOREFERER => TRUE,
        CURLOPT_CONNECTTIMEOUT => 120,
        CURLOPT_TIMEOUT => 120,
        CURLOPT_MAXREDIRS => 10,
        CURLOPT_URL => $url,
    );

    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031";
$results_page = curl($url);
echo $results_page;
?>

I tried using regex and failed. I have tried everything I could think of for six hours straight and am really tired, so I'm hoping to find a solution here. A simple thanks won't be enough for a solution, but thank you in advance. :)

UPDATE: Found a really helpful site (click here) for beginners like me (without using cURL, though).

  • Is there a reason you're not using the API for this? It'd be much easier. Commented Jul 25, 2013 at 12:33
  • Use DOMDocument, XPath, phpQuery, or simple_html_dom for starters. Please, not regexp. Commented Jul 25, 2013 at 12:35
  • docs.aws.amazon.com/AWSECommerceService/latest/DG/… :) Commented Jul 25, 2013 at 12:37
  • @DevZer0 regex is only my last option. I'm completely lost after trying for so long. No offense, but a solution would help me more. Thanks, DevZer0. Commented Jul 25, 2013 at 12:38
  • This might be of some interest too. y.ahoo.it/NzKmy Commented Jul 25, 2013 at 12:39

1 Answer

You really should be using Amazon's AWSECommerceService API, but here's a way to do it with Yahoo's YQL service:

<?php
$query = sprintf(
    'http://query.yahooapis.com/v1/public/yql?q=%s',
    urlencode('SELECT * FROM html WHERE url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031" AND xpath=\'//div[@class="zg_itemImmersion"]\'')
);

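// The third argument (true) tells SimpleXMLElement to treat $query as a URL and fetch it.
$xml = new SimpleXMLElement($query, null, true);

// Each matched div is one product; the title link sits in its second nested div.
foreach ($xml->results->div as $product) {
    vprintf("%s\n", array(
        $product->div[1]->div[1]->a,
    ));
}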

/*
    Engineering Thermodynamics
    A Textbook of Fluids Mechanics
    The Design of Everyday Things
    A Forest History of India
    Computer Networking
    The Story of Microsoft
    Private Empire: ExxonMobil and Americ...
    Project Management Metrics, KPIs, and...
    Design and Analysis of Experiments: I...
    IES - 2013: General English
    Foundation of Software Testing: ISTQB...
    Faster: 100 Ways to Improve your Digi...
    A Textbook of Fluid Mechanics and Hyd...
    Software Engineering for Embedded Sys...
    Communication Skills for Engineers
    Making Things Move DIY Mechanisms for...
    Virtual Instrumentation Using Labview
    Geometric Dimensioning and Tolerancin...
    Power System Protection & Switchgear...
    Computer Networks
*/
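
As the comments suggest, the same extraction can also be done locally with PHP's built-in DOMDocument and DOMXPath instead of going through a remote service. The following is only a minimal sketch: the sample markup (including the zg_rank, zg_itemWrapper, zg_image and zg_title class names) and the extractTitles helper are hypothetical, modeled on the XPath and element offsets used above rather than taken from the real page. In practice $html would be the string returned by the curl() function from the question.

```php
<?php
// Hypothetical sample markup: the nesting is an assumption inferred from
// the XPath and the div[1]->div[1]->a offsets used in the answer above.
$html = <<<'HTML'
<html><body>
<div class="zg_itemImmersion">
  <div class="zg_rank">1.</div>
  <div class="zg_itemWrapper">
    <div class="zg_image"><img src="cover1.jpg"></div>
    <div class="zg_title"><a href="/book-1">Engineering Thermodynamics</a></div>
  </div>
</div>
<div class="zg_itemImmersion">
  <div class="zg_rank">2.</div>
  <div class="zg_itemWrapper">
    <div class="zg_image"><img src="cover2.jpg"></div>
    <div class="zg_title"><a href="/book-2">Computer Networks</a></div>
  </div>
</div>
</body></html>
HTML;

// Hypothetical helper: returns the title text of every product block.
function extractTitles(string $html): array
{
    $doc = new DOMDocument();

    // Scraped HTML is rarely valid; buffer libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = array();

    foreach ($xpath->query('//div[@class="zg_itemImmersion"]') as $item) {
        // XPath indexes from 1, so div[2]/div[2] matches SimpleXML's div[1]->div[1].
        $link = $xpath->query('./div[2]/div[2]/a', $item)->item(0);
        if ($link !== null) {
            $titles[] = trim($link->textContent);
        }
    }

    return $titles;
}

$titles = extractTitles($html);
print_r($titles);
```

The image, author and price could presumably be pulled the same way with further relative XPath queries against $item, once the actual markup of the page has been inspected.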

1 Comment

I can't thank you enough... all I could do is mark the answer useful and +1... thanks again.
