
I want to pull data from the Amazon best deals URL and show only the product part, not the whole page (i.e. without the header and sidebar), limited to 8 products. I am using cURL and Simple HTML DOM in PHP.

<?php
include_once("php/simple_html_dom.php");

// Use cURL to fetch the HTML content of a page
function getHTML($url, $timeout)
{
    $ch = curl_init($url); // initialize cURL with the given URL
    // forward the visitor's user agent; fall back to a generic one when it is not set (e.g. on the CLI)
    $userAgent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "Mozilla/5.0";
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to wait while connecting
    curl_setopt($ch, CURLOPT_FAILONERROR, true); // treat HTTP status codes >= 400 as a failure
    return curl_exec($ch); // the HTML string, or false on failure
}

echo $html = getHTML("http://www.amazon.in/gp/goldbox/ref=nav_topnav_deals", 10);

?>

The problem is that it pulls all of the content, but I want only the product div part.
Amazon's div container for a product is:

<div id="100_dealView_0" class="a-section a-spacing-none tallCellView gridColumn4 singleCell">

        <div class="a-section dealContainer">

    <div class="a-section backGround layer">
    </div>

    <div class="a-section layer">

            <div class="a-row dealContainer dealTile">


        <a id="dealImage" class="a-link-normal" href="https://www.amazon.in/s/ref=gbps_img_s-4_0227_af8a024a?fst=as%3Aoff&amp;rh=n%3A1571283031%2Cn%3A1983396031%2Ck%3A23rdApril_runningshoes_dotdlist%2Cp_76%3A1318482031%2Cp_6%3AA14FG3FHN6HO9H&amp;keywords=23rdApril_runningshoes_dotdlist&amp;ie=UTF8&amp;qid=1460093112&amp;rnid=1318474031&amp;smid=A14FG3FHN6HO9H&amp;pf_rd_p=900470227&amp;pf_rd_s=slot-4&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A1VBAL9TL5WCBF&amp;pf_rd_r=13ED3AZVD21FX9VX9SS1">
            <div class="a-row a-spacing-base a-spacing-top-base imageBlock">
                <div class="a-row dealContainer">
                    <div class="a-row layer">
                        <img alt="" src="https://images-na.ssl-images-amazon.com/images/I/51%2BpumuEs%2BL._AA210_.jpg" data-a-hires="https://images-na.ssl-images-amazon.com/images/I/51%2BpumuEs%2BL._AA420_.jpg">
                    </div>
                    <div class="a-row layer backGround">
                    </div>
                </div>
            </div>
        </a>



                    <div class="a-row a-spacing-mini">


        <span class="a-size-mini a-color-base dotdBadge">DEAL OF THE DAY</span>

</div>

                <div class="a-row a-spacing-mini">

            <div class="a-row priceBlock unitLineHeight">
                <span class="a-size-medium a-color-base inlineBlock unitLineHeight">₹549 - ₹5,399</span>
            </div>

</div>
                <div class="a-row a-spacing-mini">

        <div class="a-row unitLineHeight">
            <span class="a-size-mini a-color-secondary inlineBlock unitLineHeight">
                Ends in
            </span>

            <span id="100_dealView_0_dealClock" class="a-size-mini a-color-secondary inlineBlock unitLineHeight">12:13:59</span>
        </div>

</div>
                <div class="a-row a-spacing-mini">

    <a class="a-link-normal" href="https://www.amazon.in/s/ref=gbps_tit_s-4_0227_af8a024a?fst=as%3Aoff&amp;rh=n%3A1571283031%2Cn%3A1983396031%2Ck%3A23rdApril_runningshoes_dotdlist%2Cp_76%3A1318482031%2Cp_6%3AA14FG3FHN6HO9H&amp;keywords=23rdApril_runningshoes_dotdlist&amp;ie=UTF8&amp;qid=1460093112&amp;rnid=1318474031&amp;smid=A14FG3FHN6HO9H&amp;pf_rd_p=900470227&amp;pf_rd_s=slot-4&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A1VBAL9TL5WCBF&amp;pf_rd_r=13ED3AZVD21FX9VX9SS1">
        <span class="a-declarative" data-action="gbdeal-actionrecord" data-gbdeal-actionrecord="{&quot;actionType&quot;:&quot;TITLE&quot;,&quot;position&quot;:&quot;0&quot;,&quot;widgetID&quot;:&quot;100&quot;,&quot;dealID&quot;:&quot;af8a024a&quot;}">

            <span id="dealTitle" class="a-size-base a-color-base dealTitleTwoLine hoverVisible visibleCss singleCellTitle autoHeight" style="width: 210px;">
                Men's Shoes: Minimum 40% Off for Sports Shoes
            </span>
            <span id="dealTitle" class="a-size-base a-color-link dealTitleTwoLine restVisible singleCellTitle autoHeight">
                Men's Shoes: Minimum 40% Off for Sports Shoes
            </span>

        </span>
    </a>

</div>

                    <div class="a-row a-spacing-mini">

        <div class="a-row reviewStars">
            <a class="a-link-normal touchAnchor" href="/gp/product-reviews/B00593XQS6/ref=gbps_rvw_s-4_0227_af8a024a?pf_rd_p=900470227&amp;pf_rd_s=slot-4&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A1VBAL9TL5WCBF&amp;pf_rd_r=13ED3AZVD21FX9VX9SS1">
                <span class="a-declarative" data-action="gbdeal-actionrecord" data-gbdeal-actionrecord="{&quot;actionType&quot;:&quot;REVIEWS&quot;,&quot;position&quot;:&quot;0&quot;,&quot;widgetID&quot;:&quot;100&quot;,&quot;dealID&quot;:&quot;af8a024a&quot;}">

                            <i class="a-icon a-icon-star a-star-5"><span class="a-icon-alt">Avg. Customer Review</span></i>

                    1
            </span>
        </a>

</div>

                            <div class="a-row buttonOuterContainer ">


    <div class="a-row a-spacing-medium">

                        <span class="a-declarative" data-action="gbdeal-actionrecord" data-gbdeal-actionrecord="{&quot;actionType&quot;:&quot;SEE_MORE&quot;,&quot;position&quot;:&quot;0&quot;,&quot;widgetID&quot;:&quot;100&quot;,&quot;dealID&quot;:&quot;af8a024a&quot;}">
                            <span class="a-button a-button-span12 a-button-primary fixedWidth210"><span class="a-button-inner"><a href="https://www.amazon.in/s/ref=gbps_ulm_s-4_0227_af8a024a?fst=as%3Aoff&amp;rh=n%3A1571283031%2Cn%3A1983396031%2Ck%3A23rdApril_runningshoes_dotdlist%2Cp_76%3A1318482031%2Cp_6%3AA14FG3FHN6HO9H&amp;keywords=23rdApril_runningshoes_dotdlist&amp;ie=UTF8&amp;qid=1460093112&amp;rnid=1318474031&amp;smid=A14FG3FHN6HO9H&amp;pf_rd_p=900470227&amp;pf_rd_s=slot-4&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A1VBAL9TL5WCBF&amp;pf_rd_r=13ED3AZVD21FX9VX9SS1" class="a-button-text a-text-center" role="button">
                                View Deal
                            </a></span></span>
                        </span>

    </div>


                            </div>
            </div>

    </div>
</div>

</div></div> 

There are 60+ such divs, but I want only the first 8, scraping the content of each into its respective field.

  • You've included the Simple HTML DOM library. It has methods for parsing the HTML and searching for elements. Why aren't you using it? Commented Apr 23, 2016 at 7:12
  • You don't need to use curl. You can use file_get_html($url) from the Simple HTML DOM library. Commented Apr 23, 2016 at 7:13
  • Thank you @Barmar, but it shows an undefined function error. Also, how do I fetch those particular 8 divs? Commented Apr 23, 2016 at 7:22
  • The documentation of Simple HTML DOM is here: simplehtmldom.sourceforge.net/manual.htm. Use the find() function to find the DIVs you want by giving an appropriate selector. Commented Apr 23, 2016 at 7:24
  • Can you tell me how to fetch exactly those 8 divs? I have also posted the Amazon divs, @Barmar. Commented Apr 23, 2016 at 7:30

1 Answer


You can use XPath. Take a look at this tutorial on scraping the web in PHP. In your case, you haven't included the entire HTML here, but I'm guessing you want to capture the first div.

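libxml_use_internal_errors(true); // suppress warnings caused by invalid markup

$document = new DOMDocument;
$document->loadHTML($html); // $html is the string returned by getHTML() in the question

$xpath = new DOMXPath($document);

// select the first deal container by its id
$data = $xpath->query("//div[@id='100_dealView_0']");

foreach ($data as $d) { // in case there are multiple (there shouldn't be)
    echo $d->nodeValue;
}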
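
To limit the result to the first 8 deals and split each one into fields, the same XPath approach can be extended. What follows is only a minimal sketch under a few assumptions: every deal container is a div whose id contains `_dealView_`, and the ids and class names `dealTitle`, `dealImage` and `priceBlock` match the markup posted in the question. The function name `extractDeals` and the sample HTML are made up for illustration. It also assumes the deal markup is actually present in the fetched HTML rather than added later by JavaScript.

```php
<?php
// Hypothetical helper: returns up to $limit deals, each as an array of fields.
function extractDeals($html, $limit = 8)
{
    libxml_use_internal_errors(true); // suppress warnings caused by invalid markup
    $document = new DOMDocument;
    $document->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($document);

    // Assumption: deal containers are divs with ids such as "100_dealView_0".
    // The parentheses make position() count over the whole document, not per parent.
    $query = "(//div[contains(@id, '_dealView_')])[position() <= " . (int) $limit . "]";

    $deals = array();
    foreach ($xpath->query($query) as $node) {
        // Each expression is evaluated relative to the current deal container.
        $deals[] = array(
            'title' => $xpath->evaluate("normalize-space(.//span[@id='dealTitle'])", $node),
            'price' => $xpath->evaluate("normalize-space(.//div[contains(@class, 'priceBlock')])", $node),
            'link'  => $xpath->evaluate("string(.//a[@id='dealImage']/@href)", $node),
            'image' => $xpath->evaluate("string(.//a[@id='dealImage']//img/@src)", $node),
        );
    }
    return $deals;
}

// Demo with made-up markup that mimics the structure from the question (10 deals).
$sample = '<html><body>';
for ($i = 0; $i < 10; $i++) {
    $sample .= '<div id="100_dealView_' . $i . '" class="a-section">'
        . '<a id="dealImage" href="https://example.com/deal/' . $i . '">'
        . '<img alt="" src="https://example.com/img/' . $i . '.jpg"></a>'
        . '<div class="a-row priceBlock"><span>Rs. ' . (100 + $i) . '</span></div>'
        . '<span id="100_dealView_' . $i . '_dealClock">12:13:59</span>'
        . '<span id="dealTitle"> Sample deal ' . $i . ' </span>'
        . '</div>';
}
$sample .= '</body></html>';

$deals = extractDeals($sample);
foreach ($deals as $deal) {
    echo $deal['title'], ' | ', $deal['price'], ' | ', $deal['link'], PHP_EOL;
}
```

With the real page you would pass the string returned by getHTML() instead of `$sample`. If the selectors do not match the live markup, the function simply returns fewer (or zero) deals, so it is worth checking `count($deals)` before rendering.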