Scraping and divs

Question

I'm new to PHP and am trying to scrape data from a website using regular expressions. Locating the rental and details content inside the divs is a problem. Here is my code; could someone help me out?

    <?php
    header('Content-Type: text/plain');
    $contents = file_get_contents('http://www.hassconsult.co.ke/index.php?option=com_content&view=article&id=22&Itemid=29');
    // Collapse each run of whitespace into a single space.
    $contents = preg_replace('/\s{1,}/', ' ', $contents);
    // Drop non-breaking-space entities.
    $contents = preg_replace('/&nbsp;/', '', $contents);

    //print $contents."\n";
    // Split the page into one chunk per listing title.
    $records = preg_split('/<span class="style8"/', $contents);

    for ($ix = 1; $ix < count($records); $ix++) {
        $tmp = $records[$ix];

        preg_match('/href="(.*?)"/', $tmp, $match_url);
        preg_match('/>(.*?)<\/span>/', $tmp, $match_name);
        preg_match('/<div[^>]+class ?= ?"style10"[^>]*>(\s*(<div.*(?2).*<\/div>\s*)*)<\/div>/Us', $tmp, $match_rental); // error is here
        print_r($match_url);
        print_r($match_name);
        print_r($match_rental);
        print $tmp."\n";
        exit();
    }
    //print count($records)."\n";
    //print_r($records);
    //if ($contents === false)
    //print 'FALSE';
    //print_r(htmlentities($contents));

    ?>

Here is a sample of the content:

    >HILLVIEW CROSSROADS4 BED HOUSE</span></div></td>
                </tr>
                <tr>
                  <td width="57%" style="padding-left:20px;"><div align="left" class="style10" style="color:#007AC7;">
                      <div align="left">
                                            Rental; 
                        USD                     4,500 
                        </div>
                  </div></td>
                  <td width="43%" align="right"><div align="right" class="style10" style="color:#007AC7;">
                      <div align="right">

                      No.             
                      834 

                       </div>
                  </div></td>
                </tr>
                <tr>
                  <td colspan="2" style="padding-left:20px;color:#000000;">
                  <div align="justify" style="font-family:Arial, Helvetica, sans-serif;font-size:11px;color:333300;">Artistically designed 4 bed (all
ensuite) house on half acre of well-tended gardens. Lounge with fireplace opening to terrace, opulent master suite, family room, study. Good finishes, SQ, carport, extra water storage
and generator.                                <a href="/index.php?option=com_content&amp;view=article&amp;id=27&amp;Itemid=74&amp;send=5&amp;ref_no=834/II&amp;t=2">....Details</a>               </div></td>
                </tr>
            </table></td>
          </tr>
</table>
<br>

Why are you using regular expressions to parse HTML? There are multiple HTML parsers available for PHP that handle all kinds of things regular expressions won't. An HTML parser knows which constructs are valid in which versions of HTML and XHTML, for instance, and uses the doctype to determine which version the page is using.
– Adam Mihalcin, Commented Mar 24, 2012 at 19:58
Please send me links to a tutorial; I would highly appreciate it, as I'm kind of new.
– user1207576, Commented Mar 25, 2012 at 5:15

pguardiario · Accepted Answer · 2012-03-25 07:11:13Z

2

That website doesn't have good CSS selectors, but it's still not too hard to get the data with XPath:

    $dom = new DOMDocument();
    @$dom->loadHTMLFile('http://www.hassconsult.co.ke/index.php?option=com_content&view=article&id=22&Itemid=29');
    $xpath = new DOMXPath($dom);

    foreach ($xpath->query("//div[@id='ad']/table") as $table) {
        // title
        echo $xpath->query(".//span[@class='style8']", $table)->item(0)->nodeValue . "\n";
        // price
        echo $xpath->query(".//div[@class='style10']/div", $table)->item(0)->nodeValue . "\n";
        // description
        echo $xpath->query(".//div[@align='justify']", $table)->item(0)->nodeValue . "\n";
    }
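The same queries can be tried offline against a trimmed copy of the sample markup from the question. This is only a sketch: the `div#ad` wrapper is assumed from the answer's query (the question's sample does not show it), and `firstText()` is a hypothetical helper that guards against missing nodes and collapses the whitespace left over from the page layout.

```php
<?php
// Trimmed copy of the question's sample markup. The surrounding div#ad is an
// assumption taken from the answer's XPath, not from the question's sample.
$html = <<<'HTML'
<div id="ad">
<table>
  <tr><td><div><span class="style8">HILLVIEW CROSSROADS4 BED HOUSE</span></div></td></tr>
  <tr>
    <td><div align="left" class="style10"><div align="left">
        Rental;
        USD     4,500
    </div></div></td>
    <td><div align="right" class="style10"><div align="right">
        No.
        834
    </div></div></td>
  </tr>
  <tr><td colspan="2"><div align="justify">Artistically designed 4 bed house.</div></td></tr>
</table>
</div>
HTML;

// Hypothetical helper: text of the first node matching $query below $context,
// with whitespace runs collapsed, or null when nothing matches.
function firstText(DOMXPath $xpath, string $query, DOMNode $context): ?string
{
    $node = $xpath->query($query, $context)->item(0);
    return $node === null ? null : trim(preg_replace('/\s+/', ' ', $node->textContent));
}

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$listings = [];
foreach ($xpath->query("//div[@id='ad']/table") as $table) {
    $listings[] = [
        'title'       => firstText($xpath, ".//span[@class='style8']", $table),
        'price'       => firstText($xpath, ".//div[@class='style10']/div", $table),
        'description' => firstText($xpath, ".//div[@align='justify']", $table),
    ];
}
print_r($listings);
```

Returning null for a missing node lets the loop keep going when one listing lacks a field, instead of failing on `->nodeValue`.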

edited Mar 25, 2012 at 7:11

answered Mar 25, 2012 at 2:37


3 Comments

user1207576 Over a year ago

Do you know how I can traverse to the next page or to the details page? Under more details I need the images and, from the map, the latitude and longitude. Does XPath support this? Thanks!

pguardiario Over a year ago

I recommend reading some XPath tutorials and trying it yourself. If you get stuck, you can post a new question with an xpath tag and you are likely to get a good answer.
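As a possible starting point for the details page, the link itself can be pulled out with XPath and turned into an absolute URL. The sketch below assumes the details link is the anchor whose text contains "Details", as in the question's sample, and that its `href` is root-relative. `$base` and `$detailUrls` are illustrative names.

```php
<?php
// Site root taken from the URL in the question.
$base = 'http://www.hassconsult.co.ke';

// Description cell copied from the question's sample markup (text shortened).
$html = <<<'HTML'
<table><tr><td><div align="justify">Artistically designed 4 bed house.
<a href="/index.php?option=com_content&amp;view=article&amp;id=27&amp;Itemid=74&amp;send=5&amp;ref_no=834/II&amp;t=2">....Details</a></div></td></tr></table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$detailUrls = [];
foreach ($xpath->query("//div[@align='justify']//a[contains(., 'Details')]") as $link) {
    // getAttribute() returns the decoded value, so &amp; comes back as &.
    // Only root-relative hrefs (starting with "/") are handled here.
    $detailUrls[] = $base . $link->getAttribute('href');
}
print_r($detailUrls);
```

Each collected URL could then be loaded into its own `DOMDocument` with `loadHTMLFile()` and queried the same way. The selectors for the images and the map coordinates depend on the details page's markup, which is not shown here.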

user1207576 Over a year ago

Okay, thanks. One last question: how do I get the title? I added echo $xpath->query("./span[@class='style8']", $table)->item(0)->nodeValue; below the foreach, and it returns an error when trying to get the name.
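A likely cause of that error, assuming the span is nested inside the table as in the question's sample: `./span` matches only direct children of the context node, while `.//span` matches descendants at any depth. When nothing matches, `item(0)` returns null, and reading `->nodeValue` from null fails. A minimal sketch of the difference:

```php
<?php
// The span sits inside tr > td > div, as in the question's sample markup.
$html = '<table><tr><td><div><span class="style8">HILLVIEW CROSSROADS4 BED HOUSE</span></div></td></tr></table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$table = $xpath->query('//table')->item(0);

// "./span" looks only at direct children of <table>, so it finds nothing.
$direct = $xpath->query("./span[@class='style8']", $table);
// ".//span" searches all descendants of <table>, so it finds the title.
$nested = $xpath->query(".//span[@class='style8']", $table);

echo $direct->length, "\n";
echo $nested->length, "\n";

// Check for null before reading nodeValue.
$node  = $nested->item(0);
$title = $node !== null ? trim($node->nodeValue) : '(no title found)';
echo $title, "\n";
```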
