I'm new to PHP and am trying to scrape data from a website using regular expressions, but I can't locate the rental and details content inside the div. Here is my code. Could someone help me out?

    <?php
    header('content-type: text/plain');
    $contents = file_get_contents('http://www.hassconsult.co.ke/index.php?option=com_content&view=article&id=22&Itemid=29');
    $contents = preg_replace('/\s(1,)/','',$contents);
    $contents = preg_replace('/&nbsp;/','',$contents);

    //print $contents."\n";
    $records = preg_split('/<span class="style8"/',$contents);

    for ($ix=1; $ix < count($records); $ix++){
        $tmp = $records[$ix];

        preg_match('/href="(.*?)"/',$tmp, $match_url);
        preg_match('/>(.*?)<\/span>/',$tmp,$match_name);
        preg_match('/<div[^>]+class ?= ?"style10"[^>]*>(\s*(<div.*(?2).*<\/div>\s*)*)<\/div>/Us',$tmp,$match_rental);//error is here
        print_r($match_url);
        print_r($match_name);
        print_r($match_rental);
        print $tmp."\n";
        exit ();
    }
    //print count($records)."\n";
    //print_r($records);
    //if ($contents===false)
    //print 'FALSE';
    //print_r(htmlentities($contents));

    ?>

Here is a sample of the content

    >HILLVIEW CROSSROADS4 BED HOUSE</span></div></td>
                </tr>
                <tr>
                  <td width="57%" style="padding-left:20px;"><div align="left" class="style10" style="color:#007AC7;">
                      <div align="left">
                                            Rental; 
                        USD                     4,500 
                        </div>
                  </div></td>
                  <td width="43%" align="right"><div align="right" class="style10" style="color:#007AC7;">
                      <div align="right">

                      No.             
                      834 

                       </div>
                  </div></td>
                </tr>
                <tr>
                  <td colspan="2" style="padding-left:20px;color:#000000;">
                  <div align="justify" style="font-family:Arial, Helvetica, sans-serif;font-size:11px;color:333300;">Artistically designed 4 bed (all
ensuite) house on half acre of well-tended gardens. Lounge with fireplace opening to terrace, opulent master suite, family room, study. Good finishes, SQ, carport, extra water storage
and generator.                                <a href="/index.php?option=com_content&amp;view=article&amp;id=27&amp;Itemid=74&amp;send=5&amp;ref_no=834/II&amp;t=2">....Details</a>               </div></td>
                </tr>
            </table></td>
          </tr>
</table>
<br>
  • Why are you using regular expressions to parse HTML? There are multiple HTML parsers available for PHP, which will handle all kinds of things that regular expressions won't. An HTML parser knows which constructs are valid in which versions of HTML and XHTML, for instance, and uses the doctype to determine which version the page is using. Commented Mar 24, 2012 at 19:58
  • Please send me links to a tutorial; I would highly appreciate it, as I'm kind of new. Commented Mar 25, 2012 at 5:15

1 Answer

That website doesn't have good CSS selectors, but it's still not too hard to get the data with XPath:

    $dom = new DOMDocument();
    @$dom->loadHTMLFile('http://www.hassconsult.co.ke/index.php?option=com_content&view=article&id=22&Itemid=29');
    $xpath = new DOMXPath($dom);

    foreach($xpath->query("//div[@id='ad']/table") as $table) {
      // title
      echo $xpath->query(".//span[@class='style8']", $table)->item(0)->nodeValue . "\n";
      // price
      echo $xpath->query(".//div[@class='style10']/div", $table)->item(0)->nodeValue . "\n";
      // description
      echo $xpath->query(".//div[@align='justify']", $table)->item(0)->nodeValue . "\n";
    }
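
A slightly more defensive variant of the same loop might look like the sketch below. It runs against a trimmed copy of the sample markup from the question rather than the live site, so the surrounding `<div id="ad">` wrapper is an assumption taken from the query above, and `firstText()` is a hypothetical helper, not part of `DOMXPath`. It collects libxml warnings with `libxml_use_internal_errors()` instead of silencing them with `@`, and it avoids reading `->nodeValue` from `null` when a query matches nothing.

```php
<?php
// Trimmed copy of the question's sample markup. The <div id="ad"> wrapper is
// assumed from the answer's query; it does not appear in the posted sample.
$html = <<<'HTML'
<div id="ad"><table>
  <tr><td><div><span class="style8">HILLVIEW CROSSROADS4 BED HOUSE</span></div></td></tr>
  <tr>
    <td><div align="left" class="style10"><div align="left">
        Rental;
        USD      4,500
    </div></div></td>
    <td><div align="right" class="style10"><div align="right"> No. 834 </div></div></td>
  </tr>
  <tr><td colspan="2"><div align="justify">Artistically designed 4 bed house.
    <a href="/index.php?option=com_content&amp;view=article&amp;id=27&amp;ref_no=834/II">....Details</a>
  </div></td></tr>
</table></div>
HTML;

// Collect libxml warnings about sloppy markup instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Hypothetical helper: text of the first match with whitespace collapsed,
// or null when the query matches nothing.
function firstText(DOMXPath $xpath, string $query, DOMNode $context): ?string
{
    $nodes = $xpath->query($query, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim(preg_replace('/\s+/', ' ', $nodes->item(0)->nodeValue));
}

$listings = [];
foreach ($xpath->query("//div[@id='ad']/table") as $table) {
    $listings[] = [
        'title'   => firstText($xpath, ".//span[@class='style8']", $table),
        'price'   => firstText($xpath, ".//div[@class='style10']/div", $table),
        // The href of the "....Details" link, for following to the detail page.
        'details' => firstText($xpath, ".//a[contains(., 'Details')]/@href", $table),
    ];
}
print_r($listings);
```

On this sample the price comes out as `Rental; USD 4,500`, and the `details` entry holds the relative link to the listing's detail page, which would still need to be resolved against the site's base URL before fetching it.
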

3 Comments

Do you know how I can traverse to the next page or to the details page? Under "more details" I need the images, and from the map I need the latitude and longitude. Does XPath support this? Thanks!
I recommend reading some XPath tutorials and trying it yourself. If you get stuck, you can post a new question with an xpath tag and you are likely to get a good answer.
Okay, thanks. One last question: how do I get the title? I added echo $xpath->query("./span[@class='style8']", $table)->item(0)->nodeValue; below the foreach and it returns an error when trying to get the name.
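
The error in the comment above most likely comes from the path itself: `./span` selects only `span` elements that are direct children of the context node, while `.//span` (as used in the answer) searches all descendants. In the sample markup the span sits several levels below the table, so `./span` matches nothing, `item(0)` returns `null`, and reading `->nodeValue` from `null` fails. A small sketch on made-up markup with the same nesting (the `id="t"` attribute is an assumption added only to make the table easy to select):

```php
<?php
// Made-up markup with the same nesting as a listing: table > tr > td > div > span.
$html = '<table id="t"><tr><td><div><span class="style8">HILLVIEW CROSSROADS</span></div></td></tr></table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$table = $xpath->query("//table[@id='t']")->item(0);

// Child axis: only spans directly under <table>, so nothing matches.
$direct = $xpath->query("./span[@class='style8']", $table);
// Descendant axis: spans at any depth below <table>.
$nested = $xpath->query(".//span[@class='style8']", $table);

echo "./span matches:  ", $direct->length, "\n";
echo ".//span matches: ", $nested->length, "\n";

// item(0) is null for an empty result, so check before reading nodeValue.
$first = $nested->item(0);
$title = $first !== null ? $first->nodeValue : null;
echo $title, "\n";
```
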
