
I have to collect some data from a website. The data is wrapped in divs, and inside each div there is a title tag. I need to get the text inside these title tags. How can I do this? I have written the following code; what do I need to change to achieve this?

<?php
$str = '';
$page =  file_get_contents('http://www.sarkari-naukri.in/');
$dom = new DOMDocument();
$dom->loadHTML($page);
$divs = $dom->getElementsByTagName('div');
$i = 0;
$len = $divs->length;
while($i<$len) {
    $div = $divs->item($i++);
    $id = $div->getAttribute('id');
    if(strpos($id,'post-') !== false ) {
        // I need to get the text inside the title tag inside this div
        $title ='';//title should be stored here
        $str = $str.$title;
    }
}
echo $str;

SAMPLE HTML

<body>
    <div id = 'post-1'>
         <title>title 1</title>
    </div>
    <div id = 'post-2'>
         <title>title 2</title>
    </div>
    <div id = 'post-3'>
         <title>title 3</title>
    </div>
</body>

2 Answers


The following PHP DOMDocument code:

$id = $div->getAttribute('id');
if (strpos($id,'post-') !== false) {

can be expressed in XPath 1.0 with an XPath string function:

//div[contains(@id, 'post-')]

This reads: any div element that has an id attribute containing the string post-. By the rules of XPath you can extend the expression further, for example to select the title children of all those divs:

//div[contains(@id, 'post-')]/title
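
To make the expression above concrete, here is a minimal sketch that runs it through DOMXPath against the sample HTML from the question. The variable names ($html, $titles) and the extra 'sidebar' div are assumptions added for illustration; on the real page the markup would come from file_get_contents() instead.

```php
<?php
// Sample markup modelled on the question; the 'sidebar' div is a
// hypothetical extra element to show that non-matching divs are skipped.
$html = "<body>
    <div id='post-1'><title>title 1</title></div>
    <div id='post-2'><title>title 2</title></div>
    <div id='sidebar'><title>ignored</title></div>
</body>";

// Collect parser warnings internally instead of printing them,
// since <title> inside a <div> is not valid HTML.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select the title children of every div whose id contains 'post-'.
$titles = [];
foreach ($xpath->query("//div[contains(@id, 'post-')]/title") as $node) {
    $titles[] = trim($node->nodeValue);
}

echo implode("\n", $titles), "\n";
```

If the id must begin with post- rather than merely contain it, the XPath 1.0 function starts-with(@id, 'post-') can be used in the predicate in place of contains().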

3 Comments

There is also starts-with; maybe you meant that one?
But my HTML is not being parsed, since it has errors and no DOM is produced. What should I do?
If the HTML is invalid (even so invalid that loadHTML can't cope with it), check it with tidy: php.net/manual/en/book.tidy.php

You can use an XPath query to retrieve the title information:

$xml = "<body>
    <div id = 'post-1'>
         <title>title 1</title>
    </div>
    <div id = 'post-2'>
         <title>title 2</title>
    </div>
    <div id = 'post-3'>
         <title>title 3</title>
    </div>
</body>";

$str = '';

$doc = new DOMDocument;
$doc->loadHTML($xml);

$xpath = new DOMXPath($doc);

$entries = $xpath->query('//body/div/title');
foreach ($entries as $entry) {
    $str .= $entry->nodeValue;
}

var_dump($str);


2 Comments

Thanks for the awesome answer. I need to select divs with someAttribute = someValue. How do I do that?
@JinuJD: That works with XPath as well; please use the search. E.g. see "XPath: How to select node with some attribute by index?". You should get comfortable with it after some time.
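
For the attribute question in the comments, a minimal sketch of an exact attribute match, assuming the same sample HTML as in the question. The predicate [@id='post-2'] stands in for someAttribute = someValue; the variable names are illustrative only.

```php
<?php
// Sample markup from the question.
$html = "<body>
    <div id='post-1'><title>title 1</title></div>
    <div id='post-2'><title>title 2</title></div>
    <div id='post-3'><title>title 3</title></div>
</body>";

// Keep libxml warnings about the non-standard markup out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// [@id='post-2'] matches only divs whose id attribute equals 'post-2'.
$matches = [];
foreach ($xpath->query("//div[@id='post-2']/title") as $entry) {
    $matches[] = trim($entry->nodeValue);
}

var_dump($matches);
```

Any other attribute can be tested the same way by replacing id and the quoted value inside the predicate.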
