Parsing XML in PHP DOM via cURL - can't get nodeValue if it is url address or date

Question

I have a strange problem parsing an XML document in PHP that I load via cURL. I cannot get the nodeValue of nodes containing a URL (I'm trying to add a simple RSS reader to my CMS). The strange thing is that it works for every node except the ones containing URLs and dates (<link> and <pubDate>).

Here is the code (I know it is a stupid solution, but I'm kind of a newbie at working with the DOM and parsing XML documents).

function file_get_contents_curl($url) {
    $ch = curl_init();    // initialize curl handle
    curl_setopt($ch, CURLOPT_URL, $url); // set url to post to
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
    curl_setopt($ch, CURLOPT_TIMEOUT, 4); // times out after 4s
    $result = curl_exec($ch); // run the whole process

    return $result;
}

function vypis($adresa) {
    $html = file_get_contents_curl($adresa);

    $doc = new DOMDocument();
    @$doc->loadHTML($html);

    $nodes = $doc->getElementsByTagName('title');
    $desc = $doc->getElementsByTagName('description');
    $ctg = $doc->getElementsByTagName('category');
    $pd = $doc->getElementsByTagName('pubDate');
    $ab = $doc->getElementsByTagName('link');
    $aut = $doc->getElementsByTagName('author');

    for ($i = 1; $i < $desc->length; $i++) {
        $dsc = $desc->item($i);
        $titles = $nodes->item($i);
        $categorys = $ctg->item($i);
        $pubDates = $pd->item($i);
        $links = $ab->item($i);
        $autors = $aut->item($i);

        $description = $dsc->nodeValue;
        $title = $titles->nodeValue;
        $category = $categorys->nodeValue;
        $pubDate = $pubDates->nodeValue;
        $link = $links->nodeValue;
        $autor = $autors->nodeValue;

        echo 'Title:' . $title . '<br/>';
        echo 'Description:' . $description . '<br/>';
        echo 'Category:' . $category . '<br/>';
        echo 'Datum ' . gmdate("D, d M Y H:i:s",
            strtotime($pubDate)) . " GMT" . '<br/>';
        echo "Autor: $autor" . '<br/>';
        echo 'Link: ' . $link . '<br/><br/>';
    }
}

Can you please help me with this?

Could you give the URL of the XML file you're trying to read from? – Ja͢ck, Commented May 12, 2012 at 4:44

Ja͢ck · Accepted Answer · 2012-05-12 05:04:01Z

2

To read RSS you should use loadXML rather than loadHTML. One reason your links don't show is that <link> is an empty element in HTML, so the HTML parser discards its contents. See also: http://www.w3.org/TR/html401/struct/links.html#h-12.3
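
A minimal sketch of the difference, assuming a made-up one-item feed (the $feed string and its example.com URL are illustrative, not taken from the question). It parses the same string once as HTML and once as XML and dumps what each parser reports for <link>. As far as I know, libxml's HTML parser also lowercases tag names, which would explain why the lookup for pubDate comes back empty as well, but treat that as an assumption to verify on your setup.

```php
<?php
// Hypothetical sample feed, for illustration only.
$feed = '<rss version="2.0"><channel><item>'
      . '<title>Hello</title>'
      . '<link>http://example.com/post</link>'
      . '<pubDate>Sat, 12 May 2012 04:44:00 GMT</pubDate>'
      . '</item></channel></rss>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Parsed as HTML: <link> is treated as an empty element.
$html = new DOMDocument;
$html->loadHTML($feed);
$htmlLink = $html->getElementsByTagName('link')->item(0);
$htmlLinkText = $htmlLink ? $htmlLink->nodeValue : null;

// Parsed as XML: <link> is an ordinary element with text content.
$xml = new DOMDocument;
$xml->loadXML($feed);
$xmlLinkText = $xml->getElementsByTagName('link')->item(0)->nodeValue;
$xmlDateCount = $xml->getElementsByTagName('pubDate')->length;

libxml_clear_errors();

// Compare what each parser kept for <link>, and whether XML found <pubDate>.
var_dump($htmlLinkText, $xmlLinkText, $xmlDateCount);
```

The XML side keeps the URL as the node's text and finds the <pubDate> element under its original mixed-case name.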

Also, I find it easier to just iterate over the <item> tags and then iterate over their child nodes. Like so:

$d = new DOMDocument;
// don't show XML warnings
libxml_use_internal_errors(true);
// $xml_contents is the feed body as a string (e.g. the cURL response)
$d->loadXML($xml_contents);
// clear the XML warnings buffer
libxml_clear_errors();

$items = array();

// iterate over all <item> tags
foreach ($d->getElementsByTagName('item') as $item) {
    $item_attributes = array();
    // iterate over the children, skipping whitespace text nodes
    foreach ($item->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        $item_attributes[$child->nodeName] = $child->nodeValue;
    }
    $items[] = $item_attributes;
}

var_dump($items);
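
For completeness, here is one way the same loop might be wrapped up and used for output. This is only a sketch: parse_rss_items and the $sample feed are hypothetical names and data introduced for illustration, and in the question the string would instead come from file_get_contents_curl($adresa). This variant skips non-element children (such as whitespace text nodes) and returns an empty array when the input is empty or not well-formed.

```php
<?php
// Hypothetical helper: turn an RSS string into a list of per-item arrays.
function parse_rss_items(string $xml): array
{
    if ($xml === '') {
        return array(); // nothing to parse (e.g. the download failed)
    }

    // Silence libxml warnings while parsing, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument;
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        return array(); // not well-formed XML
    }

    $items = array();
    foreach ($doc->getElementsByTagName('item') as $item) {
        $fields = array();
        foreach ($item->childNodes as $child) {
            // Keep element children only; skip whitespace text nodes.
            if ($child->nodeType === XML_ELEMENT_NODE) {
                $fields[$child->nodeName] = trim($child->nodeValue);
            }
        }
        $items[] = $fields;
    }
    return $items;
}

// Made-up sample feed, for illustration only.
$sample = '<rss version="2.0"><channel><title>Feed</title>'
        . '<item><title>First</title><link>http://example.com/1</link>'
        . '<pubDate>Sat, 12 May 2012 04:44:00 GMT</pubDate></item>'
        . '</channel></rss>';

foreach (parse_rss_items($sample) as $entry) {
    // Escape feed text before echoing it into an HTML page.
    echo 'Title: ' . htmlspecialchars($entry['title'] ?? '') . '<br/>';
    echo 'Link: ' . htmlspecialchars($entry['link'] ?? '') . '<br/>';
    if (isset($entry['pubDate'])) {
        echo 'Date: ' . gmdate('D, d M Y H:i:s', strtotime($entry['pubDate'])) . ' GMT<br/>';
    }
    echo '<br/>';
}
```

Because each item is read through its own children, the channel-level <title> never gets mixed in with the item titles, so no index offset is needed.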

edited May 12, 2012 at 5:04

answered May 12, 2012 at 4:54

Ja͢ck

1 Comment

johny7cz Over a year ago

Thanks for the help, now it's working fine. I just missed it: I first wrote that script for parsing an HTML document and then tried to use it for XML ... stupid mistake :)
