I'd like to get the title tag and the RSS feed address (if there is one) from a given URL, but the methods I've tried so far aren't fully working. I've managed to get the title tag by using preg_match and a regular expression, but I can't get anywhere with the RSS feed address.

($webContent holds the HTML of the website)

I've copied my code below for reference...

`// Get the title tag
preg_match('@<title>(.*)</title>@i', $webContent, $titleTagArray);

// If the title tag has been found, assign it to a variable
if ($titleTagArray && $titleTagArray[1])
    $webTitle = $titleTagArray[1];

// Get the RSS or Atom feed address
preg_match('@<link(.*)rel="alternate"(.*)href="(.*)"(.*)type="application/rss+xml"\s/>@i', $webContent, $feedAddrArray);

// If the feed address has been found, assign it to a variable (the href is the third capture group)
if ($feedAddrArray && $feedAddrArray[3])
    $webFeedAddr = $feedAddrArray[3];`

I've been reading on here that using a regular expression isn't the best way to do this? Hopefully someone can give me a hand with this :-)

Thanks.

  • RegExp is far from the best solution ;) Use a feed reader, for example the Zend_Feed class of the Zend Framework. Commented Jun 16, 2010 at 14:48
  • Good pick if he was parsing an RSS Feed. He's parsing an HTML page though. Commented Jun 16, 2010 at 15:18

1 Answer

One approach is to parse the HTML with DOMDocument and query it with XPath:

$dom = new DOMDocument;            // init new DOMDocument
$dom->loadHTML($html);             // load HTML into it
$xpath = new DOMXPath($dom);       // create a new XPath

$nodes = $xpath->query('//title'); // Find all title elements in document
foreach($nodes as $node) {         // Iterate over found elements
    echo $node->nodeValue;         // output title text
}

To get the href attribute of all link tags with a type of "application/rss+xml" you would use this XPath:

$xpath->query('//link[@type="application/rss+xml"]/@href');
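Putting the two queries together, a minimal sketch might look like the following. The sample page and the example.com URL are made up for illustration; $webContent, $webTitle and $webFeedAddr are the variable names from the question. Suppressing libxml warnings is an addition here, on the assumption that real-world pages are rarely valid HTML.

```php
<?php
// Hypothetical sample page standing in for the question's $webContent.
$webContent = <<<HTML
<html>
<head>
<title>Example Site</title>
<link rel="alternate" type="application/rss+xml" href="http://example.com/feed.xml" />
</head>
<body><p>Hello</p></body>
</html>
HTML;

// Collect parser warnings internally instead of emitting them for sloppy markup.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($webContent);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Text of the first <title> element, or null if the page has none.
$titleNodes = $xpath->query('//title');
$webTitle = $titleNodes->length > 0 ? trim($titleNodes->item(0)->nodeValue) : null;

// href of the first RSS <link>, or null if the page does not advertise one.
$feedNodes = $xpath->query('//link[@type="application/rss+xml"]/@href');
$webFeedAddr = $feedNodes->length > 0 ? $feedNodes->item(0)->nodeValue : null;
```

Checking `length` before calling `item(0)` keeps both variables null when the page has no title or no feed link, which covers the "if there is one" case from the question.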

1 Comment

For a wider range of feed types, you could use something like: /html/head/link[@rel="alternate" and @href and (@type="application/atom+xml" or @type="application/rss+xml" or @type="application/rdf+xml")]/@href. Regex matching would be nice, but `or` will suffice.
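As a rough sketch, the wider query could be wrapped in a small helper that returns every advertised feed address. The function name findFeedLinks and the sample markup are hypothetical, not part of the original answer.

```php
<?php
/**
 * Hypothetical helper: return the href of every RSS, Atom or RDF feed
 * <link> found in the head of the given HTML string.
 */
function findFeedLinks(string $html): array
{
    if ($html === '') {
        return []; // nothing to parse
    }

    // Silence parser warnings for this call only, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $query = '/html/head/link[@rel="alternate" and @href and ('
           . '@type="application/atom+xml" or '
           . '@type="application/rss+xml" or '
           . '@type="application/rdf+xml")]/@href';

    $feeds = [];
    foreach ($xpath->query($query) as $attr) {
        $feeds[] = $attr->nodeValue; // value of the matched href attribute
    }
    return $feeds;
}

// Made-up sample page: one stylesheet link and two feed links.
$sample = '<html><head><title>T</title>'
        . '<link rel="stylesheet" type="text/css" href="/style.css">'
        . '<link rel="alternate" type="application/rss+xml" href="/rss.xml">'
        . '<link rel="alternate" type="application/atom+xml" href="/atom.xml">'
        . '</head><body></body></html>';

$feeds = findFeedLinks($sample);
```

Note that the comparison on @type is an exact, case-sensitive string match, so a page that writes the type with different casing or extra whitespace would be missed by this query.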
