PHP RSS Feed Crawler

Question

I want to build an RSS feed crawler for my website, but I'm not quite sure how to begin. How can my crawler identify an RSS feed? Is there anything I can crawl for that every RSS feed has? I don't need any code, just some help understanding what I have to create.

Thanks in advance!

Greetings

Xatenev

Check superfeedr.com if you don't feel like re-inventing the wheel :)
– Julien Genestoux, Commented Apr 11, 2014 at 9:39
Hey, it seems very cool, but what can I do with that? :P It looks like a huge database of feeds where I can (possibly) get a lot of RSS feeds. Is that correct? ^^
– xate, Commented Apr 11, 2014 at 10:11

Duco · Accepted Answer · 2014-04-10 14:36:58Z

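I think it would be possible if your crawler collects all links and opens each linked page at least once to look for the text <rss version="2.0">. From what I understand, every RSS 2.0 feed should contain this opening tag, as in the example below. Feeds in other formats, such as Atom, use a different root element and would not match.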

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
 <title>RSS Title</title>
 <description>This is an example of an RSS feed</description>
 <link>http://www.someexamplerssdomain.com/main.html</link>
 <lastBuildDate>Mon, 06 Sep 2010 00:01:00 +0000 </lastBuildDate>
 <pubDate>Sun, 06 Sep 2009 16:20:00 +0000 </pubDate>
 <ttl>1800</ttl>

 <item>
  <title>Example entry</title>
  <description>Here is some text containing an interesting description.</description>
  <link>http://www.wikipedia.org/</link>
  <guid>unique string per item</guid>
  <pubDate>Sun, 06 Sep 2009 16:20:00 +0000 </pubDate>
 </item>

</channel>
</rss>

If you're going to use PHP, I have had very positive experiences with SimpleXML, which is built into PHP.
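
As a rough illustration of how SimpleXML could be used for this, here is a minimal sketch. It assumes the document has already been downloaded into a string. The function names isRssFeed and readRssItems are made up for this example; only simplexml_load_string, getName, xpath and the libxml error functions come from PHP itself.

```php
<?php
// Hypothetical helper: true when $body parses as XML and its root element is <rss>.
function isRssFeed(string $body): bool
{
    // Collect parse errors internally so ordinary HTML pages do not raise warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml !== false && $xml->getName() === 'rss';
}

// Hypothetical helper: title and link of every <item> in an RSS 2.0 document.
function readRssItems(string $body): array
{
    if (!isRssFeed($body)) {
        return [];
    }

    $xml = simplexml_load_string($body);
    $items = [];
    // xpath() returns an array of matching elements; fall back to [] if it fails.
    foreach ($xml->xpath('/rss/channel/item') ?: [] as $item) {
        $items[] = [
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        ];
    }
    return $items;
}
```

Checking the parsed root element instead of searching for raw text means a page that merely mentions RSS is not mistaken for a feed. The sketch only covers the RSS 2.0 layout shown in the example above.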

P.S. Xatenev you're welcome ;)

edited Apr 10, 2014 at 14:36

answered Apr 10, 2014 at 13:45

Duco


5 Comments

xate Over a year ago

And how can I actually crawl for those RSS feeds? How can my crawler identify them and give me back the data I need?

Duco Over a year ago

I don't know if you have much experience with regular expressions, but I think that's the way to go.

xate Over a year ago

I know about regular expressions, but what I mean is this: a crawler just goes to a website, picks up all the links, and then continues crawling on the next website. How can I pick up all the RSS feeds on a website? Links are easy to find in the source code; can I find RSS feeds in the source code as well?

Duco Over a year ago

Could you clarify that some more? I think it would be possible if your crawler collects all links and opens each linked page at least once to look for the text "<rss version="2.0">". From what I understand, every RSS 2.0 feed should contain this opening tag.

xate Over a year ago

Ah, that's what I wanted to know. Very cool, thank you! Thanks for the good explanation :).
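
The approach discussed in these comments (collect the links on a page, open each one once, and look for the <rss version="2.0"> tag) could be sketched roughly as follows. This is only an outline under a few assumptions: extractLinks and findRssFeeds are hypothetical names, only absolute http(s) links are followed, and the download step is passed in as a callable so that you can plug in file_get_contents, cURL, or anything else.

```php
<?php
// Hypothetical helper: unique absolute http(s) URLs found in <a href="..."> attributes.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML rejects an empty string
    }

    // Real-world HTML is rarely well formed, so keep parser errors internal.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if (preg_match('#^https?://#i', $href)) {
            $links[$href] = true; // array keys remove duplicates
        }
    }
    return array_keys($links);
}

// Hypothetical helper: opens every linked URL once and keeps those whose body
// contains an <rss ... version="2.0"> opening tag.
// $fetch is supplied by the caller: it takes a URL and returns the body or false.
function findRssFeeds(string $html, callable $fetch): array
{
    $feeds = [];
    foreach (extractLinks($html) as $url) {
        $body = $fetch($url);
        if (is_string($body) && preg_match('/<rss\b[^>]*\bversion="2\.0"/i', $body)) {
            $feeds[] = $url;
        }
    }
    return $feeds;
}
```

A call could look like findRssFeeds($pageHtml, fn (string $url) => @file_get_contents($url)), assuming allow_url_fopen is enabled. A real crawler would also need to resolve relative links and remember which URLs it has already visited.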
