Building a Simple RSS reader, retrieving content

Question

I am trying to make a simple RSS reader using the SyndicationFeed class.

There are some standard tags, like <title>, <link>, and <description>, and they cause no problems.

But there are other tags. For example, this feed, which is generated by WordPress, contains a <content:encoded> tag. I assume other websites may use different tags for the content part. Is that right?

I want to know how to find the main content of every post. Is there a standard? Which tags should I look for?

(For example, one site may use <content:encoded>, another may use only <description>, and another may follow a different standard. I don't know how to retrieve the main content of a post.)

P.S.: I'm using this code to test my simple RSS reader:

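        // Needs: System.Text, System.Xml, System.Xml.Linq, System.ServiceModel.Syndication
        var sb = new StringBuilder();
        using (var reader = XmlReader.Create("http://feed.2barnamenevis.com/2barnamenevis"))
        {
            var feed = SyndicationFeed.Load(reader);
            foreach (SyndicationItem item in feed.Items)
            {
                // Summary is null when the item has no <description>
                string summary = item.Summary != null ? item.Summary.Text : "";
                sb.Append(item.Title.Text + "<br />" + summary + "<br />" + item.PublishDate + "<br />");

                // Non-standard elements such as <content:encoded> end up in ElementExtensions
                foreach (SyndicationElementExtension extension in item.ElementExtensions)
                {
                    XElement ele = extension.GetObject<XElement>();
                    sb.Append(ele.Name + " :: " + ele.Value + "<br />");
                }
                sb.Append("<hr />");
            }
        }
        return sb.ToString();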

Depends on what you want to support. The content element isn't part of RSS 2.0, but it is part of Atom (RFC 4287). Read the RSS 2.0 spec: cyber.law.harvard.edu/rss/rss.html#hrelementsOfLtitemgt
– Ralf de Kleine, Commented Jun 13, 2012 at 14:42
Last time I tried writing an RSS reader, I eventually gave up after I realized that a significant number of feeds out there don't follow any standards. The major readers out there must be very forgiving when it comes to reading feeds. I see it kind of like browsers reading webpages: if people follow the standards, there's no problem, but if not, you'll be writing custom stuff all day long to handle the one-off scenarios.
– Joe Enos, Commented Jun 13, 2012 at 14:48
@JoeEnos What do other feed reader applications do? They can read every feed. How do they do that?
– Mahdi Ghiasi, Commented Jun 13, 2012 at 14:55
I don't know for sure, but if I had to guess, I'd say months and years of trial and error: testing every feed they can get their hands on, analyzing the failures, and writing custom parsing to handle them. For example, if the date standard is Thu, 15 Mar 2012 08:45:46 -0700, your parser would expect that. Until some joker puts 2012-03-15 08:45:46 -7 in that XML field, and your parser breaks. So you allow your parser to accept both, which is fine until some other joker names the author tag <Author> instead of <author>, and so on.
– Joe Enos, Commented Jun 13, 2012 at 15:35
It seems that building an RSS reader is a BIG project! Isn't there a third-party RSS reader framework for .NET that supports all of these?
– Mahdi Ghiasi, Commented Jun 13, 2012 at 15:51
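
To make the date-format point from the comments concrete, here is a minimal sketch of a forgiving date parser. It assumes you are parsing the date strings yourself rather than relying on SyndicationFeed's own parsing. The class name LenientDate and its format list are hypothetical and illustrative, not exhaustive; real feeds will need more entries.

```csharp
using System;
using System.Globalization;

public static class LenientDate
{
    // Formats tried in order. Illustrative only; extend as new feeds break it.
    private static readonly string[] Formats =
    {
        "ddd, d MMM yyyy HH:mm:ss zzz", // RFC 822 style, as used by RSS 2.0
        "d MMM yyyy HH:mm:ss zzz",      // same, without the weekday
        "yyyy-MM-dd HH:mm:ss z",        // e.g. 2012-03-15 08:45:46 -7
        "yyyy-MM-ddTHH:mm:sszzz"        // ISO 8601 style, as used by Atom
    };

    // Returns true and sets result when any known format (or the
    // framework's general-purpose parser) accepts the input.
    public static bool TryParse(string raw, out DateTimeOffset result)
    {
        result = default(DateTimeOffset);
        if (string.IsNullOrWhiteSpace(raw))
            return false;

        string text = raw.Trim();
        if (DateTimeOffset.TryParseExact(text, Formats, CultureInfo.InvariantCulture,
                                         DateTimeStyles.AllowWhiteSpaces, out result))
            return true;

        // Last resort: let the framework guess the format.
        return DateTimeOffset.TryParse(text, CultureInfo.InvariantCulture,
                                       DateTimeStyles.AllowWhiteSpaces, out result);
    }
}
```

One detail worth knowing: when a format includes the weekday, .NET checks that it matches the date, so a feed that publishes the wrong weekday is rejected by that format and only succeeds if a later fallback accepts it.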

Joe Enos · Accepted Answer · 2012-06-14 05:48:46Z

3

From our discussion in the comments, I'd suggest going with a third-party library instead of building it from scratch. Argotic and RSS.NET both look promising.

answered Jun 14, 2012 at 5:48

Joe Enos


Community · Accepted Answer · 2021-10-07 06:26:44Z

0

Depends on what you want to support. The content element isn't part of RSS 2.0, but it is part of Atom (RFC 4287).

Read the RSS 2.0 spec: http://cyber.law.harvard.edu/rss/rss.html#hrelementsOfLtitemgt

Read the Atom spec: https://www.rfc-editor.org/rfc/rfc4287
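
As a rough illustration of what that means with the SyndicationFeed class from the question, the sketch below picks the "main content" of an item by trying Atom's content element first, then the <content:encoded> extension, then the summary or description. The class and method names (FeedContent, GetBody) are hypothetical, and the fallback order is an assumption about what most feeds need, not a rule from either spec.

```csharp
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml.Linq;

public static class FeedContent
{
    // Namespace of the RSS "content" module, which defines <content:encoded>.
    private const string ContentNs = "http://purl.org/rss/1.0/modules/content/";

    // Hypothetical helper: returns the richest body text found, or "" if none.
    public static string GetBody(SyndicationItem item)
    {
        // 1. Atom <content> with text, HTML, or XHTML is exposed as item.Content.
        var text = item.Content as TextSyndicationContent;
        if (text != null && !string.IsNullOrEmpty(text.Text))
            return text.Text;

        // 2. <content:encoded> is not part of RSS 2.0 itself, so it shows up
        //    as an element extension rather than as a typed property.
        var encoded = item.ElementExtensions.FirstOrDefault(
            e => e.OuterName == "encoded" && e.OuterNamespace == ContentNs);
        if (encoded != null)
            return encoded.GetObject<XElement>().Value;

        // 3. Fall back to <description> (RSS) or <summary> (Atom).
        return item.Summary != null ? item.Summary.Text : "";
    }
}
```

Inside the loop from the question this would be called as FeedContent.GetBody(i). Feeds that use yet another element for the body would need an extra branch.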

edited Oct 7, 2021 at 6:26

CommunityBot


answered Jun 13, 2012 at 14:45

Ralf de Kleine


1 Comment

Mahdi Ghiasi Over a year ago

I just want to support the popular formats, like every feed reader application out there. Feed reader applications can read everything, with any type of content tag.

Mahdi Ghiasi · Accepted Answer · 2012-06-14 04:40:40Z

0

I have found the Argotic Syndication Framework (thanks to Joe Enos).

Argotic has many extensions that can be used to handle non-standard elements.

For example, you can use Argotic.Extensions.Core.SiteSummaryContentSyndicationExtension to retrieve <content:encoded>. You can see an example here. (If that example returns null for the content, simply use MyRssItem.Description instead.)

Some other useful extensions are WellFormedWebCommentsSyndicationExtension (for retrieving the comments feed URL) and SiteSummarySlashSyndicationExtension (for retrieving the comment count).

answered Jun 14, 2012 at 4:40

Mahdi Ghiasi
