1

I have a very basic XML parser based on the tutorial provided here, for the purpose of reading RSS feeds in Python.

def GetRSS(RSSurl):
    url_info = urllib.urlopen(RSSurl)
    if (url_info):
        xmldoc = minidom.parse(url_info)
    if (xmldoc):
        for item_node in xmldoc.documentElement.childNodes:
            if (item_node.nodeName == "item"):  
                PrintNodeItems(item_node, ["title","link"])
    else:
        print "error"

def PrintNodeItems(XmlNode, items):
    for item_node in XmlNode.childNodes:
        if item_node.nodeName in items:
            PrintNodesText(item_node)

def PrintNodesText(XmlNode):
    text = ""
    for text_node in XmlNode.childNodes:
        if(text_node.nodeType == Node.TEXT_NODE):
            text = text_node.nodeValue
    if (len(text)>0):
        print text
        print ""

I have tested the GetRSS function on the address provided in the tutorial (http://rss.slashdot.org/Slashdot/slashdot), and it works just fine, providing me with the correct feedback. However, my intention when learning how to write this module was to use it for reading the RSS feed at RedLetterMedia (http://redlettermedia.com/feed/). When I attempt to use the GetRSS function in the Python Shell on that address, I get a blank line as feedback instead of the correct results. I also tested it on CNN's "World" RSS feed, and received no results for that as well. I have used urllib.urlopen on all addresses and they all appear to use the same format for their nodes and child nodes (<item><title><description><link></item>).

I figure, as was the case for my previous question, there is probably something very obvious I am missing. Does anybody know what that is?

Edit: and for the record, my error message has not come up at all, but maybe that's because I integrated it into the code incorrectly; I would not put it beyond me.

update: Rewrote code from scratch using multiple answered questions on stackoverflow. Works like a charm!

def GetRSS(RSSurl):
    url_info = urllib.urlopen(RSSurl)
    if (url_info):
        xmldoc = minidom.parse(url_info)
    if (xmldoc):
        channel = xmldoc.getElementsByTagName('channel')
        for node in channel:
            item = xmldoc.getElementsByTagName('item')
            for node in item:
                alist = xmldoc.getElementsByTagName('link')
                for a in alist: 
                    linktext = a.firstChild.data
                    print linktext


def main():
    GetRSS('http://redlettermedia.com/feed/')

1 Answer 1

1

The error is here:

for item_node in xmldoc.documentElement.childNodes:
    if (item_node.nodeName == "item"):

There is no root item element, just a channel. I found this out by just printing all the values of nodeName in the loop.

Sign up to request clarification or add additional context in comments.

2 Comments

So, I should replace "item" in that line with "channel"? I have just tried that, and it does return a result now: >>> GetRSS('http://redlettermedia.com/feed') http://redlettermedia.com I suppose this is a step up from no response at all, but it is still not the response I was trying to receive. Any idea why I am getting this response? I will examine all the nodes and my attempts to call them in the meantime, maybe it is another case of trying to call something that isn't there, like the root item element.
@Jordan: you'll want to look for <item>s within the <channel>. Consider using LXML instead of minidom, that has XPath support.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.