I have a very basic XML parser, based on the tutorial provided here, for reading RSS feeds in Python:

    import urllib
    from xml.dom import minidom, Node

    def GetRSS(RSSurl):
        url_info = urllib.urlopen(RSSurl)
        if (url_info):
            xmldoc = minidom.parse(url_info)
            if (xmldoc):
                # look at each direct child of the root element
                for item_node in xmldoc.documentElement.childNodes:
                    if (item_node.nodeName == "item"):
                        PrintNodeItems(item_node, ["title","link"])
            else:
                print "error"

    def PrintNodeItems(XmlNode, items):
        # print the text of the child elements whose names are listed in items
        for item_node in XmlNode.childNodes:
            if item_node.nodeName in items:
                PrintNodesText(item_node)

    def PrintNodesText(XmlNode):
        text = ""
        for text_node in XmlNode.childNodes:
            if (text_node.nodeType == Node.TEXT_NODE):
                text = text_node.nodeValue
        if (len(text) > 0):
            print text
        print ""

I have tested the GetRSS function on the address used in the tutorial (http://rss.slashdot.org/Slashdot/slashdot), and it works fine, printing the correct results. However, my reason for learning to write this module was to read the RSS feed at RedLetterMedia (http://redlettermedia.com/feed/). When I call GetRSS on that address in the Python Shell, I get a blank line instead of the correct results. I also tested it on CNN's "World" RSS feed and got no results there either. I have opened all of these addresses with urllib.urlopen, and they all appear to use the same structure for their nodes and child nodes (<item><title><description><link></item>).
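One likely explanation, stated as an assumption rather than verified against the live feeds: the Slashdot feed is RSS 1.0 (RDF), where `<item>` elements are direct children of the root element, while the RedLetterMedia and CNN feeds are RSS 2.0, where `<item>` sits inside `<channel>`. The loop over `xmldoc.documentElement.childNodes` only looks one level below the root, so under that assumption it would find no items in an RSS 2.0 feed. The sketch below compares that loop with `getElementsByTagName`, which searches all descendants, using two made-up inline feeds so it runs without network access; `direct_items`, `all_items`, `RSS1` and `RSS2` are hypothetical names.

```python
from xml.dom import minidom

# Made-up minimal feeds. RSS 1.0 (RDF) puts <item> directly under the root;
# RSS 2.0 nests <item> inside <channel>.
RSS1 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel><title>Example</title></channel>
  <item><title>One</title><link>http://example.com/1</link></item>
</rdf:RDF>"""

RSS2 = """<rss version="2.0">
  <channel>
    <title>Example</title>
    <item><title>One</title><link>http://example.com/1</link></item>
  </channel>
</rss>"""

def direct_items(xml_text):
    """What the original loop sees: <item> children of the root only."""
    doc = minidom.parseString(xml_text)
    return [n for n in doc.documentElement.childNodes if n.nodeName == "item"]

def all_items(xml_text):
    """getElementsByTagName searches every descendant of the document."""
    doc = minidom.parseString(xml_text)
    return doc.getElementsByTagName("item")

for name, feed in (("RSS 1.0", RSS1), ("RSS 2.0", RSS2)):
    print("%s: %d direct, %d total" % (name, len(direct_items(feed)), len(all_items(feed))))
```

This prints `RSS 1.0: 1 direct, 1 total` and `RSS 2.0: 0 direct, 1 total`. If the assumption about the feed formats holds, the root-children loop prints nothing for an RSS 2.0 feed, while a whole-document search finds the items in both layouts.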
I suspect that, as with my previous question, I am missing something very obvious. Does anybody know what it is?
Edit: For the record, my error message never appears at all, but maybe that's because I integrated it into the code incorrectly; I wouldn't put it past me.
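As far as I can tell, the `else` branch cannot fire because neither check ever sees a false value: in Python 2, `urllib.urlopen` either returns a response object or raises `IOError`, and `minidom.parse` either returns a `Document` (which always evaluates as true) or raises `xml.parsers.expat.ExpatError` for malformed XML. A feed that parses correctly but yields no items is not an error at all, so nothing would be reported in that case either. A minimal sketch of catching the parse error instead; `parse_feed` is a hypothetical helper, and it parses a string so it runs without network access.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def parse_feed(xml_text):
    """Return a DOM document, or None if the text is not well-formed XML.

    minidom reports bad input by raising ExpatError, not by returning a
    false value, so an `if (xmldoc):` test cannot detect it.
    """
    try:
        return minidom.parseString(xml_text)
    except ExpatError as err:
        print("error: %s" % err)
        return None

# A mismatched closing tag makes the parser raise, so this prints an error
# message followed by True.
print(parse_feed("<rss><channel></rss>") is None)
```

The same idea applies to the download step: wrapping `urllib.urlopen` in `try`/`except IOError` is how a network failure would surface.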
Update: I rewrote the code from scratch using several answered questions on Stack Overflow. Works like a charm!

    import urllib
    from xml.dom import minidom

    def GetRSS(RSSurl):
        url_info = urllib.urlopen(RSSurl)
        if (url_info):
            xmldoc = minidom.parse(url_info)
            if (xmldoc):
                # getElementsByTagName searches every descendant, so this
                # finds <item> elements whether or not they are in <channel>
                for item in xmldoc.getElementsByTagName('item'):
                    # search only inside this item, so each link prints once
                    for a in item.getElementsByTagName('link'):
                        linktext = a.firstChild.data
                        print linktext

    def main():
        GetRSS('http://redlettermedia.com/feed/')

    if __name__ == '__main__':
        main()
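A sketch of how the rewritten approach could be extended to return titles as well as links, assuming the feed has already been read into a string. `feed_items`, `node_text` and the inline `SAMPLE` feed are hypothetical. The text helper joins both text and CDATA children, on the assumption that some feeds wrap titles in CDATA sections, which the `Node.TEXT_NODE` check in the first version would skip.

```python
from xml.dom import minidom, Node

# Hypothetical sample feed (RSS 2.0 layout); the second title uses CDATA.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example</title>
    <link>http://example.com/</link>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title><![CDATA[Second & more]]></title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def node_text(element):
    """Join the text and CDATA children of an element, trimmed."""
    parts = [child.data for child in element.childNodes
             if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)]
    return "".join(parts).strip()

def feed_items(xml_text, fields=("title", "link")):
    """Return one dict per <item> with the text of each requested field."""
    doc = minidom.parseString(xml_text)
    items = []
    # Searching the whole document finds <item> whether it sits under the
    # root (RSS 1.0) or inside <channel> (RSS 2.0).
    for item in doc.getElementsByTagName("item"):
        entry = {}
        for field in fields:
            # Search only inside this item, so the channel's own <link>
            # and other items' fields are not picked up.
            matches = item.getElementsByTagName(field)
            entry[field] = node_text(matches[0]) if matches else ""
        items.append(entry)
    return items

for entry in feed_items(SAMPLE):
    print(entry["title"] + " " + entry["link"])
```

This prints `First http://example.com/1` and `Second & more http://example.com/2`; the channel's own link is skipped because each search is scoped to one item. With a live feed, passing `urllib.urlopen(url).read()` (Python 2) or `urllib.request.urlopen(url).read()` (Python 3) to `feed_items` should produce the same kind of list.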