Parse Stackoverflow RSS job feed for same name elements, with Feedparser in Python

Question

Every job item in the Stack Overflow RSS job feed has several tags, each in its own element named "category".

For one item they look basically like this:

<category>scala</category>
<category>hadoop</category>
<category>apache-spark</category>
<category>hive</category>
<category>json</category>
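These are ordinary sibling elements that share a name, so any parser that collects all matching children will return every tag. The sketch below is for illustration only. It uses the standard library's xml.etree.ElementTree instead of Feedparser, and the sample item is a hypothetical, trimmed-down RSS 2.0 <item> rather than real feed data. It contrasts find(), which stops at the first match, with findall(), which returns every match.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down RSS 2.0 item used only for illustration.
SAMPLE_ITEM = """
<item>
  <title>Example job</title>
  <category>scala</category>
  <category>hadoop</category>
  <category>apache-spark</category>
  <category>hive</category>
  <category>json</category>
</item>
""".strip()

item = ET.fromstring(SAMPLE_ITEM)

# find() returns only the first matching child element...
first = item.find("category").text

# ...while findall() returns every direct child named "category".
tags = [c.text for c in item.findall("category")]

print(first)  # scala
print(tags)   # ['scala', 'hadoop', 'apache-spark', 'hive', 'json']
```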

I would like to use Feedparser to put all of these tags into a list, but I only ever get the first one. The Feedparser documentation mentions entries[i].content, but I am unsure whether that is the right approach, or how to use it in this case.

Here is my code:

import feedparser

rss_url = "https://stackoverflow.com/jobs/feed"
feed = feedparser.parse(rss_url)
items = feed["items"]

for item in items:
    title = item["title"]
    try:
        tags = []
        tags.append(item["category"])
        print(title + " " + str(tags))
    except KeyError:  # raised when an item has no <category> at all
        print("Failed")

Martijn Pieters · Accepted Answer · 2017-10-28 14:32:17Z


category on a feedparser item is only a shortcut to the first element of the item's tags list: it returns the term of that first tag. tags itself is a list of further feedparser dictionaries, one per <category> element, and each has a term attribute that contains the tag name.

You can just access the terms directly:

categories = [t.term for t in item.get('tags', [])]

Applied to your code, that becomes:

for item in items:
    title = item["title"]
    categories = [t.term for t in item.get('tags', [])]
    print(title, ', '.join(categories))

See the entries[i].tags documentation.

edited Oct 28, 2017 at 14:32

answered Oct 28, 2017 at 14:27
