
I've been trying to find the cleanest way of parsing XML in Python. Chatango has an XML page for each user's profile with information such as date of birth (the b tag), gender (the s tag), their mini (the body tag, URL-encoded) and location (the l tag). I'm trying to get the text of those tags, but if a user didn't fill something out in their profile, that tag and its text won't be in the XML at all. So I check whether each tag is present and take its text, and if it isn't, I use a question mark instead. What I need help with is finding a cleaner way of doing this. I've looked for similar questions but didn't find anything, so hopefully you guys can help. :P

Here are some of the XML pages:

This one has all the tags: http://ust.chatango.com/profileimg/c/r/cress/mod1.xml

And an example of one that only has some: http://ust.chatango.com/profileimg/c/o/core/mod1.xml

Here's the code I came up with:

import urllib.request
import urllib.parse
import datetime
import xml.etree.ElementTree as ET  # cElementTree is deprecated and was removed in Python 3.9

class prof:

    @staticmethod
    def getProf(name):
        # Profile XML lives at /profileimg/<1st char>/<2nd char>/<name>/mod1.xml
        if len(name) == 1:
            url = "http://ust.chatango.com/profileimg/" + name + "/" + name + "/" + name + "/mod1.xml"
        else:
            url = "http://ust.chatango.com/profileimg/" + name[0] + "/" + name[1] + "/" + name + "/mod1.xml"
        with urllib.request.urlopen(url) as f:
            data = f.read().decode("utf-8")
        # ET.parse() expects a filename or file object; fromstring() parses a string
        # and returns the root element directly
        data = ET.fromstring(data)
        today = datetime.date.today()
        if data.find("s") is not None:
            gender = data.find("s").text
        else:
            gender = "?"
        if data.find("b") is not None:
            # <b> holds the birth date as YYYY-MM-DD; the parts must be ints
            year, month, day = map(int, data.find("b").text.split("-"))
            # subtract one more year if this year's birthday hasn't happened yet
            age = today.year - year - ((today.month, today.day) < (month, day))
        else:
            age = "?"
        if data.find("l") is not None:
            location = data.find("l").text
        else:
            location = "?"
        if data.find("body") is not None:
            mini = urllib.parse.unquote(data.find("body").text)
        else:
            mini = "?"
        if len(mini) < 1575:
            return "%s Info - Gender: %s, Age: %s, Location: %s <br/> %s" % (name, gender, age, location, mini)
        else:
            return "%s Info - Gender: %s, Age: %s, Location: %s <br/> Too many characters to display!" % (name, gender, age, location)
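The URL-building step in getProf can be pulled out and checked on its own, without touching the network. This is a minimal sketch: profile_url is a hypothetical helper name, and the directory pattern (first character, second character, then the full name, with a one-character name repeating itself) is inferred from the code above, not from any Chatango documentation.

```python
def profile_url(name):
    """Return the mod1.xml URL for a Chatango user name.

    Pattern inferred from the question's code: the first and second
    characters of the name are directory levels, and a one-character
    name repeats its only character.
    """
    if not name:
        raise ValueError("name must not be empty")
    second = name[1] if len(name) > 1 else name[0]
    return "http://ust.chatango.com/profileimg/%s/%s/%s/mod1.xml" % (name[0], second, name)


print(profile_url("cress"))  # http://ust.chatango.com/profileimg/c/r/cress/mod1.xml
```

With that separated out, getProf would only need to fetch profile_url(name).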
  • Here's a more up-to-date paste of the code, matching the one posted here: bpaste.net/show/479925 (commented Jul 20, 2014 at 2:48)

1 Answer


There's nothing really wrong with your solution; however, if you want it a bit cleaner:

Instead of

if data.find("s") is not None:
    gender = data.find("s").text
else:
    gender = "?"

You can use the element's findtext() method, which lets you specify a default value:

gender = data.findtext("s", "?")
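Here is a self-contained check of that behaviour on a made-up profile document (the `<mod>` root and its contents are invented for illustration; only the child tag names come from the question). One detail worth knowing: findtext() returns the default only when the tag is missing; a tag that is present but empty gives back an empty string.

```python
import xml.etree.ElementTree as ET

# Made-up sample: <s> is filled in, <l> is present but empty, <b> is missing.
SAMPLE = "<mod><s>M</s><l></l></mod>"

data = ET.fromstring(SAMPLE)

gender = data.findtext("s", "?")    # tag present with text -> "M"
age_text = data.findtext("b", "?")  # tag missing, so the default "?" is used
location = data.findtext("l", "?")  # tag present but empty -> "" rather than "?"

# If an empty tag should also count as "not filled in", fall back explicitly.
location = location or "?"

print(gender, age_text, location)  # M ? ?
```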

findtext() covers gender and location directly; for age and mini, which need extra processing of the text, what you're already doing is fine.
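If you'd still like to drop the find()/is not None pairs for age and mini, one option is to call findtext() without a default (so it returns None for a missing tag) and do the extra processing only when text came back. A sketch on a made-up sample: the `<mod>` root, the values, and the fixed TODAY date are invented for illustration, and the YYYY-MM-DD format for b is assumed from how the question splits it; unquote() is the same call the question uses.

```python
import datetime
import urllib.parse
import xml.etree.ElementTree as ET

# Made-up sample: a birth date and a percent-encoded mini, no gender or location.
SAMPLE = "<mod><b>1995-08-15</b><body>hello%20world</body></mod>"
TODAY = datetime.date(2014, 7, 20)  # fixed date so the result is reproducible

data = ET.fromstring(SAMPLE)

birth = data.findtext("b")  # None when <b> is missing
if birth:
    year, month, day = (int(part) for part in birth.split("-"))
    # The comparison is a bool, so it subtracts 0 or 1.
    age = TODAY.year - year - ((TODAY.month, TODAY.day) < (month, day))
else:
    age = "?"

body = data.findtext("body")
mini = urllib.parse.unquote(body) if body else "?"

gender = data.findtext("s", "?")
location = data.findtext("l", "?")

print(gender, age, location, mini)  # ? 18 ? hello world
```

Whether this reads better than the explicit if/else is mostly a matter of taste; it does avoid calling find() twice per tag.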


1 Comment

Ah, gotcha. I usually worry about how my code looks, so I like to get other people's opinions. Thanks for helping me out!
