0

I have lines of data which I want to parse. The data looks like this:

a score=216 expect=1.05e-06
a score=180 expect=0.0394

What I want is a subroutine that parses them and returns two values (score and expect) for each line.

However, this function of mine doesn't seem to work:

def scoreEvalFromMaf(mafLines):
    for word in mafLines[0]:
        if word.startswith("score="):
            theScore = word.split('=')[1]
            theEval  = word.split('=')[2]
            return [theScore, theEval]
    raise Exception("encountered an alignment without a score")

Please advise: what's the right way to do it?

1
  • As an aside, never raise Exception, as it is impossible to catch it sanely. Always raise something more narrow, like ValueError or something that you create. Commented Jun 2, 2010 at 4:29
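A minimal sketch of what that comment suggests, assuming a custom error type is wanted. The names `MafParseError` and `require_score` are hypothetical and do not appear in the question. Subclassing `ValueError` lets callers catch either the narrow type or `ValueError` in general, without swallowing unrelated errors:

```python
class MafParseError(ValueError):
    """Hypothetical error type for malformed alignment lines."""


def require_score(words):
    """Return the score string from a list of words, or raise MafParseError."""
    for word in words:
        if word.startswith("score="):
            return word.split("=", 1)[1]
    raise MafParseError("encountered an alignment without a score")


# The caller can catch exactly this failure and nothing else.
try:
    require_score(["a", "expect=0.0394"])
except MafParseError as err:
    message = str(err)
```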

3 Answers

2

It looks like you want to split each line up by spaces and parse each chunk separately. If you pass in a single line as a string (i.e. one element of what .readlines() returns):

def scoreEvalFromMafLine(mafLine):
    theScore, theEval = None, None
    for word in mafLine.split():
        if word.startswith("score="):
            theScore = word.split('=')[1]
        if word.startswith("expect="):
            theEval  = word.split('=')[1]

    if theScore is None or theEval is None:
        raise Exception("Invalid line: '%s'" % mafLine)

    return (theScore, theEval)

The way you were doing it would iterate over each character of the first line (assuming mafLines is a list of strings) rather than over each space-separated word.
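To see the difference concretely, here is a small illustration using the first sample line from the question. Iterating over a string yields single characters, so a `startswith("score=")` test can never match. Iterating over `.split()` yields whole words:

```python
line = "a score=216 expect=1.05e-06"

# Iterating over a string yields one character at a time.
chars = [c for c in line]

# Splitting on whitespace yields whole words instead.
words = line.split()

# Only the word-level iteration can find the "score=" field.
char_matches = [c for c in chars if c.startswith("score=")]
word_matches = [w for w in words if w.startswith("score=")]
```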


4 Comments

@AB: Hi Tony, thanks. But I also get the same message "error: 'list' object has no attribute 'split'" using your snippet.
Then mafLines is a list of lists, not a list of strings. I was assuming that mafLines was output from .readlines() or similar, but if it isn't, you'll need to clarify what exactly it is, or how you're producing it.
I fixed it using: "for word in mafLine[0]:"
Sounds like you've already split up your input lines by spaces. So your input (mafLines) will look like: [['a', 'score=1', 'expect=2'], ['a', 'score=3', 'expect=42'], ...] You might be better off making your function take just one line, rather than the whole list, since it'll be easier to reuse the function later on in your program.
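A rough sketch of that suggestion, assuming the input really is a list of already-split lines as shown in the comment above. The function name `score_eval_from_words` is made up for illustration:

```python
def score_eval_from_words(words):
    """Take one already-split line, e.g. ['a', 'score=1', 'expect=2']."""
    score = expect = None
    for word in words:
        if word.startswith("score="):
            score = word.split("=", 1)[1]
        elif word.startswith("expect="):
            expect = word.split("=", 1)[1]
    if score is None or expect is None:
        raise ValueError("invalid line: %r" % (words,))
    return score, expect


# Assumed input shape: a list of lists of words.
maf_lines = [["a", "score=1", "expect=2"], ["a", "score=3", "expect=42"]]
results = [score_eval_from_words(words) for words in maf_lines]
```

Because the function handles a single line, the caller decides whether to process the first line only or all of them.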
2

If mafLines is a list of lines, and you want to look at just the first one, .split() that line to obtain the words. For example:

def scoreEvalFromMaf(mafLines):
    theScore = None
    theEval = None
    # Look only at the first line, split on whitespace into words.
    for word in mafLines[0].split():
        if word.startswith('score='):
            # partition returns (before, separator, after)
            _, _, theScore = word.partition('=')
        elif word.startswith('expect='):
            _, _, theEval = word.partition('=')
    if theScore is None:
        raise Exception("encountered an alignment without a score")
    if theEval is None:
        raise Exception("encountered an alignment without an eval")
    return theScore, theEval

Note that this will return a tuple with two string items; if you want an int and a float, for example, you need to change the last line to

    return int(theScore), float(theEval)

and then you'll get a ValueError exception if either string is invalid for the type it's supposed to represent, and the returned tuple with two numbers if both strings are valid.
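As a hedged illustration of the conversion step, the sketch below parses one line into an int and a float and shows the ValueError on a bad value. The function name `parse_score_expect` and the malformed sample line are assumptions for demonstration only:

```python
def parse_score_expect(line):
    """Return (score, expect) as (int, float) from one line of text."""
    fields = {}
    for word in line.split():
        key, sep, value = word.partition("=")
        if sep:  # keep only key=value tokens
            fields[key] = value
    if "score" not in fields or "expect" not in fields:
        raise ValueError("missing score or expect in line: %r" % line)
    # int() and float() raise ValueError themselves on invalid text.
    return int(fields["score"]), float(fields["expect"])


score, expect = parse_score_expect("a score=216 expect=1.05e-06")

# A made-up malformed line: the score is not a number.
try:
    parse_score_expect("a score=abc expect=0.0394")
except ValueError as err:
    problem = str(err)
```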

3 Comments

@AM: Hi Alex, thanks. But I get this message "error: 'list' object has no attribute 'split'". BTW, is this the right way to store the output of the function: [score,exp] = scoreEvalFromMaf(maf)?
Sounds like mafLines is a list of lists rather than a list of strings. How are you generating it? There are also a couple of bugs in that code: you need to use .split() (ie. it's a function call), and also use word.split('=') instead of word.partition('=')
@neversaint, you definitely need to clarify what that mysterious mafLines is -- presumably a list of lists, as Anthony says (given the error message you get), but without knowing how you've built it it's essentially impossible to "read your mind" and just divine what the pieces are, out of thin air. Yes, once you clarify this point, you can (if you wish) put those useless brackets around the score, exp on the right-hand side of the assignment.
1

Obligatory and possibly inappropriate regexp solution:

import re
def scoreEvalFromMaf(mafLines):
    return [re.search(r'score=(.+) expect=(.+)', line).groups()
            for line in mafLines]

2 Comments

That'll explode for invalid input (although that behaviour might be what you want). Turning your (.+) into (.*) helps to catch blank values, but will still die for really dodgy input.
True enough. It's just a quick-and-dirty demonstration of an alternate strategy.
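One possible way to soften the failure mode discussed in these comments is to check the match before calling `.groups()`. This is only a sketch: the pattern using `\S+` and the choice to skip non-matching lines are assumptions, and raising an error instead may suit some inputs better. The name `score_eval_pairs` is hypothetical:

```python
import re

# \S+ stops at whitespace, so any trailing fields are not swallowed.
PATTERN = re.compile(r"score=(\S+)\s+expect=(\S+)")


def score_eval_pairs(lines):
    """Return (score, expect) string pairs, skipping lines that do not match."""
    pairs = []
    for line in lines:
        match = PATTERN.search(line)
        if match is not None:  # re.search returns None when nothing matches
            pairs.append(match.groups())
    return pairs


pairs = score_eval_pairs([
    "a score=216 expect=1.05e-06",
    "a line without the expected fields",
    "a score=180 expect=0.0394",
])
```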
