I'm trying to parse strings in Python. I have posted a couple of questions on Stack Overflow, and I am now trying to combine the different parsing approaches into one function that handles all the string formats I am working with.

Here's a code snippet that works just fine in isolation to parse the two following string formats.

from __future__ import generators
from pprint import pprint
s2="<one><two><three> an.attribute ::"
s1="< one > < two > < three > here's one attribute < six : 10.3 > < seven : 8.5 > <   eight :   90.1 > < nine : 8.7 >"
def parse(s):
    for t in s.split('<'):
        for u in t.strip().split('>',1):
            if u.strip(): yield u.strip()
pprint(list(parse(s1)))
pprint(list(parse(s2)))

Here's the output that I get. It's in the format that I need where each attribute is stored in a different index location.

['one',
 'two',
 'three',
 "here's one attribute",
 'six : 10.3',
 'seven : 8.5',
 'eight : 90.1',
 'nine : 8.7']
['one', 'two', 'three', 'an.attribute ::']

After that was done, I tried to incorporate the same code into a function that can parse four string formats, but for some reason it doesn't work there and I can't figure out why.

Here's the incorporated code in its entirety.

from __future__ import generators
import re
import string
from pprint import pprint
temp=[]
y=[]
s2="< one > < two > < three > an.attribute ::"
s1="< one > < two > < three > here's an attribute < four : 6.5 > < five : 7.5 > < six : 8.5 > < seven : 9.5 >"
t2="< one > < two > < three > < four : 220.0 > < five : 6.5 > < six : 7.5 > < seven : 8.5 > < eight : 9.5 > < nine : 6 -  7 >"
t3="One : two :  three : four  Value  : five  Value  : six  Value : seven  Value :  eight  Value :"
def parse(s):
    c=s.count('<')
    print c
    if c==9:
        res = re.findall('< (.*?) >', s)
        return res
    elif (c==7|c==3):
        temp=parsing(s)
        pprint(list(temp))
        #pprint(list(parsing(s)))
    else: 
        res=s.split(' : ')
        res = [item.strip() for item in s.split(':')]
        return res
def parsing(s):
    for t in s.split(' < '):
        for u in t.strip().split('>',1):
            if u.strip(): yield u.strip()
    pprint(list((s)))

Now when I run the code and call parse(s1), I get the following output:

7
["< one > < two > < three > here's an attribute < four",
 '6.5 > < five',
 '7.5 > < six',
 '8.5 > < seven',
 '9.5 >']

Similarly, on calling parse(s2), I get:

3
['< one > < two > < three > an.attribute', '', '']

Why is the string split inconsistently when it is parsed? I'm using the same code in both places.

Could someone help me figure out why this is happening? :)

  • First thing that strikes me - what version of Python are you using!? from __future__ import generators implies ancient Commented Mar 14, 2013 at 10:00
  • I'm using PyScripter 2.7. What do you suggest that I use instead? :) @JonClements Commented Mar 14, 2013 at 10:05
  • You do not need to use the from __future__ import generators line at all then. Commented Mar 14, 2013 at 10:07

1 Answer


You are using the binary | bitwise or operator where you should be using the or boolean operator instead:

elif (c==7|c==3):

should be

elif c==7 or c==3:

or perhaps:

elif c in (3, 7):

which is faster to boot.
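
With the condition fixed, the combined function might look something like the sketch below. Two things in it are my own assumptions rather than part of the original code: the middle branch returns the list instead of pretty-printing it, and parsing splits on '<' (as the stand-alone snippet in the question does) rather than on ' < '. It should run unchanged on Python 2.7 and Python 3.

```python
import re


def parsing(s):
    # Same generator as the stand-alone snippet in the question:
    # split on '<', then split each piece on the first '>'.
    for t in s.split('<'):
        for u in t.strip().split('>', 1):
            if u.strip():
                yield u.strip()


def parse(s):
    c = s.count('<')
    if c == 9:
        # Every field is wrapped in "< ... >".
        return re.findall('< (.*?) >', s)
    elif c in (3, 7):
        # Assumption: return the tokens rather than printing them.
        return list(parsing(s))
    else:
        # No angle brackets at all: fields are separated by colons.
        return [item.strip() for item in s.split(':')]


s1 = ("< one > < two > < three > here's an attribute "
      "< four : 6.5 > < five : 7.5 > < six : 8.5 > < seven : 9.5 >")
s2 = "< one > < two > < three > an.attribute ::"

print(parse(s1))
print(parse(s2))
```

Returning a list from every branch means callers get the same type back whichever format was matched.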

Because the | operator binds more tightly than == (whereas or binds less tightly), the original condition was parsed as the chained comparison c == (7 | c) == 3, with 7 | c performing a bitwise OR. The result of 7 | c always has its three lowest bits set, so it can never equal 3, and the whole expression is always False:

>>> c = 7
>>> (c==7|c==3)
False
>>> c = 3
>>> (c==7|c==3)
False
>>> c==7 or c==3
True
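
To make the grouping explicit, here is a small sketch that compares the original condition with a hand-expanded version of the chained comparison. The function names buggy, explicit and fixed are made up for this illustration.

```python
def buggy(c):
    # The condition exactly as written in the question.
    return c == 7 | c == 3


def explicit(c):
    # How Python groups it: a chained comparison around (7 | c).
    return c == (7 | c) and (7 | c) == 3


def fixed(c):
    # The membership test suggested above.
    return c in (3, 7)


for c in (3, 7, 9):
    print(c, buggy(c), explicit(c), fixed(c))
```

For the values 3, 7 and 9, buggy and explicit agree and are False each time, while fixed is True for 3 and 7 only.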

2 Comments

Yes that's what was wrong. Thanks so much, I'm new to Python and I'm still finding my way around it =) @Martijn Pieters
@Paulie It's also worth noting that a chain such as c==7 or c==3 or c==9 or c==2 can more Pythonically be written as c in (7, 3, 9, 2), which is clearer and makes adding or removing conditions easier.
