I'm trying to parse strings in Python. I have posted a couple of questions on Stack Overflow, and I am now trying to combine the different parsing approaches into one function that handles all the string formats I am working with.
Here's a code snippet that works just fine in isolation to parse the following two string formats.
from __future__ import generators
from pprint import pprint
s2="<one><two><three> an.attribute ::"
s1="< one > < two > < three > here's one attribute < six : 10.3 > < seven : 8.5 > < eight : 90.1 > < nine : 8.7 >"
def parse(s):
    for t in s.split('<'):
        for u in t.strip().split('>',1):
            if u.strip(): yield u.strip()
pprint(list(parse(s1)))
pprint(list(parse(s2)))
Here's the output that I get. It's in the format that I need where each attribute is stored in a different index location.
['one',
 'two',
 'three',
 "here's one attribute",
 'six : 10.3',
 'seven : 8.5',
 'eight : 90.1',
 'nine : 8.7']
['one', 'two', 'three', 'an.attribute ::']
After that was done, I tried to incorporate the same code into a function that can parse four string formats, but for some reason it doesn't work there and I can't figure out why.
Here's the incorporated code in its entirety.
from __future__ import generators
import re
import string
from pprint import pprint
temp=[]
y=[]
s2="< one > < two > < three > an.attribute ::"
s1="< one > < two > < three > here's an attribute < four : 6.5 > < five : 7.5 > < six : 8.5 > < seven : 9.5 >"
t2="< one > < two > < three > < four : 220.0 > < five : 6.5 > < six : 7.5 > < seven : 8.5 > < eight : 9.5 > < nine : 6 - 7 >"
t3="One : two : three : four Value : five Value : six Value : seven Value : eight Value :"
def parse(s):
    c=s.count('<')
    print c
    if c==9:
        res = re.findall('< (.*?) >', s)
        return res
    elif (c==7|c==3):
        temp=parsing(s)
        pprint(list(temp))
        #pprint(list(parsing(s)))
    else:
        res=s.split(' : ')
        res = [item.strip() for item in s.split(':')]
        return res
def parsing(s):
    for t in s.split(' < '):
        for u in t.strip().split('>',1):
            if u.strip(): yield u.strip()
pprint(list((s)))
Now when I run the code and call parse(s1), I get the following output:
7
["< one > < two > < three > here's an attribute < four",
 '6.5 > < five',
 '7.5 > < six',
 '8.5 > < seven',
 '9.5 >']
Similarly, on calling parse(s2), I get:
3
['< one > < two > < three > an.attribute', '', '']
Why is there an inconsistency in splitting the string while it is being parsed? I'm using the same code in both places.
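One difference worth ruling out: the two generators are not quite identical. The standalone version splits on '<', while parsing() in the combined script splits on ' < ' (with surrounding spaces). A minimal sketch of the comparison follows; tokens() and its sep parameter are hypothetical names introduced only for this check, not part of the original script.

```python
def tokens(s, sep):
    """Same generator body as in the question; only the split delimiter varies."""
    for t in s.split(sep):
        for u in t.strip().split('>', 1):
            if u.strip():
                yield u.strip()

s2 = "< one > < two > < three > an.attribute ::"

with_bare = list(tokens(s2, '<'))      # delimiter used in the standalone snippet
with_spaces = list(tokens(s2, ' < '))  # delimiter used in parsing()

print(with_bare)
print(with_spaces)
```

Because the string starts with "< " rather than " < ", the leading bracket is never consumed by the ' < ' split, so the first token should come back as '< one' instead of 'one'.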
Could someone help me figure out why this is happening? :)
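Another thing that may be worth checking is the condition c==7|c==3. In Python, | is the bitwise OR operator and binds more tightly than ==, so the expression is read as the chained comparison c == (7|c) == 3 rather than as "c is 7 or c is 3". If that is what happens here, both s1 and s2 would fall through to the else branch, which would match the colon-split output shown above. A small sketch of the comparison (the helper names as_written and intended are made up for illustration):

```python
def as_written(c):
    # Parsed as c == (7 | c) == 3, i.e. (c == (7 | c)) and ((7 | c) == 3)
    return c == 7 | c == 3

def intended(c):
    # Logical "or"; c in (7, 3) would be an equivalent spelling
    return c == 7 or c == 3

for c in (3, 7, 9):
    print("%d %s %s" % (c, as_written(c), intended(c)))
```

For counts of 3 and 7 the bitwise form evaluates to False while the logical form evaluates to True. Note also that the elif branch in parse() prints its result but does not return it, so even with the condition changed the function would return None on that path.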
Side note: from __future__ import generators implies an ancient Python (it was only needed in 2.2); on anything newer you don't need the from __future__ import generators line at all.