
I read a string like the following from a file:

var1 : data1
var2 : data2
dict1 {  
     var3 : data3  
     dict2 {  
         var4 : data4  
     }
     var5 : data5
}
dict3 {
     var6 : data6
     var7 : data7
}

and so on. (Lines end with \n, and each indent level is one \t.)
I am trying to convert it into something like this:

Dictionary={"var1":"data1","var2":"data2", "dict1" : 
    {"var3":"data3", "dict2" : {
        "var4":"data4" }, "var5":"data5"}
    , "dict3":{"var6":"data6","var7":"data7"}}

(The indents are only there to keep it somewhat human readable.)
To solve it, all I can think of is to split the string into a list of lines, walk down the list until I find a line containing "}", delete that line (so I won't run into it again), then walk back up until I find a line containing "{", strip the whitespace before the name and the " {" after it, and collect everything in between into that entry before moving on to the next "}". To extract the name I currently use temp=re.split ('(\S+) \{',out[z]); for this example the first temp[1] would be 'dict2'.

But that's neither fast nor elegant. I am definitely missing something.
My code is currently:

def procvar(strinG):
    x=y=z=temp1=temp2=0
    back = False
    out=re.split ('\n',strinG) #left over from some other tries
    while z < len(out):
        print "z=",z," out[z]= ", out[z]
        if "{" in out[z]:
            if back == True:
                back = False
                xtemp=re.split ('(\S+) \{',out[z])
                out[z]=xtemp[1]
                ytemp=xtemp[1]
                temp2=z+1
                print "Temp: ",temp1," - ",out[temp1]
                out[z]={out[z]:[]}
                while temp2 <= temp1:
                    out[z][xtemp[1]].append(out[temp2]) # not finished here, for the time being I insert the strings as they are
                    del out[temp2]
                    temp1-=1
                print out[z]
        if "}" in out[z]:
            back = True
            del out[z]
            temp1 = z-1
        if back == True:
            z-=1
        else:
            z+=1
    return out

3 Answers

2

Your format is close enough to YAML that PyYAML (pip install pyyaml) can parse it after a small substitution: http://pyyaml.org/wiki/PyYAML

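import yaml

x = """var1 : data1
var2 : data2
dict1 {
     var3 : data3
     dict2 {
         var4 : data4
     }
     var5 : data5
}
dict3 {
     var6 : data6
     var7 : data7
}"""

# YAML does not allow tabs for indentation, so expand them first if the file uses them
x2 = x.expandtabs(4).replace('{', ':').replace('}', '')
# safe_load builds only plain dicts, lists and scalars
yaml.safe_load(x2)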

{'dict1': {'dict2': {'var4': 'data4'}, 'var3': 'data3', 'var5': 'data5'},
 'dict3': {'var6': 'data6', 'var7': 'data7'},
 'var1': 'data1',
 'var2': 'data2'}
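
A blanket replace also rewrites braces that happen to sit inside keys or values. Here is a minimal sketch of a line-anchored variant. It assumes every opening brace ends its line and every closing brace sits alone on its line; the helper name to_yaml_text is made up for illustration:

```python
import re

CLOSE_RE = re.compile(r'^\s*\}\s*$')          # a lone closing brace
OPEN_RE = re.compile(r'^(\s*\S+)\s+\{\s*$')   # "name {" at the end of a line


def to_yaml_text(text):
    """Turn 'name {' headers into 'name:' and drop lone '}' lines."""
    out = []
    # Tabs are expanded because YAML does not accept them as indentation.
    for line in text.expandtabs(4).splitlines():
        if CLOSE_RE.match(line):
            continue  # the indentation already encodes the nesting
        # Lines that are not block headers pass through unchanged.
        out.append(OPEN_RE.sub(r'\1:', line))
    return '\n'.join(out)


sample = "var1 : da{ta}1\ndict1 {\n\tvar3 : data3\n}"
print(to_yaml_text(sample))
```

The braces inside da{ta}1 survive because only whole-line patterns are rewritten. The returned text can then be passed to the YAML loader in place of x2.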

3 Comments

The only problem with this solution is that it will modify the keys and values if they contain either { or } characters.
True. It is a hack. The proper solution would be to adapt YAML's language; it seems feasible, but I don't know enough about it: pyyaml.org/wiki/…
Maybe just tighten the replacements: x = re.sub(r'^\s*}\s*$', '', x, flags=re.M) and x = re.sub(r'^(\s*[^\s]+\s+){(\s*)$', r'\1:\2', x, flags=re.M). That would be a little safer. (The re.M flag is needed so that ^ and $ match at every line, not just at the ends of the string.)
0
import re

# key : value regexp
KV_RE = re.compile(r'^\s*(?P<key>[^\s]+)\s+:\s+(?P<value>.+?)\s*$')
# dict start regexp
DS_RE = re.compile(r'^\s*(?P<key>[^\s]+)\s+{\s*$')
# dict end regexp
DE_RE = re.compile(r'^\s*}\s*$')


def parse(s):
    current = {}
    stack = []
    for line in s.strip().splitlines():
        match = KV_RE.match(line)
        if match:
            gd = match.groupdict()
            current[gd['key']] = gd['value']
            continue
        match = DS_RE.match(line)
        if match:
            stack.append(current)
            current = current.setdefault(match.groupdict()['key'], {})
            continue
        match = DE_RE.match(line)
        if match:
            current = stack.pop()
            continue
        # Error occurred: the line matched none of the patterns
        print('Error: %s' % line)
        return {}
    return current
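
If the input contains a closing brace without a matching opener, stack.pop() above raises IndexError, and an unclosed block goes unnoticed. A sketch of a stricter variant follows, assuming the same three line shapes. The name parse_strict, the regex names, and the error messages are illustrative, not part of the code above:

```python
import re

KV = re.compile(r'^\s*(?P<key>\S+)\s*:\s*(?P<value>.*?)\s*$')
OPEN = re.compile(r'^\s*(?P<key>\S+)\s+\{\s*$')
CLOSE = re.compile(r'^\s*\}\s*$')


def parse_strict(text):
    root = {}
    current, stack = root, []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # tolerate blank lines
        m = OPEN.match(line)
        if m:
            stack.append(current)  # remember the enclosing dict
            current = current.setdefault(m.group('key'), {})
            continue
        if CLOSE.match(line):
            if not stack:
                raise ValueError('line %d: unmatched }' % lineno)
            current = stack.pop()  # go back to the enclosing dict
            continue
        m = KV.match(line)
        if m:
            current[m.group('key')] = m.group('value')
            continue
        raise ValueError('line %d: cannot parse %r' % (lineno, line))
    if stack:
        raise ValueError('%d block(s) left unclosed' % len(stack))
    return root


text = "var1 : data1\ndict1 {\n\tvar3 : data3\n\tdict2 {\n\t\tvar4 : data4\n\t}\n\tvar5 : data5\n}\n"
print(parse_strict(text))
```

Raising with a line number makes it easier to find the spot where the data stops being regular than silently returning an empty dict.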

2 Comments

Thanks a bunch. Getting the regexps right is everything, eh? I had to make some slight modifications, like KV_RE = re.compile('^\s*(?P<key>[\w\d]+) : (?P<value>.*)$'), because the data string may be empty and may contain whitespace and the like. For some reason the function also runs into an empty stack at some point; to keep it from crashing I wrapped that spot in a try:.
It was just a rough approximation, because I don't know exactly what your data looks like. If it is really regular with no special characters, the yaml and literal_eval solutions are also good choices; this one is more general and extensible. But if you are getting empty-stack errors, that probably means your data is not that regular after all. Or is not every closing brace on its own line?
0

If your text is in the same regular pattern as the example, you can use ast.literal_eval to parse the string.

First, let's modify the string to be legal Python dict text:

import re

st='''\
var1 : data1
var2 : data2
dict1 {  
     var3 : data3  
     dict2 {  
         var4 : data4  
     }
     var5 : data5
}
'''

# add commas after key, val pairs
st=re.sub(r'^(\s*\w+\s*:\s*\w+)\s*$',r'\1,',st,flags=re.M)

# insert colon after name and before opening brace 
st=re.sub(r'^\s*(\w+\s*){\s*$',r'\1:{',st,flags=re.M)

# add comma after closing brace
st=re.sub(r'^(\s*})\s*$',r'\1,',st,flags=re.M)

# put names into quotes
st=''.join(['"{}"'.format(s.group(0)) if re.search(r'\w+',s.group(0)) else s.group(0) 
                for s in re.finditer(r'\w+|\W+',st)])

# add opening and closing braces
st='{'+st+'}'
print(st)

prints the modified string:

{"var1" : "data1",
"var2" : "data2",
"dict1" :{
     "var3" : "data3",
"dict2" :{
         "var4" : "data4",
     },
     "var5" : "data5",
},}

Now use ast to turn the string into a data structure:

import ast
print(ast.literal_eval(st))

prints

{'dict1': {'var5': 'data5', 'var3': 'data3', 'dict2': {'var4': 'data4'}}, 'var1': 'data1', 'var2': 'data2'}

2 Comments

Hmm, also not bad. Maybe I'll give that a shot as well, since the format isn't set in stone: in fact, I generate the string myself at some point. It also occurred to me that I may have to find some way to let the data strings span multiple lines.
If you have control of the program creating the files, there are better options for persistent data. Look at pickle and json.
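
For instance, if the generating program can write JSON instead of the custom format, the standard json module handles the round trip, including values that span multiple lines. A minimal sketch with made-up data:

```python
import json

# Made-up example data; one value deliberately contains a newline.
data = {
    "var1": "data1",
    "dict1": {"var3": "line one\nline two", "dict2": {"var4": "data4"}},
}

text = json.dumps(data, indent=4)  # nested dicts become nested JSON objects
restored = json.loads(text)        # back to plain dicts and strings

print(restored == data)
```

The newline inside the value is escaped by json.dumps and restored by json.loads, so no hand-written parsing is needed.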
