This is a follow-up to this question: Python parse text file into nested dictionaries

I initially accepted the answer that suggested reformatting the input with regexes, but after looking more closely at the input, there are still some cases that I cannot process with the proposed regexes.

So I am back at recursively parsing the lines into a dictionary.

What I have so far is:

def parseToDictionary(input):
    key = ''
    value = ''
    result = {}

    if input[0].startswith('{'): # remove {
        del input[0]

    result.clear() # clear the dict for each recursion

    for idx, line in enumerate(input):
        line = line.rstrip() # remove trailing returns

        if line.startswith('['):
            key = line
            value = parseToDictionary(input[idx+1:]) # parse the next level
        elif line.startswith('}'): # reached the end of a block
            return result
        else:
            elements = line.split('\t')
            key = elements[0]
            if len(elements) > 1:
                value = elements[1]
            else:
                value = 'Not defined' # some keys may not have a value, so set a generic value here
        if key:
            result[key] = value

    return result

Here is an example (very simplified!) input:

[HEADER1]
{
key1    value
key2    long value, with a comma
[HEADER2]
{
key 1234
emptykey
}
}

The output is:

'[HEADER2]': 
{
    'emptykey': 'Not defined', 
    'key': '1234'
}, 
'key2': 'long value, with a comma', 
'key1': 'value', 
'[HEADER1]': 
{
    'emptykey': 'Not defined', 
    'key2': 'long value, with a comma', 
    'key1': 'value', 
    'key': '1234', 
    '[HEADER2]': 
    {
        'emptykey': 'Not defined', 
        'key': '1234'
    }
 }, 
 'emptykey': 'Not defined', 
 'key': '1234'
 }

But it should be:

'[HEADER1]': 
{
    'key1': 'value', 
    'key2': 'long value, with a comma', 
    '[HEADER2]': 
    {
        'emptykey': 'Not defined', 
        'key': '1234'
    }
 }

So each line that starts with a [ is the key for the next block. Inside each block are multiple key-value pairs, and there can also be another nested level. What goes wrong is that some blocks are parsed multiple times, and I cannot figure out why.

The input parameter is mydatafile.split('\n')

Who can help me out?

  • What is input? Is it a list of tokens? Commented Oct 28, 2017 at 21:02
  • I have added that to the question Commented Oct 28, 2017 at 21:11

1 Answer

You have to skip the lines that have already been processed in the subsections:

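def parse_to_dictionary(lines):
    def parse_block(lines):
        # Parse one '{ ... }' block, consuming lines from the shared iterator.
        contents = {}
        if next(lines).strip() != '{':
            raise AssertionError("'{' expected")
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line == '}':
                return contents
            elif line.startswith('['):
                # The nested call consumes the sub-block's lines, so this
                # loop resumes after the matching '}'.
                contents[line] = parse_block(lines)
            else:
                parts = line.split('\t', 1)
                contents[parts[0]] = None if len(parts) == 1 else parts[1]
        raise AssertionError("'}' expected")

    lines = iter(lines)  # one iterator shared by all recursion levels
    key = next(lines).strip()
    if not key.startswith('['):
        raise AssertionError("format error")
    return {key: parse_block(lines)}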
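
To illustrate why this skips the subsection lines while the slicing approach in the question does not, here is a minimal sketch. The helper `consume_block` and the sample list `data` are made up for this illustration and are not part of the parser above. Passing a slice gives the callee a copy, so the caller's loop still visits the same lines afterwards; passing a shared iterator lets the callee advance the caller's position.

```python
def consume_block(lines):
    """Collect items from an iterator up to (and including) the closing '}'."""
    collected = []
    for line in lines:
        if line == '}':
            break
        collected.append(line)
    return collected

# Hypothetical sample data: a header, two lines in its block, one line after.
data = ['[H]', 'a', 'b', '}', 'c']

# Slicing (as in the question): the callee works on a copy, so the caller's
# loop position is unaffected and it visits 'a', 'b' and '}' again.
seen_by_outer = []
for idx, line in enumerate(data):
    seen_by_outer.append(line)
    if line.startswith('['):
        consume_block(iter(data[idx + 1:]))

# Shared iterator (as in the answer): the callee advances the same iterator,
# so the caller resumes right after the '}'.
shared = iter(data)
seen_by_shared = []
inner = None
for line in shared:
    seen_by_shared.append(line)
    if line.startswith('['):
        inner = consume_block(shared)

print(seen_by_outer)   # ['[H]', 'a', 'b', '}', 'c']
print(seen_by_shared)  # ['[H]', 'c']
print(inner)           # ['a', 'b']
```

In the question's code the re-visited lines are what end up duplicated in the outer dictionaries.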

3 Comments

Since the first line is not {, it will fail immediately. What I really want to do is just skip the lines with {.
So to fix that, I commented out the raise AssertionError("'{' expected") line and indented the for-loop. Works perfectly now.
Disregard these comments, there was an error in my input file.
