Python parse text file into nested dictionaries

Question

Consider the following data structure:

[HEADER1]
{
   key value
   key value
   ...
   [HEADER2]
   {
      key value
      ...
   }
   key value
   [HEADER3]
   {
      key value
      [HEADER4]
      {
         key value
         ...
      }
   }
   key value
}

There are no indents in the raw data, but I added them here for clarity. The number of key-value pairs is unknown; '...' indicates there could be many more within each [HEADER] block. The number of [HEADER] blocks is also unknown.

Note that the structure is nested, so in this example header 2 and 3 are inside header 1 and header 4 is inside header 3.

There can be many more (nested) headers, but I kept the example short.

How do I go about parsing this into a nested dictionary structure? Each [HEADER] should be the key to whatever follows inside the curly brackets.

The final result should be something like:

dict = {'HEADER1': 'contents of 1'}
contents of 1 = {'key': 'value', 'key': 'value', 'HEADER2': 'contents of 2', etc}

I'm guessing I need some sort of recursive function, but I am pretty new to Python and have no idea where to start.

For starters, I can pull out all the [HEADER] keys as follows:

path = 'mydatafile.txt'
keys = []

with open (path, 'rt') as file:
   for line in file:
      if line.startswith('['):
         keys.append(line.rstrip('\n'))

for key in keys:
   print(key)

But then what? Maybe this is not even needed?

Any suggestions?

So are there really headers without closing }s and also double }s and values outside of {}s?
– Jon Clements, Commented Oct 21, 2017 at 18:06
No, each header is followed by {...}, but since they can be nested, there could be two closing brackets on adjacent lines.
– koen, Commented Oct 21, 2017 at 18:07
Oh wait, is header 2 within header 1? Might be an idea to show how you'd expect the output dict to actually look.
– Jon Clements, Commented Oct 21, 2017 at 18:08

Ashish Ranjan · Accepted Answer · 2017-10-21 19:08:15Z

4

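You can do it by pre-formatting the file content with a few regex substitutions and then passing the result to json.loads.

Apply substitutions like the following one after another (written as pattern -> replacement, where $1 is the first captured group; Python's re.sub spells it \1):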

#1 \[(\w*)\]\n -> "$1":

#2 \}\n(\w) -> },$1

#3 (\w*)\s(\w*)\n([^}]) -> $1:$2,$3

#4 (\w*)\s(\w*)\n\} -> $1:$2}

Then pass the final string to json.loads:

import json
d = json.loads(s)  # s is the string produced by the substitutions

which parses it into a dict.
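To make the goal of the substitutions concrete, here is a small sketch of the kind of string json.loads accepts. The string `s` and the names `key1`, `value1`, and so on are made up for illustration; they are not from the original data file.

```python
import json

# Hypothetical target string (illustrative names): this is the shape the
# [HEADER] / { ... } text has to be rewritten into before json.loads is called.
# JSON needs double-quoted keys and strings, commas between members,
# and a single top-level value (here, one enclosing object).
s = '{"HEADER1": {"key1": "value1", "HEADER2": {"key2": "value2"}, "key3": "value3"}}'

d = json.loads(s)
print(d["HEADER1"]["HEADER2"]["key2"])  # value2
```

One thing to keep in mind: if the same key appears twice inside one block, json.loads keeps only the last value, just as a Python dict would.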

Explanation:

1. \[(\w*)\]\n : replaces [HEADER]\n with "HEADER":

2. \}\n(\w) : adds a comma after any closing brace } that is followed by another element, giving },

3. (\w*)\s(\w*)\n([^}]) : replaces key value\n with key:value, on lines that are followed by another element

4. (\w*)\s(\w*)\n\} : replaces key value\n with key:value on lines that are the last element of their block

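So, with minor modifications to these regexes you will be able to parse the file into a dict. As written, the substitutions leave keys and values unquoted and do not add an enclosing pair of braces, both of which json.loads requires. The basic concept is to reformat the file contents into a format that can be parsed easily.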
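One possible way to fill in those modifications is sketched below. It is not the exact set of regexes from the answer: it works line by line with re.MULTILINE, quotes keys and values, and cleans up trailing commas afterwards. The function name `text_to_dict` and the sample data are hypothetical, and the sketch assumes keys are single words and values contain no double quotes, backslashes, or a comma followed by `}`.

```python
import json
import re


def text_to_dict(text):
    """Rewrite [HEADER] / { key value } text into JSON, then parse it."""
    # [HEADER] on its own line  ->  "HEADER":
    s = re.sub(r'^[ \t]*\[(\w+)\][ \t]*$', r'"\1":', text, flags=re.MULTILINE)
    # key value  ->  "key": "value",   (value = rest of the line)
    s = re.sub(r'^[ \t]*(\w+)[ \t]+(.+?)[ \t]*$', r'"\1": "\2",', s,
               flags=re.MULTILINE)
    # a closing brace on its own line gets a comma:  }  ->  },
    s = re.sub(r'^[ \t]*\}[ \t]*$', '},', s, flags=re.MULTILINE)
    # JSON forbids a comma right before a closing brace, so drop those
    s = re.sub(r',\s*\}', '}', s)
    # remove the last trailing comma and wrap everything in one object
    s = '{' + s.strip().rstrip(',') + '}'
    return json.loads(s)


# Made-up sample in the shape described in the question (no indents).
sample = """[HEADER1]
{
key1 value1
[HEADER2]
{
key2 value2
}
key3 value3
[HEADER3]
{
key4 value4
[HEADER4]
{
key5 value5
}
}
key6 value6
}
"""

result = text_to_dict(sample)
print(result["HEADER1"]["HEADER2"])  # {'key2': 'value2'}
```

Every value comes back as a string; converting numbers or other types would be a separate step after parsing.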

edited Oct 21, 2017 at 19:08

answered Oct 21, 2017 at 18:37

Ashish Ranjan


8 Comments

koen Over a year ago

So how do I iterate over the lines and accomplish this? for line in file: line = re.sub("\[(\w*)\]\n", "", line) is not changing anything?

Ashish Ranjan Over a year ago

Don't iterate over lines. Read the whole file, apply the first regex to the file content, and then pass the resulting string to the next regex.

koen Over a year ago

I see, I tried that: s = open(path, 'rt').read() s1 = re.sub("\[(\w*)\]\n", "", s), but no changes.

Ashish Ranjan Over a year ago

Also, don't replace with an empty string; check the answer for what to substitute for each regex. See this for how to use captured groups: stackoverflow.com/questions/6711567/…

Ashish Ranjan Over a year ago

Check out the above link; it will help a lot with substitution. Basically, you need to use \1 instead of $1 in Python.
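A small sketch of the difference discussed in these comments, using a made-up four-line input: an empty replacement throws the header name away, while \1 in the replacement puts the captured name back. Note also that `line = re.sub(...)` inside a for loop only rebinds the loop variable, so the result has to be stored somewhere to have any effect.

```python
import re

# Made-up input in the shape from the question.
text = "[HEADER1]\n{\nkey value\n}\n"

# Empty replacement: the whole match, header name included, is deleted.
deleted = re.sub(r"\[(\w*)\]\n", "", text)

# \1 re-inserts the first captured group (Python's spelling of $1).
# Raw strings (r"...") keep the backslashes intact for the regex engine.
quoted = re.sub(r"\[(\w*)\]\n", r'"\1":', text)

print(repr(deleted))
print(repr(quoted))
```

re.sub also accepts `\g<1>` as an equivalent spelling of `\1`, which is handy when the group reference is followed directly by a digit.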
