How can I create a nested dictionary object from tree-like file-directory text-file?

Question

I have a tree-structure separated by tabs and lines like this:

a
\t1
\t2
\t3
\t\tb
\t\tc
\t4
\t5

And I am looking to turn this into:

{
'name': 'a',
'children': [
 {'name': '1'},
 {'name': '2'},
 {
   'name': '3'
   'children': [
      {'name': 'b'},
      {'name': 'c'}
    ]
  },
  {'name': '4'},
  {'name': '5'}
  ]
}

for a d3.js collapsible tree data input. I am assuming I have to use recursion in some way but I cannot figure out how.

I have tried turning the input into a list like this:

[('a',0), ('1',1), ('2',1), ('3',1), ('b',2), ('c',2), ('4',1), ('5',1)]

using this code:

def parser():
    #run from root `retail-tree`: `python3 src/main.py`
    l, all_line_details = list(), list()
    with open('assets/retail') as f:
        for line in f:
            line = line.rstrip('\n ')
            splitline = line.split('    ') 
            tup = (splitline[-1], len(splitline)-1)
            l.append(splitline)
            all_line_details.append(tup)
            print(tup)
    return all_line_details

Here, the first element is the string itself and the second is the number of tabs there are in that line. Unsure of the recursion step to accomplish this. Appreciate any help!

blhsing · Accepted Answer · 2019-03-31 05:48:31Z

2

You can use a function that uses re.findall with a regex that matches a line as the name of the node, followed by 0 or more lines that start with a tab, grouped as the children, and then recursively builds the same structure for the children after stripping the first tab of each line from the children string:

import re
def parser(s):
    output = []
    for name, children in re.findall(r'(.*)\n((?:\t.*\n)*)', s):
        node = {'name': name}
        if children:
            node.update({'children': parser(''.join(line[1:] for line in children.splitlines(True)))})
        output.append(node)
    return output

so that given:

s = '''a
\t1
\t2
\t3
\t\tb
\t\tc
\t4
\t5
'''

parser(s)[0] returns:

{'name': 'a',
 'children': [{'name': '1'},
              {'name': '2'},
              {'name': '3', 'children': [{'name': 'b'}, {'name': 'c'}]},
              {'name': '4'},
              {'name': '5'}]}

edited Mar 31, 2019 at 5:48

answered Mar 31, 2019 at 4:21

blhsing

109k9 gold badges89 silver badges132 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

akad3b Over a year ago

thanks for the answer! is there a way to replace \t with 4-spaces? my data is in spaces and i tried using something like this, where tab_4 = ' ': rx = "^(.*?)\n((?:{0}.*?\n)*)".format(tab_4) for name, children in re.findall(rx, s, re.M | re.S)

blhsing Over a year ago

Yes, you will simply have to change the line slices for children from line[1:] to line[4:] after replacing \t in the regex with 4 spaces.

Riley John Gibbs · Accepted Answer · 2019-04-01 06:46:00Z

0

Working from the list structure you provided from your own parser function:

def make_tree(lines, tab_count=0):
    tree = []
    index = 0
    while index < len(lines):
        if lines[index][1] == tab_count:
            node = {"name": lines[index][0]}
            children, lines_read = make_tree(lines[index + 1:], tab_count + 1)
            if children:
                node["children"] = children
                index += lines_read
            tree.append(node)
        else:
            break
        index += 1
    return tree, index

Test cases:

lines = [("a", 0), ("1", 1), ("2", 1), ("3", 1), ("b", 2), ("c", 2), ("4", 1), ("5", 1)]

test_1 = make_tree([("a", 0)])
assert test_1[0] == [{"name": "a"}], test_1
test_2 = make_tree([("a", 0), ("b", 1)])
assert test_2[0] == [{"name": "a", "children": [{"name": "b"}]}], test_2
test_3 = make_tree(lines)
expected_3 = [
    {
        "name": "a",
        "children": [
            {"name": "1"},
            {"name": "2"},
            {"name": "3", "children": [{"name": "b"}, {"name": "c"}]},
            {"name": "4"},
            {"name": "5"},
        ],
    }
]
assert test_3[0] == expected_3, test_3

Note the output is wrapped in a list in case your source file has more than one root node (i.e. more than one line with no leading tabs), and also for neatness of the recursion.

edited Apr 1, 2019 at 6:46

answered Mar 31, 2019 at 4:35

Riley John Gibbs

1581 gold badge1 silver badge7 bronze badges

4 Comments

akad3b Over a year ago

thanks for the answer! i tried to run it but i think it fails this case pastebin.com/raw/iT6CPAk3

Riley John Gibbs Over a year ago

Is your parser function working correctly? The way you have it written you use line.split(' '), which looks like it's supposed to split on tabs, but seems in your post to be four spaces instead. The test case works fine if I replace with line.split('\t') in your parser function and pass the result to make_tree.

akad3b Over a year ago

good catch there! so, i am using 4-spaces but for better representing the problem here i chose '\t' instead. quick question: in your representation, what's your input to make_tree? is it:

[('a', 0), ('1', 1), ('2', 1), ('3', 1), ('4', 2), ('5', 2), ('6', 2), ('7', 3), ('8', 3), ('9', 3), ('b', 1), ('c', 1), ('d', 1)]

? because in the output, i don't see b, c, or d

Riley John Gibbs Over a year ago

Ah, there's a bug in there indeed. It's not incrementing the index for the children's children. The fix makes it a bit messier, but we can return a tuple with the tree structure and the lines read, and increment index according to that. Updating answer now.

Collectives™ on Stack Overflow

How can I create a nested dictionary object from tree-like file-directory text-file?

2 Answers 2

2 Comments

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related