2

I have a tree-structure separated by tabs and lines like this:

a
\t1
\t2
\t3
\t\tb
\t\tc
\t4
\t5

And I am looking to turn this into:

{
'name': 'a',
'children': [
 {'name': '1'},
 {'name': '2'},
 {
   'name': '3'
   'children': [
      {'name': 'b'},
      {'name': 'c'}
    ]
  },
  {'name': '4'},
  {'name': '5'}
  ]
}

for a d3.js collapsible tree data input. I am assuming I have to use recursion in some way but I cannot figure out how.

I have tried turning the input into a list like this:

[('a',0), ('1',1), ('2',1), ('3',1), ('b',2), ('c',2), ('4',1), ('5',1)]

using this code:

def parser():
    #run from root `retail-tree`: `python3 src/main.py`
    l, all_line_details = list(), list()
    with open('assets/retail') as f:
        for line in f:
            line = line.rstrip('\n ')
            splitline = line.split('    ') 
            tup = (splitline[-1], len(splitline)-1)
            l.append(splitline)
            all_line_details.append(tup)
            print(tup)
    return all_line_details

Here, the first element is the string itself and the second is the number of tabs there are in that line. Unsure of the recursion step to accomplish this. Appreciate any help!

2 Answers 2

2

You can use a function that uses re.findall with a regex that matches a line as the name of the node, followed by 0 or more lines that start with a tab, grouped as the children, and then recursively builds the same structure for the children after stripping the first tab of each line from the children string:

import re
def parser(s):
    output = []
    for name, children in re.findall(r'(.*)\n((?:\t.*\n)*)', s):
        node = {'name': name}
        if children:
            node.update({'children': parser(''.join(line[1:] for line in children.splitlines(True)))})
        output.append(node)
    return output

so that given:

s = '''a
\t1
\t2
\t3
\t\tb
\t\tc
\t4
\t5
'''

parser(s)[0] returns:

{'name': 'a',
 'children': [{'name': '1'},
              {'name': '2'},
              {'name': '3', 'children': [{'name': 'b'}, {'name': 'c'}]},
              {'name': '4'},
              {'name': '5'}]}
Sign up to request clarification or add additional context in comments.

2 Comments

thanks for the answer! is there a way to replace \t with 4-spaces? my data is in spaces and i tried using something like this, where tab_4 = ' ': rx = "^(.*?)\n((?:{0}.*?\n)*)".format(tab_4) for name, children in re.findall(rx, s, re.M | re.S)
Yes, you will simply have to change the line slices for children from line[1:] to line[4:] after replacing \t in the regex with 4 spaces.
0

Working from the list structure you provided from your own parser function:

def make_tree(lines, tab_count=0):
    tree = []
    index = 0
    while index < len(lines):
        if lines[index][1] == tab_count:
            node = {"name": lines[index][0]}
            children, lines_read = make_tree(lines[index + 1:], tab_count + 1)
            if children:
                node["children"] = children
                index += lines_read
            tree.append(node)
        else:
            break
        index += 1
    return tree, index

Test cases:

lines = [("a", 0), ("1", 1), ("2", 1), ("3", 1), ("b", 2), ("c", 2), ("4", 1), ("5", 1)]

test_1 = make_tree([("a", 0)])
assert test_1[0] == [{"name": "a"}], test_1
test_2 = make_tree([("a", 0), ("b", 1)])
assert test_2[0] == [{"name": "a", "children": [{"name": "b"}]}], test_2
test_3 = make_tree(lines)
expected_3 = [
    {
        "name": "a",
        "children": [
            {"name": "1"},
            {"name": "2"},
            {"name": "3", "children": [{"name": "b"}, {"name": "c"}]},
            {"name": "4"},
            {"name": "5"},
        ],
    }
]
assert test_3[0] == expected_3, test_3

Note the output is wrapped in a list in case your source file has more than one root node (i.e. more than one line with no leading tabs), and also for neatness of the recursion.

4 Comments

thanks for the answer! i tried to run it but i think it fails this case pastebin.com/raw/iT6CPAk3
Is your parser function working correctly? The way you have it written you use line.split(' '), which looks like it's supposed to split on tabs, but seems in your post to be four spaces instead. The test case works fine if I replace with line.split('\t') in your parser function and pass the result to make_tree.
good catch there! so, i am using 4-spaces but for better representing the problem here i chose '\t' instead. quick question: in your representation, what's your input to make_tree? is it: [('a', 0), ('1', 1), ('2', 1), ('3', 1), ('4', 2), ('5', 2), ('6', 2), ('7', 3), ('8', 3), ('9', 3), ('b', 1), ('c', 1), ('d', 1)]? because in the output, i don't see b, c, or d
Ah, there's a bug in there indeed. It's not incrementing the index for the children's children. The fix makes it a bit messier, but we can return a tuple with the tree structure and the lines read, and increment index according to that. Updating answer now.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.