I'm having a bit of trouble here. I need to take a dataframe

import pandas as pd

region = ['A','A','A','B','B','B']
sub_region = ['1','2','2','3','3','4']
state = ['a','b','c','d','e','f']

pd.DataFrame({"region":region,"sub_region":sub_region,"state":state})

and convert it into a nested structure with the following format:

[{name: "thing", children: [{name:"sub_thing",children:[{...}] }]}]

That is, a list of nested dictionaries whose keys are always name and children, except that leaf nodes (children with no children of their own) omit the children key. The final desired output would be:

[{"name":"A",
    "children":[{"name":"1","children":[{"name":"a"}]},
                {"name":"2","children":[{"name":"b"},{"name":"c"}]}]
 },
 {"name":"B",
    "children":[{"name":"3","children":[{"name":"d"},{"name":"e"}]},
                {"name":"4","children":[{"name":"f"}]}]
 }
]

Assume a generalized framework where the number of levels can vary.

1 Answer

I don't think you can do better than looping through the rows of the dataframe; I don't see a way to vectorize this process. Also, if the number of levels can vary within the same dataframe, then the update function should be modified to handle NaN entries, e.g. by changing if len(row) > 1 to if len(row) > 1 and not pd.isna(row[1]). (pd.isna is safer here than np.isnan, which raises a TypeError on string values.)

That said, I believe that the following script should be satisfactory.

import pandas as pd

region = ['A','A','A','B','B','B']
sub_region = ['1','2','2','3','3','4']
state = ['a','b','c','d','e','f']

df = pd.DataFrame({"region":region,"sub_region":sub_region,"state":state})
ls = []

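def update(row,ls):
    # Look for an existing node at this level with the same name.
    for d in ls:
        if d['name'] == row[0]:
            break
    else:
        # No match (the loop ended without break): add a new node.
        ls.append({'name':row[0]})
        d = ls[-1]
    # Recurse into the remaining levels, creating 'children' on demand.
    if len(row) > 1:
        if 'children' not in d:
            d['children'] = []
        update(row[1:],d['children'])

# itertuples yields plain tuples, so row[0] and row[1:] are purely positional.
for r in df.itertuples(index=False, name=None):
    update(r,ls)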

print(ls)

The resulting list ls:

[{'name': 'A',
  'children': [{'name': '1', 'children': [{'name': 'a'}]},
   {'name': '2', 'children': [{'name': 'b'}, {'name': 'c'}]}]},
 {'name': 'B',
  'children': [{'name': '3', 'children': [{'name': 'd'}, {'name': 'e'}]},
   {'name': '4', 'children': [{'name': 'f'}]}]}]
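
For the case mentioned above where the number of levels varies within one dataframe, here is a minimal sketch of a NaN-aware update. It assumes that a missing value marks the end of a branch and that rows are passed as plain tuples; the ragged sample data (region 'C' with no state) is made up for illustration.

```python
import pandas as pd

def update(row, ls):
    """Insert one row (a tuple of level names) into the nested list ls."""
    # Assumption: a missing value means the branch ends here for this row.
    if len(row) == 0 or pd.isna(row[0]):
        return
    for d in ls:
        if d['name'] == row[0]:
            break
    else:
        ls.append({'name': row[0]})
        d = ls[-1]
    # Only create 'children' when a real (non-missing) next level exists.
    if len(row) > 1 and not pd.isna(row[1]):
        update(row[1:], d.setdefault('children', []))

# Hypothetical ragged data: region 'C' stops at the sub_region level.
df = pd.DataFrame({
    "region": ['A', 'A', 'C'],
    "sub_region": ['1', '1', '5'],
    "state": ['a', 'b', None],
})

ls = []
for r in df.itertuples(index=False, name=None):
    update(r, ls)

print(ls)
```

Here the node for '5' ends up with no 'children' key, the same as any other leaf.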

Here's a version where childless children have 'children':[] in their dict, which I find a bit more natural.

import pandas as pd

region = ['A','A','A','B','B','B']
sub_region = ['1','2','2','3','3','4']
state = ['a','b','c','d','e','f']

df = pd.DataFrame({"region":region,"sub_region":sub_region,"state":state})
ls = []

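def update(row,ls):
    # Base case: no levels left to insert.
    if len(row) == 0:
        return
    for d in ls:
        if d['name'] == row[0]:
            break
    else:
        # Every node, including leaves, gets a 'children' list.
        ls.append({'name':row[0], 'children':[]})
        d = ls[-1]
    update(row[1:],d['children'])

# itertuples yields plain tuples, so row[0] and row[1:] are purely positional.
for r in df.itertuples(index=False, name=None):
    update(r,ls)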

print(ls)

The resulting list ls:

[{'name': 'A',
  'children': [{'name': '1', 'children': [{'name': 'a', 'children': []}]},
   {'name': '2',
    'children': [{'name': 'b', 'children': []},
     {'name': 'c', 'children': []}]}]},
 {'name': 'B',
  'children': [{'name': '3',
    'children': [{'name': 'd', 'children': []},
     {'name': 'e', 'children': []}]},
   {'name': '4', 'children': [{'name': 'f', 'children': []}]}]}]
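
Since the question asks for a variable number of levels, here is a rough sketch that wraps the first variant (leaves without a 'children' key) in a function taking the level columns as a list. The name df_to_tree, its levels parameter, and the fourth "city" column are my own inventions for illustration, not anything from pandas.

```python
import pandas as pd

def df_to_tree(df, levels):
    """Build the nested name/children list from the columns in levels (outermost first)."""
    def update(row, ls):
        for d in ls:
            if d['name'] == row[0]:
                break
        else:
            ls.append({'name': row[0]})
            d = ls[-1]
        if len(row) > 1:
            update(row[1:], d.setdefault('children', []))

    tree = []
    # Select only the requested columns, in the requested order.
    for row in df[levels].itertuples(index=False, name=None):
        update(row, tree)
    return tree

df = pd.DataFrame({
    "region": ['A', 'A', 'B'],
    "sub_region": ['1', '2', '3'],
    "state": ['a', 'b', 'c'],
    "city": ['x', 'y', 'z'],  # hypothetical fourth level
})

two_levels = df_to_tree(df, ["region", "sub_region"])
four_levels = df_to_tree(df, ["region", "sub_region", "state", "city"])
print(two_levels)
print(four_levels)
```

The recursion depth is simply the number of columns passed in, so the same function covers two levels or four.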

2 Comments

bless you! worked like a charm. thankfully I don't need to parse this on the fly, so vectorization is not a priority at all.
@Zach Glad to hear it! If that's everything you were looking for, I'd appreciate it if you would "accept" the answer by clicking the check mark underneath the vote arrows on my answer.
