How can you turn a DataFrame that has a MultiIndex into a nice nested dictionary?

Here's what I've tried so far. It's close, but the keys are tuples, and I'd like to break those out into nested dictionary keys.

What I've Tried:

import numpy as np
import pandas as pd

that = {'Food':['Apple','Apple','Apple','Apple','Banana','Banana','Orange','Orange'],
    'Color':['Red','Green','Yellow','Red','Red','Green','Green','Yellow'],
    'Type':['100','4','7','101','100','100','4','7'],
    'time':[np.linspace(0,10,2) for i in range(8)]}

nn = pd.DataFrame(that)
nn = nn.set_index(['Food','Color','Type'])
vv = {}
for idx in nn.index:
   vv[idx] = nn.loc[idx]

vv
Out[1]: 
{('Apple', 'Red', '100'): time    [0.0, 10.0]
 Name: (Apple, Red, 100), dtype: object,
 ('Apple', 'Green', '4'): time    [0.0, 10.0]
 Name: (Apple, Green, 4), dtype: object,
 ('Apple', 'Yellow', '7'): time    [0.0, 10.0]
 Name: (Apple, Yellow, 7), dtype: object,
 ('Apple', 'Red', '101'): time    [0.0, 10.0]
 Name: (Apple, Red, 101), dtype: object,
 ('Banana', 'Red', '100'): time    [0.0, 10.0]
 Name: (Banana, Red, 100), dtype: object,
 ('Banana', 'Green', '100'): time    [0.0, 10.0]
 Name: (Banana, Green, 100), dtype: object,
 ('Orange', 'Green', '4'): time    [0.0, 10.0]
 Name: (Orange, Green, 4), dtype: object,
 ('Orange', 'Yellow', '7'): time    [0.0, 10.0]
 Name: (Orange, Yellow, 7), dtype: object}

What I want the output to look like:

vv = {'Apple':{'Red':{'100':[0,10],'101':[0,10]},
               'Green':{'4':[0,10]},
               'Yellow':{'7':[0,10]}},
      'Banana':{'Red':{'100':[0,10]},
                'Green':{'100':[0,10]}},
      'Orange':{'Green':{'4':[0,10]},
                'Yellow':{'7':[0,10]}}}

Edit: Changed the range back to 8... was a typo, and changed number of points in linspace to just be 2 points for simplicity to reflect the example.

Edit 2: I'm looking for a general way to do this. In particular, a colleague of mine has written a treeView model in PyQt that accepts a nested dictionary for the tree. I just want to be able to quickly transform the DataFrames I have created into the format it needs.

For those curious about how to do this in general, here is a small function I wrote. It fits what I need better.

that = {'Food':['Apple','Apple','Apple','Apple','Banana','Banana','Orange','Orange'],
        'Color':['Red','Green','Yellow','Red','Red','Green','Green','Yellow'],
        'Type':['100','4','7','101','100','100','4','7'],
        'time':[np.linspace(0,10,2) for i in range(8)]}

x = pd.DataFrame(that)

def NestedDict_fromDF(iDF,keyorder,values):
    # Accept a single column name or a list of names for both arguments
    if not isinstance(keyorder,list):
        keyorder = [keyorder]
    if not isinstance(values,list):
        values = [values]
    # Keep only names that are columns of iDF, without mutating the caller's lists
    keyorder = [k for k in keyorder if k in iDF]
    values = [v for v in values if v in iDF]
    rdict = {}
    if keyorder:
        ndf = iDF.set_index(keyorder)
        def makeDict(basedict,group):
            for k,g in group:
                basedict[k] = {}
                try:
                    # Recurse one index level deeper
                    makeDict(basedict[k], g.droplevel(0).groupby(level=0))
                except ValueError:
                    # droplevel raises ValueError on the last remaining level: this is a leaf
                    if values:
                        basedict[k] = g[values].reset_index(drop=True)
                    else:
                        basedict[k] = []
            return basedict
        rdict = makeDict({}, ndf.groupby(level=0))
    return rdict

yy = NestedDict_fromDF(x,['Food','Color','Type','Integer'],['time'])

Result, with each leaf DataFrame abbreviated as DataFrame:


{'Apple': {'Green': {'4':DataFrame},
           'Red': {'100':DataFrame,
                   '101':DataFrame},
           'Yellow': {'7':DataFrame}},
 'Banana': {'Green': {'100':DataFrame},
            'Red': {'100':DataFrame}},
 'Orange': {'Green': {'4':DataFrame},
            'Yellow': {'7':DataFrame}}}
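
The same nesting can also be built without groupby, by walking the index tuples directly. The following is only a minimal sketch: the helper name nested_dict_from_index is hypothetical (it is not part of pandas), it assumes each combination of key values occurs once (a repeated combination overwrites the earlier leaf), and it stores plain lists at the leaves instead of DataFrames.

```python
import numpy as np
import pandas as pd


def nested_dict_from_index(df, keys, value):
    """Nest df[value] under one dict level per column in keys.

    Hypothetical helper, not part of pandas. Assumes each combination
    of key values occurs once; a repeated combination overwrites the
    earlier leaf.
    """
    out = {}
    for idx, val in df.set_index(keys)[value].items():
        # With a single key column, idx is a scalar rather than a tuple.
        path = idx if isinstance(idx, tuple) else (idx,)
        node = out
        for key in path[:-1]:
            # Descend, creating intermediate dicts as needed.
            node = node.setdefault(key, {})
        # Convert NumPy arrays to plain lists so the result is pure Python.
        node[path[-1]] = val.tolist() if isinstance(val, np.ndarray) else val
    return out


that = {'Food': ['Apple', 'Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Orange', 'Orange'],
        'Color': ['Red', 'Green', 'Yellow', 'Red', 'Red', 'Green', 'Green', 'Yellow'],
        'Type': ['100', '4', '7', '101', '100', '100', '4', '7'],
        'time': [np.linspace(0, 10, 2) for i in range(8)]}
x = pd.DataFrame(that)

result = nested_dict_from_index(x, ['Food', 'Color', 'Type'], 'time')
```

Because it never groups, this version makes a single pass over the rows; the trade-off is that it cannot collect several rows under one leaf the way the groupby version does.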
  • The DataFrame nn is not reproducible; it throws "ValueError: arrays must all be same length". Commented Feb 8, 2020 at 7:22
  • Change range(2) to range(8); for the purposes of the example, replacing linspace with [0.0, 0.10] would be better. Commented Feb 8, 2020 at 7:23
  • Corrected. Sorry, that was a typo. Commented Feb 8, 2020 at 7:45

1 Answer

It grew too complex, too fast:

from pprint import pprint
import numpy as np
import pandas as pd

that = {'Food':['Apple','Apple','Apple','Apple','Banana','Banana','Orange','Orange'],
    'Color':['Red','Green','Yellow','Red','Red','Green','Green','Yellow'],
    'Type':['100','4','7','101','100','100','4','7'],
    'time':[np.linspace(0,10,2) for i in range(8)]}

nn = pd.DataFrame(that)
df = nn.groupby(['Food', 'Color', 'Type']).agg(list)
d = {}
new_df = df.groupby(level=[0,1]).apply(lambda df:df.xs(df.name).to_dict()).to_dict() #[1]
for (food, color), v in new_df.items():
    # Create the food-level dict on first sight, then fill in this color
    d.setdefault(food, {})[color] = {Type: time[0].tolist() for Type, time in v['time'].items()}
pprint(d)

Output:

{'Apple': {'Green': {'4': [0.0, 10.0]},
           'Red': {'100': [0.0, 10.0], '101': [0.0, 10.0]},
           'Yellow': {'7': [0.0, 10.0]}},
 'Banana': {'Green': {'100': [0.0, 10.0]}, 'Red': {'100': [0.0, 10.0]}},
 'Orange': {'Green': {'4': [0.0, 10.0]}, 'Yellow': {'7': [0.0, 10.0]}}}

[1] taken from: DataFrame with MultiIndex to dict
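
The key step in the line marked [1] is xs, which selects the rows under the given outer keys and drops those index levels. A small sketch on made-up data (the column n and its values are illustrative only, not from the question) shows what one such cross-section looks like:

```python
import pandas as pd

# Illustrative data only; the 'n' column is made up for this sketch.
df = pd.DataFrame({'Food': ['Apple', 'Apple', 'Apple', 'Banana'],
                   'Color': ['Red', 'Green', 'Red', 'Red'],
                   'Type': ['100', '4', '101', '100'],
                   'n': [1, 2, 3, 4]})
# Sort the index so partial-key lookups do not hit an unsorted MultiIndex.
df = df.set_index(['Food', 'Color', 'Type']).sort_index()

# Cross-section for ('Apple', 'Red'): the two outer levels are dropped,
# leaving a frame indexed by Type only.
sub = df.xs(('Apple', 'Red'))
inner = sub['n'].to_dict()
```

Inside groupby(level=[0,1]).apply, df.name is exactly such a (Food, Color) tuple, so each group is reduced to a dictionary keyed by the remaining Type level.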

Huh! Finally got it:

that = {'Food':['Apple','Apple','Apple','Apple','Banana','Banana','Orange','Orange'],
    'Color':['Red','Green','Yellow','Red','Red','Green','Green','Yellow'],
    'Type':['100','4','7','101','100','100','4','7'],
    'time':[np.linspace(0,10,2) for i in range(8)]}

nn = pd.DataFrame(that)
nn = nn.set_index(['Food','Color','Type'])
group = nn.groupby(level=0)
d = {k: g.droplevel(0).groupby(level=0)
       .apply(lambda df:df.xs(df.name)['time']
       .apply(lambda x:x.tolist()).to_dict())
       .to_dict() for k,g in group}
pprint(d)

{'Apple': {'Green': {'4': [0.0, 10.0]},
           'Red': {'100': [0.0, 10.0], '101': [0.0, 10.0]},
           'Yellow': {'7': [0.0, 10.0]}},
 'Banana': {'Green': {'100': [0.0, 10.0]}, 'Red': {'100': [0.0, 10.0]}},
 'Orange': {'Green': {'4': [0.0, 10.0]}, 'Yellow': {'7': [0.0, 10.0]}}}
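
A quick way to sanity-check a nested result like the one above is to flatten it back to tuple keys and compare against the original index. This is only a sketch; the helper flatten is a hypothetical name, and it treats any non-dict value as a leaf.

```python
def flatten(nested, prefix=()):
    """Yield (key_path, leaf) pairs from a nested dict (hypothetical helper)."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Recurse into the next nesting level.
            yield from flatten(value, path)
        else:
            yield path, value


# The nested dictionary printed above.
d = {'Apple': {'Green': {'4': [0.0, 10.0]},
               'Red': {'100': [0.0, 10.0], '101': [0.0, 10.0]},
               'Yellow': {'7': [0.0, 10.0]}},
     'Banana': {'Green': {'100': [0.0, 10.0]}, 'Red': {'100': [0.0, 10.0]}},
     'Orange': {'Green': {'4': [0.0, 10.0]}, 'Yellow': {'7': [0.0, 10.0]}}}

flat = dict(flatten(d))
```

With the DataFrame from the answer, set(flat) can then be compared with set(nn.index) to confirm that no row was lost in the nesting.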

3 Comments

I have seen that solution in your [1]. I like your answer, but in actuality I'm looking for a general way to do this. This example was just a boiled down version of what I need.
@Carl I think I got it, please see if it works for you.
@Syandip Dutta I made a function that does this in general; see the last edit to the question, in case you're curious.
