How can I convert a Pandas DataFrame to a three-level nested dictionary using column names?

The key columns are not the first three columns. I want to group by column artist, then by column album, and the lookup needs to be case-insensitive, preferably without using defaultdict.

This is a minimal reproducible example:

from collections import defaultdict
from itertools import product
from pandas import DataFrame
tree = defaultdict(lambda: defaultdict(dict))
columns = {'a': str(), 'b': str(), 'c': str(), 'd': int(), 'e': int(), 'f': int()}
df = DataFrame(columns, index=[])
for i, j, k in product('abcd', repeat=3):
    tree[i][j][k] = list(map('abcd'.index, (i, j, k)))
    df.loc[len(df)] = [i, j, k, *list(map('abcd'.index, (i, j, k)))]

How can I get a nested dictionary similar to tree from df?

I am really sorry that I can't provide any actual examples, because they wouldn't be minimal.

I tried to use .groupby(), but I have only ever seen it used with a single column, and I don't know what to do with the pandas.core.groupby.generic.DataFrameGroupBy object it returns. I only started using it today.


Currently I can do this:

tree1 = dict()
for index, row in df.iterrows():
    if not tree1.get(row['a'].lower()):
        tree1[row['a'].lower()] = dict()
    if not tree1[row['a'].lower()].get(row['b'].lower()):
        tree1[row['a'].lower()][row['b'].lower()] = dict()
    tree1[row['a'].lower()][row['b'].lower()][row['c'].lower()] = [row['d'], row['e'], row['f']]

I have actually implemented case-insensitive versions of str and dict, but they are very long, so for brevity I won't use them here.

But according to this answer, https://stackoverflow.com/a/55557758/16383578, this method is bad. What is a better way?

1 Answer

I would probably do it like this:

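cols = ['a', 'b', 'c']
# Casefold the key columns so the keys are case-insensitive (this modifies df in place).
for col in cols:
    df[col] = df[col].str.casefold()
tree = {}
# After set_index + transpose, each column label is an (a, b, c) tuple,
# so to_dict(orient='list') maps that tuple to the row's [d, e, f] values.
rows = df.set_index(cols).T.to_dict(orient='list')
for (a, b, c), values in rows.items():
    tree.setdefault(a, {}).setdefault(b, {})[c] = values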

or

...
for (a, b, c), values in (df.set_index(cols).apply(list, axis=1)
                            .to_dict()).items():
    tree.setdefault(a, {}).setdefault(b, {})[c] = values

The following groupby-based version also produces the same result (provided the casefolding step at the top is included):

def to_dict(df):
    return df.set_index(df.columns[0]).iloc[:, 0].to_dict()

df['values'] = df[['d', 'e', 'f']].apply(list, axis=1)
df = df[['a', 'b', 'c', 'values']]
tree = (df.set_index(['a', 'b'])
          .groupby(['a', 'b']).apply(to_dict)
          .reset_index('b')
          .groupby('a').apply(to_dict)
          .to_dict())

but I think it's a bit too convoluted.
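
If a groupby-based version is still wanted, iterating over the groups directly avoids the two chained apply calls. The following is only a minimal sketch, not part of the approaches above: it rebuilds the question's sample data with column a upper-cased (an assumption made purely to exercise the casefolding) and groups on casefolded copies of the key columns, so df itself is left unmodified.

```python
from itertools import product

import pandas as pd

# Rebuild the question's sample data; column 'a' is upper-cased here
# (an illustrative assumption) so that casefolding has something to do.
rows = [(i.upper(), j, k, *map("abcd".index, (i, j, k)))
        for i, j, k in product("abcd", repeat=3)]
df = pd.DataFrame(rows, columns=list("abcdef"))

# Group on casefolded copies of the first two key columns.
# Passing Series (rather than column names) as keys leaves df untouched.
group_keys = [df[col].str.casefold() for col in ("a", "b")]

tree = {}
for (a, b), group in df.groupby(group_keys):
    # Within each (a, b) group, map the casefolded 'c' value to [d, e, f].
    leaf_keys = group["c"].str.casefold()
    leaf_values = group[["d", "e", "f"]].to_numpy().tolist()
    tree.setdefault(a, {})[b] = dict(zip(leaf_keys, leaf_values))
```

Iterating over a groupby with two keys yields ((a, b), sub-frame) pairs, which is what makes the plain loop possible. If two rows in the same group share a casefolded c value, the later row wins.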

Results:

{'a': {'a': {'a': [0, 0, 0], 'b': [0, 0, 1], 'c': [0, 0, 2], 'd': [0, 0, 3]},
       'b': {'a': [0, 1, 0], 'b': [0, 1, 1], 'c': [0, 1, 2], 'd': [0, 1, 3]},
       'c': {'a': [0, 2, 0], 'b': [0, 2, 1], 'c': [0, 2, 2], 'd': [0, 2, 3]},
       'd': {'a': [0, 3, 0], 'b': [0, 3, 1], 'c': [0, 3, 2], 'd': [0, 3, 3]}},
 'b': {'a': {'a': [1, 0, 0], 'b': [1, 0, 1], 'c': [1, 0, 2], 'd': [1, 0, 3]},
       'b': {'a': [1, 1, 0], 'b': [1, 1, 1], 'c': [1, 1, 2], 'd': [1, 1, 3]},
       'c': {'a': [1, 2, 0], 'b': [1, 2, 1], 'c': [1, 2, 2], 'd': [1, 2, 3]},
       'd': {'a': [1, 3, 0], 'b': [1, 3, 1], 'c': [1, 3, 2], 'd': [1, 3, 3]}},
 'c': {'a': {'a': [2, 0, 0], 'b': [2, 0, 1], 'c': [2, 0, 2], 'd': [2, 0, 3]},
       'b': {'a': [2, 1, 0], 'b': [2, 1, 1], 'c': [2, 1, 2], 'd': [2, 1, 3]},
       'c': {'a': [2, 2, 0], 'b': [2, 2, 1], 'c': [2, 2, 2], 'd': [2, 2, 3]},
       'd': {'a': [2, 3, 0], 'b': [2, 3, 1], 'c': [2, 3, 2], 'd': [2, 3, 3]}},
 'd': {'a': {'a': [3, 0, 0], 'b': [3, 0, 1], 'c': [3, 0, 2], 'd': [3, 0, 3]},
       'b': {'a': [3, 1, 0], 'b': [3, 1, 1], 'c': [3, 1, 2], 'd': [3, 1, 3]},
       'c': {'a': [3, 2, 0], 'b': [3, 2, 1], 'c': [3, 2, 2], 'd': [3, 2, 3]},
       'd': {'a': [3, 3, 0], 'b': [3, 3, 1], 'c': [3, 3, 2], 'd': [3, 3, 3]}}}
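
For reuse, the setdefault idea can be wrapped in a function that takes the key and value column names, which also covers the case where the key columns are not the first three. This is a rough sketch under stated assumptions: nest_by_columns is a hypothetical helper name, and the artist/album/title sample rows are invented for illustration rather than taken from the question.

```python
import pandas as pd


def nest_by_columns(df, key_cols, value_cols):
    """Hypothetical helper: nest each row's values under its casefolded keys."""
    # Casefold copies of the key columns; the input frame is not modified.
    folded = [df[col].str.casefold() for col in key_cols]
    leaves = df[list(value_cols)].to_numpy().tolist()
    tree = {}
    for *parents, last, values in zip(*folded, leaves):
        node = tree
        for key in parents:
            # Walk down the tree, creating intermediate dicts as needed.
            node = node.setdefault(key, {})
        node[last] = values  # a later row with the same key path overwrites
    return tree


# Illustrative rows (not from the original post): "A"/"a" and "x"/"X" should merge.
sample = pd.DataFrame(
    [("A", "x", "One", 1, 2, 3),
     ("a", "X", "two", 4, 5, 6),
     ("B", "y", "one", 7, 8, 9)],
    columns=["artist", "album", "title", "d", "e", "f"],
)
nested = nest_by_columns(sample, ["artist", "album", "title"], ["d", "e", "f"])
```

Because the keys are walked in a loop, the same function works for any nesting depth, not only three levels.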