How to convert pandas dataframe to nested dictionary

Question

I am running Python 3.6 and Pandas 0.19.2 and have a DataFrame which looks as follows:

Name      Chain        Food       Healthy
George    McDonalds    burger     False
George    KFC          chicken    False
John      Wendys       burger     False
John      McDonalds    salad      True

I want to transform this dataframe into a dict which looks as follows:

health_data = {'George': {'McDonalds': {'Food': 'burger', 'Healthy':False},
                          'KFC':       {'Food': 'chicken', 'Healthy':False}},
               'John':   {'Wendys':    {'Food': 'burger', 'Healthy':False},
                          'McDonalds': {'Food': 'salad', 'Healthy': True}}}

My thoughts so far are:

Use df.groupby to group the names column
Use df.to_dict() to transform the dataframe into a dictionary along the lines of: health_data = input_data.set_index('Chain').T.to_dict()

Thoughts? Thanks up front for the help.

jezrael · Accepted Answer · 2022-05-26 16:56:32Z

39

I think you were very close.

Use groupby and to_dict:

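# Parentheses allow the method chain to span several lines.
# The result is stored under a new name so the DataFrame is not overwritten.
health_data = (
    df.groupby('Name')[['Chain', 'Food', 'Healthy']]
      .apply(lambda x: x.set_index('Chain').to_dict(orient='index'))
      .to_dict()
)

print(health_data)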
{'George': {'KFC': {'Healthy': False, 'Food': 'chicken'}, 
           'McDonalds': {'Healthy': False, 'Food': 'burger'}}, 
'John': {'McDonalds': {'Healthy': True, 'Food': 'salad'},
         'Wendys': {'Healthy': False, 'Food': 'burger'}}}
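
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the DataFrame is built by hand from the four rows in the question; the variable name `expected` is only there for the check at the end and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Name':    ['George', 'George', 'John', 'John'],
    'Chain':   ['McDonalds', 'KFC', 'Wendys', 'McDonalds'],
    'Food':    ['burger', 'chicken', 'burger', 'salad'],
    'Healthy': [False, False, False, True],
})

# For each Name group, index the rows by Chain and convert them to
# {chain: {column: value}}; the outer to_dict() maps each name to that dict.
health_data = (
    df.groupby('Name')[['Chain', 'Food', 'Healthy']]
      .apply(lambda x: x.set_index('Chain').to_dict(orient='index'))
      .to_dict()
)

# The structure requested in the question.
expected = {
    'George': {'McDonalds': {'Food': 'burger', 'Healthy': False},
               'KFC':       {'Food': 'chicken', 'Healthy': False}},
    'John':   {'Wendys':    {'Food': 'burger', 'Healthy': False},
               'McDonalds': {'Food': 'salad', 'Healthy': True}},
}
print(health_data == expected)
```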

edited May 26, 2022 at 16:56

user17242583

answered Feb 2, 2017 at 9:45

jezrael


9 Comments

Martin Reindl Over a year ago

Thanks so much! This worked perfectly. One small question: what does the [['Chain','Food','Healthy']] part of the answer do?

jezrael Over a year ago

It filters the columns. If df has no other columns, it can be simplified to df.groupby('Name').apply(lambda x: x.set_index('Chain').to_dict(orient='index')).to_dict()
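
To illustrate what the column selection does, here is a small sketch. The `Price` column and its values are hypothetical (they are not in the question); they exist only to show that a column left out of the selection does not appear in the nested dict.

```python
import pandas as pd

df = pd.DataFrame({
    'Name':    ['George', 'George', 'John', 'John'],
    'Chain':   ['McDonalds', 'KFC', 'Wendys', 'McDonalds'],
    'Food':    ['burger', 'chicken', 'burger', 'salad'],
    'Healthy': [False, False, False, True],
    'Price':   [3.5, 5.0, 4.0, 6.0],  # hypothetical extra column
})

def nest(frame):
    # Index the group's rows by Chain and turn each row into a dict.
    return frame.set_index('Chain').to_dict(orient='index')

# Only the selected columns are passed to nest(), so 'Price' is dropped.
filtered = df.groupby('Name')[['Chain', 'Food', 'Healthy']].apply(nest).to_dict()

# Adding 'Price' to the selection keeps it in the innermost dicts.
with_price = (
    df.groupby('Name')[['Chain', 'Food', 'Healthy', 'Price']].apply(nest).to_dict()
)

print(filtered['George']['KFC'])
print(with_price['George']['KFC'])
```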

Umar.H Over a year ago

I was trying to do this for so long, didn't think to put the .to_dict inside the lambda, thanks as always Jozi :)

E. Zeytinci Over a year ago

What if I want to set multiple index in apply? Do you have any idea?
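
One possible reading of this question is passing a list of columns to set_index inside the lambda. As a sketch under that assumption: the inner dict is then keyed by tuples, one per (Chain, Food) pair, rather than gaining another nesting level. The name `by_pair` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name':    ['George', 'George', 'John', 'John'],
    'Chain':   ['McDonalds', 'KFC', 'Wendys', 'McDonalds'],
    'Food':    ['burger', 'chicken', 'burger', 'salad'],
    'Healthy': [False, False, False, True],
})

# With a two-column index, to_dict(orient='index') uses (Chain, Food)
# tuples as keys; only the remaining column ends up in the values.
by_pair = (
    df.groupby('Name')[['Chain', 'Food', 'Healthy']]
      .apply(lambda x: x.set_index(['Chain', 'Food']).to_dict(orient='index'))
      .to_dict()
)

print(by_pair['George'])
```

If a further nesting level is wanted instead of tuple keys, a loop-based approach such as the defaultdict answers in this thread is probably easier to adapt.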

Mike Lee Over a year ago

@jezrael How would you adjust this if there was another row that had "George, McDonalds, chicken, False"? In other words, there would be duplicate index values because "McDonalds" occurs twice for "George", so it would throw a ValueError, since the index must be unique for orient='index'.


tommy.carstensen · Accepted Answer · 2020-08-10 02:16:38Z

13

Solution using dictionary comprehension and groupby:

{n: grp.loc[n].to_dict('index')
 for n, grp in df.set_index(['Name', 'Chain']).groupby(level='Name')}

{'George': {'KFC': {'Food': 'chicken', 'Healthy': False},
  'McDonalds': {'Food': 'burger', 'Healthy': False}},
 'John': {'McDonalds': {'Food': 'salad', 'Healthy': True},
  'Wendys': {'Food': 'burger', 'Healthy': False}}}
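
The role of grp.loc[n] in the comprehension may not be obvious, so here is the same idea written as an explicit loop. This is only an unrolled sketch of the comprehension above; the names `indexed`, `inner`, and `result` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name':    ['George', 'George', 'John', 'John'],
    'Chain':   ['McDonalds', 'KFC', 'Wendys', 'McDonalds'],
    'Food':    ['burger', 'chicken', 'burger', 'salad'],
    'Healthy': [False, False, False, True],
})

# Move Name and Chain into a two-level index.
indexed = df.set_index(['Name', 'Chain'])

result = {}
for name, grp in indexed.groupby(level='Name'):
    # grp still carries the full (Name, Chain) index. Selecting on the
    # first level with .loc[name] drops it, leaving Chain as the index.
    inner = grp.loc[name]
    # With Chain as the index, orient='index' gives {chain: {column: value}}.
    result[name] = inner.to_dict('index')

print(result)
```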

Solution using defaultdict:

from collections import defaultdict

d = defaultdict(dict)

for i, row in df.iterrows():
    d[row.Name][row.Chain] = row.drop(['Name', 'Chain']).to_dict()

dict(d)

{'George': {'KFC': {'Food': 'chicken', 'Healthy': False},
  'McDonalds': {'Food': 'burger', 'Healthy': False}},
 'John': {'McDonalds': {'Food': 'salad', 'Healthy': True},
  'Wendys': {'Food': 'burger', 'Healthy': False}}}

edited Aug 10, 2020 at 2:16

tommy.carstensen


answered Feb 2, 2017 at 9:40

piRSquared


1 Comment

Jon Over a year ago

Love the use of iterrows and defaultdict, though it is a little slower than the groupby approach. It would also let you chain multiple loops together. Another way would be to use a MultiIndex (but that is not appropriate for this example).
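
On the speed point, a variant worth trying is itertuples, which the pandas documentation describes as generally faster than iterrows. The following is a sketch and not part of the original answer; it assumes the column names are valid Python identifiers so they can be read as attributes, and it has not been timed here.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'Name':    ['George', 'George', 'John', 'John'],
    'Chain':   ['McDonalds', 'KFC', 'Wendys', 'McDonalds'],
    'Food':    ['burger', 'chicken', 'burger', 'salad'],
    'Healthy': [False, False, False, True],
})

d = defaultdict(dict)

# itertuples yields one namedtuple per row; index=False omits the row label.
for row in df.itertuples(index=False):
    d[row.Name][row.Chain] = {'Food': row.Food, 'Healthy': row.Healthy}

result = dict(d)
print(result)
```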

Weston A. Greene · Accepted Answer · 2024-09-12 17:56:38Z

jezrael's answer was close to my need, but didn't accommodate non-unique combinations of columns 'Chain', 'Food', and 'Healthy'. So thanks to piRSquared's answer, I was able to create the following solution (which answers @MikeLee's question from the comments of jezrael's answer):

df = pd.DataFrame({
    'Name':     ['George', 'George', 'John', 'John', 'John'],
    'Chain':    ['McDonalds', 'KFC', 'Wendys', 'McDonalds', 'McDonalds'],
    'Food':     ['burger', 'chicken', 'burger', 'salad', 'Frenchies'],
    'Healthy':  [False, False, False, True, False],
})

First, jezrael's answer for comparison on a subset of the input without duplicates (notice that I keep only the first three rows, which drops both of John's McDonalds rows):

df.iloc[0:3].groupby('Name')[['Chain','Food','Healthy']].apply(lambda x: x.set_index('Chain').to_dict(orient='index')).to_dict()

{'George': {'McDonalds': {'Food': 'burger', 'Healthy': False},
  'KFC': {'Food': 'chicken', 'Healthy': False}},
 'John': {'Wendys': {'Food': 'burger', 'Healthy': False}}}

Now, the problem if we add the additional row for McDonalds:

df.groupby('Name')[['Chain','Food','Healthy']].apply(lambda x: x.set_index('Chain').to_dict(orient='index')).to_dict()

ValueError: DataFrame index must be unique for orient='index'.
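
Before converting, it can help to check which rows would trigger this error. A minimal sketch, assuming that "duplicate" means a repeated (Name, Chain) pair; the variable name `dupes` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name':     ['George', 'George', 'John', 'John', 'John'],
    'Chain':    ['McDonalds', 'KFC', 'Wendys', 'McDonalds', 'McDonalds'],
    'Food':     ['burger', 'chicken', 'burger', 'salad', 'Frenchies'],
    'Healthy':  [False, False, False, True, False],
})

# keep=False marks every row of a repeated (Name, Chain) pair,
# not only the second and later occurrences.
dupes = df[df.duplicated(subset=['Name', 'Chain'], keep=False)]

print(dupes)
```

If `dupes` is empty, the groupby/to_dict(orient='index') approach should not hit the uniqueness error.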

And finally my answer inspired by piRSquared:

from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

d = recursive_defaultdict()

for _, row in df.iterrows():
    d[row['Name']][row['Chain']][row['Food']] = row.drop(['Name', 'Chain', 'Food']).to_dict()

dict(d)  # which may be undesirable if you wanted both "Food" _and_ "Healthy" as keys under the "Chain" key

{'George': defaultdict(<function __main__.recursive_defaultdict()>,
             {'McDonalds': defaultdict(<function __main__.recursive_defaultdict()>,
                          {'burger': {'Healthy': False}}),
              'KFC': defaultdict(<function __main__.recursive_defaultdict()>,
                          {'chicken': {'Healthy': False}})}),
 'John': defaultdict(<function __main__.recursive_defaultdict()>,
             {'Wendys': defaultdict(<function __main__.recursive_defaultdict()>,
                          {'burger': {'Healthy': False}}),
              'McDonalds': defaultdict(<function __main__.recursive_defaultdict()>,
                          {'salad': {'Healthy': True},
                           'Frenchies': {'Healthy': False}})})}
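
Two follow-ups on the output above, as a hedged sketch. First, dict(d) only converts the outermost level, so the inner values are still defaultdicts; the hypothetical helper `to_plain_dict` below converts every level. Second, if both "Food" and "Healthy" should stay as keys under each chain, one option is to store a list of records per chain. Neither part comes from the original answer, and the names `plain` and `records` are made up here.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'Name':     ['George', 'George', 'John', 'John', 'John'],
    'Chain':    ['McDonalds', 'KFC', 'Wendys', 'McDonalds', 'McDonalds'],
    'Food':     ['burger', 'chicken', 'burger', 'salad', 'Frenchies'],
    'Healthy':  [False, False, False, True, False],
})

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

def to_plain_dict(obj):
    """Recursively convert nested defaultdicts (or dicts) to plain dicts."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj

# Same loop as in the answer: Name -> Chain -> Food -> remaining columns.
d = recursive_defaultdict()
for _, row in df.iterrows():
    d[row['Name']][row['Chain']][row['Food']] = (
        row.drop(['Name', 'Chain', 'Food']).to_dict()
    )

plain = to_plain_dict(d)

# Variant: keep Food and Healthy together, one record per row, so a
# repeated (Name, Chain) pair yields a list with several entries.
records = {
    name: {
        chain: sub[['Food', 'Healthy']].to_dict(orient='records')
        for chain, sub in grp.groupby('Chain')
    }
    for name, grp in df.groupby('Name')
}

print(plain['John']['McDonalds'])
print(records['John']['McDonalds'])
```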
