I am running Python 3.6 and Pandas 0.19.2 and have a DataFrame which looks as follows:

Name      Chain        Food       Healthy  

George    McDonalds    burger     False
George    KFC          chicken    False
John      Wendys       burger     False
John      McDonalds    salad      True

I want to transform this dataframe into a dict which looks as follows:

health_data = {'George': {'McDonalds': {'Food': 'burger', 'Healthy':False},
                          'KFC':       {'Food': 'chicken', 'Healthy':False}},
               'John':   {'Wendys':    {'Food': 'burger', 'Healthy':False},
                          'McDonalds': {'Food': 'salad', 'Healthy': True}}}

My thoughts so far are:

  1. Use df.groupby to group the names column
  2. Use df.to_dict() to transform the dataframe into a dictionary along the lines of: health_data = input_data.set_index('Chain').T.to_dict()

Thoughts? Thanks up front for the help.

3 Answers

I think you were very close.

Use groupby and to_dict:

# Parentheses are required so the chained calls can span several lines.
health_data = (
    df.groupby('Name')[['Chain', 'Food', 'Healthy']]
      .apply(lambda x: x.set_index('Chain').to_dict(orient='index'))
      .to_dict()
)

print(health_data)
{'George': {'KFC': {'Healthy': False, 'Food': 'chicken'}, 
           'McDonalds': {'Healthy': False, 'Food': 'burger'}}, 
'John': {'McDonalds': {'Healthy': True, 'Food': 'salad'},
         'Wendys': {'Healthy': False, 'Food': 'burger'}}}

9 Comments

Thanks so much! This worked perfectly. One small question: what does the [['Chain','Food','Healthy']] part of the answer do?
It filters the columns. If df has no other columns, the expression can be simplified to df.groupby('Name').apply(lambda x: x.set_index('Chain').to_dict(orient='index')).to_dict()
I was trying to do this for so long, didn't think to put the .to_dict inside the lambda, thanks as always Jozi :)
What if I want to set multiple index columns inside apply? Do you have any idea?
@jezrael How would you adjust this if there was another row that had "George, McDonalds, chicken, False"? In other words, there would be duplicate index values because "McDonalds" occurs twice for "George", so it would throw a ValueError, since the index must be unique for orient='index'.
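One possible way to handle the duplicate case raised in the last comment is to store a list of records under each chain instead of a single dict. This is only a sketch, not part of the original answer: the sample rows and the list-of-records layout are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample: "McDonalds" occurs twice for "George".
df = pd.DataFrame({
    'Name':    ['George', 'George', 'George'],
    'Chain':   ['McDonalds', 'McDonalds', 'KFC'],
    'Food':    ['burger', 'chicken', 'chicken'],
    'Healthy': [False, False, False],
})

# Group on both keys so every row of a repeated (Name, Chain) pair
# lands in the same list rather than colliding on one index label.
health_data = {}
for (name, chain), grp in df.groupby(['Name', 'Chain']):
    records = grp[['Food', 'Healthy']].to_dict(orient='records')
    health_data.setdefault(name, {})[chain] = records

print(health_data)
```

The trade-off is that every chain now maps to a list, even when it has only one row, so downstream code has to index into that list.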

Solution using dictionary comprehension and groupby:

{n: grp.loc[n].to_dict('index')
 for n, grp in df.set_index(['Name', 'Chain']).groupby(level='Name')}

{'George': {'KFC': {'Food': 'chicken', 'Healthy': False},
  'McDonalds': {'Food': 'burger', 'Healthy': False}},
 'John': {'McDonalds': {'Food': 'salad', 'Healthy': True},
  'Wendys': {'Food': 'burger', 'Healthy': False}}}

Solution using defaultdict:

from collections import defaultdict

d = defaultdict(dict)

for i, row in df.iterrows():
    d[row.Name][row.Chain] = row.drop(['Name', 'Chain']).to_dict()

dict(d)

{'George': {'KFC': {'Food': 'chicken', 'Healthy': False},
  'McDonalds': {'Food': 'burger', 'Healthy': False}},
 'John': {'McDonalds': {'Food': 'salad', 'Healthy': True},
  'Wendys': {'Food': 'burger', 'Healthy': False}}}

1 Comment

Love the use of iterrows and defaultdict, though it is a little slower than the groupby approach. It would also let you chain multiple loops together. Another way would be to use a multi-index (but that is not appropriate for this example).
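For reference, the multi-index idea mentioned in this comment could look roughly like the sketch below. It assumes every (Name, Chain) pair is unique, as in the question's data; the variable names flat and nested are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name':    ['George', 'George', 'John', 'John'],
    'Chain':   ['McDonalds', 'KFC', 'Wendys', 'McDonalds'],
    'Food':    ['burger', 'chicken', 'burger', 'salad'],
    'Healthy': [False, False, False, True],
})

# With a two-level index, orient='index' produces tuple keys
# such as ('George', 'McDonalds').
flat = df.set_index(['Name', 'Chain']).to_dict(orient='index')

# Unpack each tuple key into two levels of nesting.
nested = {}
for (name, chain), values in flat.items():
    nested.setdefault(name, {})[chain] = values

print(nested)
```

As with the groupby answer, to_dict(orient='index') raises a ValueError if the same (Name, Chain) pair appears more than once.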
jezrael's answer was close to my need, but didn't accommodate non-unique combinations of columns 'Chain', 'Food', and 'Healthy'. So thanks to piRSquared's answer, I was able to create the following solution (which answers @MikeLee's question from the comments of jezrael's answer):

import pandas as pd

df = pd.DataFrame({
    'Name':     ['George', 'George', 'John', 'John', 'John'],
    'Chain':    ['McDonalds', 'KFC', 'Wendys', 'McDonalds', 'McDonalds'],
    'Food':     ['burger', 'chicken', 'burger', 'salad', 'Frenchies'],
    'Healthy':  [False, False, False, True, False],
})

First, jezrael's answer for comparison, applied to a subset of the input (notice that I keep only the first three rows, which contain no repeated Name/Chain pair):

df.iloc[0:3].groupby('Name')[['Chain','Food','Healthy']].apply(lambda x: x.set_index('Chain').to_dict(orient='index')).to_dict()
{'George': {'McDonalds': {'Food': 'burger', 'Healthy': False},
  'KFC': {'Food': 'chicken', 'Healthy': False}},
 'John': {'Wendys': {'Food': 'burger', 'Healthy': False}}}

Now the problem, which appears once the full DataFrame is used, including the second McDonalds row for John:

df.groupby('Name')[['Chain','Food','Healthy']].apply(lambda x: x.set_index('Chain').to_dict(orient='index')).to_dict()
ValueError: DataFrame index must be unique for orient='index'.

And finally my answer inspired by piRSquared:

from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

d = recursive_defaultdict()

for _, row in df.iterrows():
    d[row['Name']][row['Chain']][row['Food']] = row.drop(['Name', 'Chain', 'Food']).to_dict()

dict(d)  # which may be undesirable if you wanted both "Food" _and_ "Healthy" as keys under the "Chain" key
{'George': defaultdict(<function __main__.recursive_defaultdict()>,
             {'McDonalds': defaultdict(<function __main__.recursive_defaultdict()>,
                          {'burger': {'Healthy': False}}),
              'KFC': defaultdict(<function __main__.recursive_defaultdict()>,
                          {'chicken': {'Healthy': False}})}),
 'John': defaultdict(<function __main__.recursive_defaultdict()>,
             {'Wendys': defaultdict(<function __main__.recursive_defaultdict()>,
                          {'burger': {'Healthy': False}}),
              'McDonalds': defaultdict(<function __main__.recursive_defaultdict()>,
                          {'salad': {'Healthy': True},
                           'Frenchies': {'Healthy': False}})})}
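Note that dict(d) converts only the outermost level, which is why the inner values above still print as defaultdict objects. If plain dicts are wanted at every level, a small recursive helper can do the conversion. The helper name to_plain_dict is hypothetical, and the sample entries below reuse two rows from the output above; treat this as a sketch rather than part of the original answer.

```python
from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

def to_plain_dict(obj):
    """Recursively rebuild dicts (including defaultdicts) as plain dicts."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj

d = recursive_defaultdict()
d['John']['McDonalds']['salad'] = {'Healthy': True}
d['John']['McDonalds']['Frenchies'] = {'Healthy': False}

plain = to_plain_dict(d)
print(plain)
```

A side benefit is that looking up a missing key on the converted result raises a KeyError instead of silently creating an empty entry.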
