Getting pandas dataframe from list of nested dictionaries

Question

I am new to Python so this may be pretty straightforward, but I have not been able to find a good answer for my problem after looking for a while. I am trying to create a Pandas dataframe from a list of dictionaries.

My list of nested dictionaries is the following:

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'}, 
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'}, 
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}}, 
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'}, 
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'}, 
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

So each dictionary in the list has 3 keys (0, 1 and 2), and each key maps to an inner dictionary with 4 values (a, b, c and d).

Putting these into a dataframe using data = pd.DataFrame(my_list) returns something unusable, as each cell has information on a, b, c and d in it.

I want to end up with a dataframe that looks like this:

 name| a  | b  | c | d 
0    | 23 | 15 | 5 | -1 
1    | 5  | 6  | 7 |  9 
2    | 9  | 15 | 5 |  7 
0    | 5  |249 | 92| -4 
1    |51  | 5  | 34|  1 
2    | 3  | 8  | 3 | 11

Is this possible?

DYZ · Accepted Answer · 2017-01-30 23:02:11Z

10

Easy:

pd.concat([pd.DataFrame(l) for l in my_list],axis=1).T
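
For readers who want to run the one-liner step by step, here is a minimal self-contained sketch using the data from the question. The intermediate names frames and wide are only illustrative. Each pd.DataFrame(d) turns the outer keys (0, 1, 2) into columns and the inner keys (a, b, c, d) into the index; concatenating along axis=1 puts the frames side by side, and .T flips the result so that a, b, c, d become the columns.

```python
import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# One frame per outer dictionary: columns 0, 1, 2 and index a, b, c, d.
frames = [pd.DataFrame(d) for d in my_list]

# Place the frames side by side (columns become 0, 1, 2, 0, 1, 2).
wide = pd.concat(frames, axis=1)

# Transpose so each (dictionary, key) pair becomes one row.
df = wide.T
print(df)
```

The values stay as strings, exactly as they appear in my_list.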

Igor Raush · Accepted Answer · 2017-01-31 00:41:20Z

4

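Another solution (written for Python 2 and pandas older than 1.0: dict.iteritems() exists only in Python 2, and DataFrame.from_items was removed in pandas 1.0):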

from itertools import chain
pd.DataFrame.from_items(list(chain.from_iterable(d.iteritems() for d in my_list))).T

In my experiments, this is faster than using pd.concat (especially when the number of "sub-dataframes" is large) at the cost of being more verbose.
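
On Python 3 with a current pandas, the same idea (chain all the (key, inner dict) pairs together and build a single DataFrame, rather than one DataFrame per chunk) might be sketched as follows. This is an adaptation, not the code from the answer: it swaps from_items for the plain DataFrame constructor, and the names keys and rows are illustrative.

```python
from itertools import chain

import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# Flatten every dictionary into one stream of (outer key, inner dict) pairs,
# then split the pairs into the row labels and the row contents.
keys, rows = zip(*chain.from_iterable(d.items() for d in my_list))

# A list of dicts becomes one row per dict; the outer keys label the rows.
df = pd.DataFrame(list(rows), index=list(keys))
print(df)
```

Only one DataFrame is constructed here, so the per-chunk construction and the index alignment done by pd.concat are avoided.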

edited Jan 31, 2017 at 0:41

answered Jan 30, 2017 at 23:23

5 Comments

aliki43 Over a year ago

Thank you so much! When I try this code, I get the error: NameError: name 'chain' is not defined. Would you know why? Otherwise, I think I understand the intuition of this code.

Igor Raush Over a year ago

Sorry, forgot to specify the import. I was using itertools.chain, part of the standard library. Please see edit.

aliki43 Over a year ago

Thank you! I'll try both pd.concat and this one, since I do have a lot of data to work with.

aliki43 Over a year ago

Just a quick update: I didn't see any huge difference in terms of time for processing between pd.concat and this method, maybe because my dataset was not that massive (20,000 observations in all). Thank you again!

Igor Raush Over a year ago

I would guess that the number of observations doesn't matter as much as the number of "chunks." There is a fairly big overhead in creating a DataFrame from each chunk and then going through the cumbersome index alignment with pd.concat, but it doesn't matter as much if you only have a few chunks. Anyway, glad you solved your problem.

juanpa.arrivillaga · Accepted Answer · 2021-12-19 20:15:03Z

You can munge the list of dictionaries to be acceptable to a DataFrame constructor:

In [4]: pd.DataFrame.from_records([{'name': k, **v} for d in my_list for k,v in d.items()])
Out[4]:
    a    b   c   d  name
0  23   15   5  -1     0
1   5    6   7   9     1
2   9   15   5   7     2
3   5  249  92  -4     0
4  51    5  34   1     1
5   3    8   3  11     2

In [5]: df = pd.DataFrame.from_records([{'name': k, **v} for d in my_list for k,v in d.items()])

In [6]: df.set_index('name',inplace=True)

In [7]: df
Out[7]:
       a    b   c   d
name
0     23   15   5  -1
1      5    6   7   9
2      9   15   5   7
0      5  249  92  -4
1     51    5  34   1
2      3    8   3  11

This requires Python 3.5 or later for {'name': 'something', **rest} to work. It is merely shorthand for the following:

In [13]: reshaped = []
    ...: for d in my_list:
    ...:     for k, v in d.items():
    ...:         new = {'name': k}
    ...:         new.update(v)
    ...:         reshaped.append(new)
    ...:

In [14]: reshaped
Out[14]:
[{'a': '23', 'b': '15', 'c': '5', 'd': '-1', 'name': 0},
 {'a': '5', 'b': '6', 'c': '7', 'd': '9', 'name': 1},
 {'a': '9', 'b': '15', 'c': '5', 'd': '7', 'name': 2},
 {'a': '5', 'b': '249', 'c': '92', 'd': '-4', 'name': 0},
 {'a': '51', 'b': '5', 'c': '34', 'd': '1', 'name': 1},
 {'a': '3', 'b': '8', 'c': '3', 'd': '11', 'name': 2}]

Minh Quan Đức Lương · Accepted Answer · 2021-12-19 20:27:27Z

0

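from pandas import DataFrame

def flat_dict(data: dict, prefix=''):
    """Flatten nested dicts into one dict with underscore-joined keys."""
    result = dict()

    for key in data:
        # Keys may be ints (0, 1, 2 in my_list), so format them as strings.
        field = f'{prefix}_{key}' if prefix else str(key)

        if isinstance(data[key], dict):
            # Pass the full path down so deeper levels keep their prefix.
            result.update(flat_dict(data[key], field))
        else:
            result[field] = data[key]

    return result

# One flat dict (one row) per top-level dictionary in my_list;
# the columns are named '0_a', '0_b', ..., '2_d'.
refactor_data = [flat_dict(d) for d in my_list]

df = DataFrame(refactor_data)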

edited Dec 19, 2021 at 20:27

user17242583

answered Apr 14, 2021 at 5:31

AJ AJ · Accepted Answer · 2022-12-27 14:33:17Z

0

[pd.DataFrame.from_dict(l, orient='index') for l in my_list]

The documentation says to use orient='index' when the keys of the dictionary should become rows. Note that this expression produces a list of DataFrames, one per dictionary in my_list.
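
To get the single DataFrame the question asks for, the per-dictionary frames can be stacked with pd.concat. A minimal sketch, using the question's data; the name frames and the choice to label the index 'name' are illustrative additions, not part of the answer above.

```python
import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# orient='index' makes the outer keys (0, 1, 2) the row labels.
frames = [pd.DataFrame.from_dict(d, orient='index') for d in my_list]

# Stack the frames vertically; the row labels repeat as 0, 1, 2, 0, 1, 2.
df = pd.concat(frames)
df.index.name = 'name'
print(df)
```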

edited Dec 27, 2022 at 14:33

answered Dec 27, 2022 at 9:16
