6

I am new to Python so this may be pretty straightforward, but I have not been able to find a good answer for my problem after looking for a while. I am trying to create a Pandas dataframe from a list of dictionaries.

My list of nested dictionaries is the following:

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'}, 
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'}, 
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}}, 
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'}, 
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'}, 
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

So each main dictionary has 3 keys (0, 1 and 2), and each of those keys maps to a dictionary with 4 values (a, b, c and d).

Putting these into a dataframe using data = pd.DataFrame(my_list) returns something unusable, as each cell has information on a, b, c and d in it.
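
To illustrate the problem, here is a minimal sketch using a shortened version of my_list (two outer keys and two inner keys, chosen only to keep the example small). The outer keys become columns, and each cell still holds a whole inner dictionary:

```python
import pandas as pd

# Shortened stand-in for my_list from above.
my_list = [{0: {'a': '23', 'b': '15'}, 1: {'a': '5', 'b': '6'}},
           {0: {'a': '5', 'b': '249'}, 1: {'a': '51', 'b': '5'}}]

data = pd.DataFrame(my_list)

# One row per outer dictionary, one column per outer key.
print(data.shape)

# Each cell is still a whole inner dictionary, not a single value.
print(type(data.loc[0, 0]))
```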

I want to end up with a dataframe that looks like this:

name | a  | b   | c  | d
0    | 23 | 15  | 5  | -1
1    | 5  | 6   | 7  | 9
2    | 9  | 15  | 5  | 7
0    | 5  | 249 | 92 | -4
1    | 51 | 5   | 34 | 1
2    | 3  | 8   | 3  | 11

Is this possible?

5 Answers

10

Easy:

pd.concat([pd.DataFrame(l) for l in my_list],axis=1).T
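
A self-contained sketch of what the one-liner does, using a shortened version of my_list from the question. pd.DataFrame on each dictionary puts the outer keys in the columns, pd.concat with axis=1 places those frames side by side, and .T flips the result so the outer keys become the row labels. The astype(int) step at the end is not part of the one-liner; it is an addition that assumes every value is an integer string.

```python
import pandas as pd

# Shortened stand-in for my_list from the question.
my_list = [{0: {'a': '23', 'b': '15'}, 1: {'a': '5', 'b': '6'}},
           {0: {'a': '5', 'b': '249'}, 1: {'a': '51', 'b': '5'}}]

# Each frame has 'a', 'b' as rows and the outer keys 0, 1 as columns.
frames = [pd.DataFrame(d) for d in my_list]

# Side by side the columns are 0, 1, 0, 1; transposing makes them row labels.
df = pd.concat(frames, axis=1).T

# Assumption: all values are integer strings, so they can be cast in one step.
df = df.astype(int)
print(df)
```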

4

Another solution:

from itertools import chain
pd.DataFrame.from_items(list(chain.from_iterable(d.iteritems() for d in my_list))).T

In my experiments, this is faster than using pd.concat (especially when the number of "sub-dataframes" is large) at the cost of being more verbose.
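
Note that dict.iteritems exists only in Python 2, and DataFrame.from_items was deprecated in pandas 0.23 and removed in pandas 1.0, so the snippet above does not run on current versions. A rough equivalent of the same idea (flatten all the (key, inner dict) pairs first, then build a single frame) might look like the sketch below. The names pairs, labels and rows are illustrative, and the data is a shortened version of my_list.

```python
from itertools import chain

import pandas as pd

# Shortened stand-in for my_list from the question.
my_list = [{0: {'a': '23', 'b': '15'}, 1: {'a': '5', 'b': '6'}},
           {0: {'a': '5', 'b': '249'}, 1: {'a': '51', 'b': '5'}}]

# Flatten into one sequence of (outer key, inner dict) pairs.
pairs = list(chain.from_iterable(d.items() for d in my_list))

labels = [key for key, _ in pairs]
rows = [inner for _, inner in pairs]

# Build the frame once: one row per inner dict, outer keys as the index.
df = pd.DataFrame(rows, index=labels)
print(df)
```

As with the original, only one DataFrame is constructed, so there is no per-chunk frame creation or index alignment.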

5 Comments

Thank you so much! When I try this code, I get the error: NameError: name 'chain' is not defined. Would you know why? Otherwise, I think I understand the intuition of this code.
Sorry, forgot to specify the import. I was using itertools.chain, part of the standard library. Please see edit.
Thank you! I'll try both pd.concat and this one, since I do have a lot of data to work with.
Just a quick update: I didn't see any huge difference in terms of time for processing between pd.concat and this method, maybe because my dataset was not that massive (20,000 observations in all). Thank you again!
I would guess that the number of observations doesn't matter as much as the number of "chunks." There is a fairly big overhead in creating a DataFrame from each chunk and then going through the cumbersome index alignment with pd.concat, but it doesn't matter as much if you only have a few chunks. Anyway, glad you solved your problem.
1

You can munge the list of dictionaries to be acceptable to a DataFrame constructor:

In [4]: pd.DataFrame.from_records([{'name': k, **v} for d in my_list for k,v in d.items()])
Out[4]:
    a    b   c   d  name
0  23   15   5  -1     0
1   5    6   7   9     1
2   9   15   5   7     2
3   5  249  92  -4     0
4  51    5  34   1     1
5   3    8   3  11     2

In [5]: df = pd.DataFrame.from_records([{'name': k, **v} for d in my_list for k,v in d.items()])

In [6]: df.set_index('name',inplace=True)

In [7]: df
Out[7]:
       a    b   c   d
name
0     23   15   5  -1
1      5    6   7   9
2      9   15   5   7
0      5  249  92  -4
1     51    5  34   1
2      3    8   3  11

This requires Python 3.5 or later for the {'name': 'something', **rest} unpacking syntax to work (PEP 448). It is merely shorthand for the following:

In [13]: reshaped = []
    ...: for d in my_list:
    ...:     for k, v in d.items():
    ...:         new = {'name': k}
    ...:         new.update(v)
    ...:         reshaped.append(new)
    ...:

In [14]: reshaped
Out[14]:
[{'a': '23', 'b': '15', 'c': '5', 'd': '-1', 'name': 0},
 {'a': '5', 'b': '6', 'c': '7', 'd': '9', 'name': 1},
 {'a': '9', 'b': '15', 'c': '5', 'd': '7', 'name': 2},
 {'a': '5', 'b': '249', 'c': '92', 'd': '-4', 'name': 0},
 {'a': '51', 'b': '5', 'c': '34', 'd': '1', 'name': 1},
 {'a': '3', 'b': '8', 'c': '3', 'd': '11', 'name': 2}]

0
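from pandas import DataFrame

def flat_dict(data: dict, prefix=''):
    """Flatten nested dicts into one dict whose keys are joined by '_'."""
    result = dict()

    for key in data:
        # Keys may be ints (0, 1, 2 in my_list), so convert before joining.
        if len(prefix):
            field = prefix + '_' + str(key)
        else:
            field = str(key)

        if isinstance(data[key], dict):
            # Pass the full path so far, not just the current key.
            result.update(
                flat_dict(data[key], field)
            )
        else:
            result[field] = data[key]

    return result

# One flat row per top-level dictionary, with columns such as '0_a', '0_b', ...
refactor_data = [flat_dict(x) for x in my_list]

df = DataFrame(refactor_data)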

0
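pd.concat([pd.DataFrame.from_dict(l, orient='index') for l in my_list])

The documentation says to use orient='index' when the keys of the dictionary should become the rows. The list comprehension on its own gives one DataFrame per dictionary, so pd.concat is used to stack them into a single frame.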
