Create a new DataFrame adding each key from a column dict as header

Question

I have a DataFrame in which one column contains dictionaries.

I want to add a new column for every key that appears in any of those dictionaries. For each row, the new cell should hold the value for that key if the row's dictionary contains it, and None otherwise.

Here is some test data that illustrates what I mean:

Importing dependencies:

import pandas as pd
import numpy as np

Creating a dictionary that contains a list of inner dictionaries:

data = {'string_info': ['User1', 'User2', 'User3'],
        'dict_info': [{'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
                      {'elm5': 'attr31', 'elm7': 'attr13'},
                      {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'}],
        'int_info': [4, 24, 31]}

Creating an appropriate initial DataFrame for testing:

df = pd.DataFrame.from_dict(data)
df

Manually stating what I want as output:

data2 = {'string_info': ['User1', 'User2', 'User3'],
        'elm1': ['attr5', None, 'attr23'],
        'elm2': ['attr9', None, 'attr33'],
        'elm3': ['attr33', None, None],
        'elm5': [None, 'attr31', 'attr28'],
        'elm6': [None, None, 'attr33'],
        'elm7': [None, 'attr13', None],
        'int_info': [4, 24, 31]}

The desired output would be:

df2 = pd.DataFrame.from_dict(data2)
df2

Thanks!

jezrael · Accepted Answer · 2017-03-04 20:28:50Z


You can use concat together with the DataFrame constructor to expand the dicts into columns:

print (pd.DataFrame(df.dict_info.values.tolist()))
     elm1    elm2    elm3    elm5    elm6    elm7
0   attr5   attr9  attr33     NaN     NaN     NaN
1     NaN     NaN     NaN  attr31     NaN  attr13
2  attr23  attr33     NaN  attr28  attr33     NaN

print (pd.concat([pd.DataFrame(df.dict_info.values.tolist()),
                  df[['int_info','string_info']]], axis=1))
     elm1    elm2    elm3    elm5    elm6    elm7  int_info string_info
0   attr5   attr9  attr33     NaN     NaN     NaN         4       User1
1     NaN     NaN     NaN  attr31     NaN  attr13        24       User2
2  attr23  attr33     NaN  attr28  attr33     NaN        31       User3
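One caveat worth noting: pd.concat with axis=1 aligns rows by index label, and a DataFrame built from a plain list gets a fresh 0..n-1 index. The frame in the question uses the default index, so the code above lines up. As a minimal sketch, assuming the frame instead carried some other index (the labels 10, 20, 30 here are made up for illustration), passing index=df.index to the constructor keeps the rows matched:

```python
import pandas as pd

# Same sample data as in the question, but with a hypothetical non-default index.
df = pd.DataFrame(
    {'string_info': ['User1', 'User2', 'User3'],
     'dict_info': [{'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
                   {'elm5': 'attr31', 'elm7': 'attr13'},
                   {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'}],
     'int_info': [4, 24, 31]},
    index=[10, 20, 30])

# Reuse the original index so that concat pairs each expanded row
# with the row it came from.
expanded = pd.DataFrame(df['dict_info'].tolist(), index=df.index)
result = pd.concat([expanded, df[['int_info', 'string_info']]], axis=1)
print(result)
```

Without index=df.index, the expanded frame would be labelled 0, 1, 2 and concat would produce six rows padded with NaN instead of three.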

If you need None instead of NaN, add replace:

print (pd.concat([pd.DataFrame(df.dict_info.values.tolist()).replace({np.nan:None}), 
                  df[['int_info','string_info']]], axis=1))
     elm1    elm2    elm3    elm5    elm6    elm7  int_info string_info
0   attr5   attr9  attr33    None    None    None         4       User1
1    None    None    None  attr31    None  attr13        24       User2
2  attr23  attr33    None  attr28  attr33    None        31       User3
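The question's expected output places the new columns between string_info and int_info and uses None for missing values. Here is a possible sketch, not part of the original answer, that combines the same constructor, concat and replace calls to get that layout. The variable names expanded and result are illustrative, and the cast to object is an assumption made so the columns can hold None regardless of how pandas infers the string dtype:

```python
import numpy as np
import pandas as pd

data = {'string_info': ['User1', 'User2', 'User3'],
        'dict_info': [{'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
                      {'elm5': 'attr31', 'elm7': 'attr13'},
                      {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'}],
        'int_info': [4, 24, 31]}
df = pd.DataFrame(data)

# Expand the dicts into columns, keeping the original row labels.
expanded = pd.DataFrame(df['dict_info'].tolist(), index=df.index)

# Sort the new columns by name so the order does not depend on
# which key pandas happened to encounter first.
expanded = expanded[sorted(expanded.columns)]

# Cast to object (assumption, see above) and swap NaN for None.
expanded = expanded.astype(object).replace({np.nan: None})

# Put the expanded columns between the two remaining original columns.
result = pd.concat([df[['string_info']], expanded, df[['int_info']]], axis=1)
print(result)
```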

edited Mar 4, 2017 at 20:28

answered Mar 4, 2017 at 20:22


1 Comment

EduGord Over a year ago

Thank you very much, it worked! I'm definitely checking out more about pd.concat, thanks!
