
I have a DataFrame with a column whose values are dictionaries.

For every key that appears in any of those dictionaries, I want to add a new column to the DataFrame. Each cell in a new column should hold that key's value if the row's dictionary contains the key, and None otherwise.
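Stated without pandas, the rule amounts to taking the union of all keys and looking each one up with dict.get, which returns None for a missing key. A minimal plain-Python sketch using the sample dictionaries from the question (the names rows, all_keys and columns are only illustrative):

```python
# The dict_info values from the question.
rows = [
    {'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
    {'elm5': 'attr31', 'elm7': 'attr13'},
    {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'},
]

# Every key seen in any dict becomes a column (sorted for a stable order).
all_keys = sorted({key for row in rows for key in row})

# dict.get returns None when a row lacks the key: "None if absent, value otherwise".
columns = {key: [row.get(key) for row in rows] for key in all_keys}

for key, values in columns.items():
    print(key, values)
```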

Here's the data for testing and visualizing what I'm saying:

Importing dependencies:

import pandas as pd
import numpy as np

Creating a dictionary whose dict_info entry is a list of inner dictionaries:

data = {'string_info': ['User1', 'User2', 'User3'],
        'dict_info': [{'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
                      {'elm5': 'attr31', 'elm7': 'attr13'},
                      {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'}],
        'int_info': [4, 24, 31]}

Creating an appropriate initial DataFrame for testing:

df = pd.DataFrame.from_dict(data)
df

Manually stating what I want as output:

data2 = {'string_info': ['User1', 'User2', 'User3'],
         'elm1': ['attr5', None, 'attr23'],
         'elm2': ['attr9', None, 'attr33'],
         'elm3': ['attr33', None, None],
         'elm5': [None, 'attr31', 'attr28'],
         'elm6': [None, None, 'attr33'],
         'elm7': [None, 'attr13', None],
         'int_info': [4, 24, 31]}

The desired output would be:

df2 = pd.DataFrame.from_dict(data2)
df2

Thanks!

1 Answer


You can use concat together with the DataFrame constructor to expand the dicts into columns:

print(pd.DataFrame(df.dict_info.values.tolist()))
     elm1    elm2    elm3    elm5    elm6    elm7
0   attr5   attr9  attr33     NaN     NaN     NaN
1     NaN     NaN     NaN  attr31     NaN  attr13
2  attr23  attr33     NaN  attr28  attr33     NaN
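The constructor treats each dict as one row, uses the union of all keys as columns and fills any key a dict lacks with NaN. The column order may depend on the pandas version (sorted keys, as printed above, or first-seen order, which would put elm7 before elm6), so a sketch that sorts the columns explicitly, assuming sorted order is what you want:

```python
import pandas as pd

# The dict_info values from the question.
rows = [{'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
        {'elm5': 'attr31', 'elm7': 'attr13'},
        {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'}]

# One row per dict; columns are the union of keys, missing keys become NaN.
# sort_index(axis=1) makes the column order elm1..elm7 regardless of version.
expanded = pd.DataFrame(rows).sort_index(axis=1)
print(expanded)
```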

print(pd.concat([pd.DataFrame(df.dict_info.values.tolist()),
                 df[['int_info','string_info']]], axis=1))
     elm1    elm2    elm3    elm5    elm6    elm7  int_info string_info
0   attr5   attr9  attr33     NaN     NaN     NaN         4       User1
1     NaN     NaN     NaN  attr31     NaN  attr13        24       User2
2  attr23  attr33     NaN  attr28  attr33     NaN        31       User3
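concat(axis=1) lines rows up by index label, and the expanded frame always gets a fresh 0..n-1 index. That matches here because df uses the default index, but it would not if df had been filtered or re-indexed. A small sketch with a made-up index (10, 20, 30) and shortened dicts, assuming that situation, which also places the columns in the question's df2 order (string_info first, int_info last):

```python
import pandas as pd

# Shortened sample with a hypothetical non-default index.
df = pd.DataFrame(
    {'string_info': ['User1', 'User2', 'User3'],
     'dict_info': [{'elm1': 'attr5'},
                   {'elm5': 'attr31'},
                   {'elm1': 'attr23', 'elm5': 'attr28'}],
     'int_info': [4, 24, 31]},
    index=[10, 20, 30],
)

# Index 0,1,2 vs 10,20,30: concat keeps all 6 labels, so rows no longer match.
misaligned = pd.concat([pd.DataFrame(df['dict_info'].tolist()),
                        df[['int_info', 'string_info']]], axis=1)
print(misaligned.shape)  # (6, 4)

# index=df.index keeps each expanded row next to its source row; the order of
# the pieces puts string_info first and int_info last.
expanded = pd.DataFrame(df['dict_info'].tolist(), index=df.index)
result = pd.concat([df[['string_info']], expanded, df[['int_info']]], axis=1)
print(result)
```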

If you need None instead of NaN, add replace:

print(pd.concat([pd.DataFrame(df.dict_info.values.tolist()).replace({np.nan: None}),
                 df[['int_info','string_info']]], axis=1))
     elm1    elm2    elm3    elm5    elm6    elm7  int_info string_info
0   attr5   attr9  attr33    None    None    None         4       User1
1    None    None    None  attr31    None  attr13        24       User2
2  attr23  attr33    None  attr28  attr33    None        31       User3
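As an alternative to replace, you could build the columns with dict.get so NaN never appears in the first place. A sketch under that assumption (rows, keys and expanded are illustrative names); dtype=object keeps the columns as plain Python objects, so the None values are stored as None instead of possibly being coerced to a missing-value marker, depending on the pandas version:

```python
import pandas as pd

data = {'string_info': ['User1', 'User2', 'User3'],
        'dict_info': [{'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
                      {'elm5': 'attr31', 'elm7': 'attr13'},
                      {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'}],
        'int_info': [4, 24, 31]}
df = pd.DataFrame(data)

rows = df['dict_info'].tolist()
keys = sorted({key for row in rows for key in row})

# dict.get yields None for missing keys; dtype=object keeps them as None.
expanded = pd.DataFrame({key: [row.get(key) for row in rows] for key in keys},
                        index=df.index, dtype=object)

result = pd.concat([df[['string_info']], expanded, df[['int_info']]], axis=1)
print(result)
```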

1 Comment

Thank you very much, it worked! I'm definitely checking out more about pd.concat, thanks!
