convert pandas column with dictionary of dictionaries to separate columns

Question

I have pandas df with one column having dictionary of dictionaries: this is what i have:

| ca| cb| 
|:---- |:------:
| 1  | cat1:{paws:1 , hair:2} ,dog1:{paws:3 , hair:5}  
| 2  | cat2:{paws:1 , hair:2} ,dog2:{paws:3 , hair:5}
| 3  | cat3:{paws:1 , hair:2} ,dog3:{paws:3 , hair:5}
| 4  | cat4:{paws:1 , hair:2} ,dog4:{paws:3 , hair:5}

What i want is:

| ca| animal| paws|  hair|
|:----:| -----:| -----:| -----:|
| 1  | cat1  | 1 | 2 
| 1  | dog1  | 3 | 5 
| 2  | cat2  | 1 | 2 
| 2  | dog2 | 3 | 5

What should be fastest way to do this?

Berk Buzcu · Accepted Answer · 2022-06-19 10:07:40Z

1

I found a solution below:

I reproduced a demo version of your dataframe with the following dictionary for the following outcome:

data = {
    "ca": [1, 2],
    "cb": [{"cat1": {"paws": 1, "hair": 2}, "dog1":{"paws":3 , "hair":5}}, 
            {"cat2":{"paws":1 , "hair":2} , "dog2":{"paws":3 , "hair":5}}]
}

df = pandas.DataFrame(data)

df
ca  cb
1   {'cat1': {'paws': 1, 'hair': 2}, 'dog1': {'paw...
2   {'cat2': {'paws': 1, 'hair': 2}, 'dog2': {'paw...

Moving on, I had to eliminate the first step of the dict, ie, extracting the cats and the dogs alike.

first_level = pandas.concat([df.drop(['cb'], axis=1), df['cb'].apply(pandas.Series)], axis=1)
first_level

    ca  cat1    dog1    cat2    dog2
0   1   {'paws': 1, 'hair': 2}  {'paws': 3, 'hair': 5}  NaN NaN
1   2   NaN NaN {'paws': 1, 'hair': 2}  {'paws': 3, 'hair': 5}

The key takeaway here is that you need to apply the melt function to convert the columns to values and set them to respective rows.

first_level.melt(id_vars=["ca"]).dropna()
first_level

    ca  variable    value
0   1   cat1    {'paws': 1, 'hair': 2}
2   1   dog1    {'paws': 3, 'hair': 5}
5   2   cat2    {'paws': 1, 'hair': 2}
7   2   dog2    {'paws': 3, 'hair': 5}

And then the rest is quite simple, with the same apply function, I can convert this dictionary to columns too and the issue is resolved:

second_level = pandas.concat([first_level.drop(['value'], axis=1), first_level['value'].apply(pandas.Series)], axis=1)
second_level

    ca  variable    paws    hair
0   1   cat1    1   2
2   1   dog1    3   5
5   2   cat2    1   2
7   2   dog2    3   5

answered Jun 19, 2022 at 10:07

Berk Buzcu

413 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

glitch_123 Over a year ago

your answer is pretty correct but the problem is that my df has multiple column in place of "ca" . so the melt function needs to be changed. I have tried giving id_vars as multiple column but it does not work. What do you suggest

Berk Buzcu Over a year ago

Strange as I experimented with this too right now and adding multiple columns and adding them to id_vars seems to have worked. For instance: id_vars=["ca", "cc", "cd"] would maintain these columns in my work environment. What could go wrong is if those columns do not correspond to certain elements, dropna may be dropping those values. If you can tell me what the error is, I can offer more help.

Collectives™ on Stack Overflow

convert pandas column with dictionary of dictionaries to separate columns

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related