I have a pandas DataFrame in which one column holds a dictionary of dictionaries. This is what I have:

| ca | cb |
|:---- |:------:|
| 1 | cat1: {paws: 1, hair: 2}, dog1: {paws: 3, hair: 5} |
| 2 | cat2: {paws: 1, hair: 2}, dog2: {paws: 3, hair: 5} |
| 3 | cat3: {paws: 1, hair: 2}, dog3: {paws: 3, hair: 5} |
| 4 | cat4: {paws: 1, hair: 2}, dog4: {paws: 3, hair: 5} |

What I want is:

| ca | animal | paws | hair |
|:----:| -----:| -----:| -----:|
| 1 | cat1 | 1 | 2 |
| 1 | dog1 | 3 | 5 |
| 2 | cat2 | 1 | 2 |
| 2 | dog2 | 3 | 5 |

What is the fastest way to do this?

1 Answer

Here is a solution that worked for me.

I reproduced a small demo version of your DataFrame from the following dictionary:

import pandas

# Demo data: each "cb" entry is a dict of dicts, keyed by animal name
data = {
    "ca": [1, 2],
    "cb": [
        {"cat1": {"paws": 1, "hair": 2}, "dog1": {"paws": 3, "hair": 5}},
        {"cat2": {"paws": 1, "hair": 2}, "dog2": {"paws": 3, "hair": 5}},
    ],
}

df = pandas.DataFrame(data)

df
ca  cb
1   {'cat1': {'paws': 1, 'hair': 2}, 'dog1': {'paw...
2   {'cat2': {'paws': 1, 'hair': 2}, 'dog2': {'paw...

Next, I unpack the first level of the dict, i.e. expand each animal key (cats and dogs alike) into its own column.

first_level = pandas.concat([df.drop(['cb'], axis=1), df['cb'].apply(pandas.Series)], axis=1)
first_level

    ca  cat1    dog1    cat2    dog2
0   1   {'paws': 1, 'hair': 2}  {'paws': 3, 'hair': 5}  NaN NaN
1   2   NaN NaN {'paws': 1, 'hair': 2}  {'paws': 3, 'hair': 5}

The key step is to apply the melt function, which turns the animal columns into values of a single column, giving one row per (ca, animal) pair. dropna then removes the combinations that have no data.

first_level = first_level.melt(id_vars=["ca"]).dropna()
first_level

    ca  variable    value
0   1   cat1    {'paws': 1, 'hair': 2}
2   1   dog1    {'paws': 3, 'hair': 5}
5   2   cat2    {'paws': 1, 'hair': 2}
7   2   dog2    {'paws': 3, 'hair': 5}

The rest is simple: with the same apply function, I expand the remaining inner dictionaries into columns as well, which gives the requested layout:

second_level = pandas.concat([first_level.drop(['value'], axis=1), first_level['value'].apply(pandas.Series)], axis=1)
second_level

    ca  variable    paws    hair
0   1   cat1    1   2
2   1   dog1    3   5
5   2   cat2    1   2
7   2   dog2    3   5
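
Since the question asks for the fastest way, here is an alternative sketch that skips both apply(pandas.Series) calls and builds the long table directly with a list comprehension. I have not benchmarked it; the assumption is that plain Python iteration over the dicts is cheaper than constructing a Series for every row. The names records and result are mine, and the animal column name is taken from the desired output in the question.

```python
import pandas as pd

data = {
    "ca": [1, 2],
    "cb": [
        {"cat1": {"paws": 1, "hair": 2}, "dog1": {"paws": 3, "hair": 5}},
        {"cat2": {"paws": 1, "hair": 2}, "dog2": {"paws": 3, "hair": 5}},
    ],
}
df = pd.DataFrame(data)

# Build one flat record per (row, animal) pair.
# The inner dict's keys (paws, hair) become columns via ** unpacking.
records = [
    {"ca": ca, "animal": animal, **traits}
    for ca, animals in zip(df["ca"], df["cb"])
    for animal, traits in animals.items()
]

result = pd.DataFrame(records)
print(result)
```

This also names the column animal directly, so no rename of variable is needed afterwards.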


2 Comments

Your answer is mostly correct, but my df has multiple columns in place of "ca", so the melt call needs to change. I tried passing multiple columns as id_vars, but it does not work. What do you suggest?
Strange, I just tried this too, and passing multiple columns to id_vars seems to work. For instance, id_vars=["ca", "cc", "cd"] keeps those columns in my environment. One thing that could go wrong: if those columns contain missing values for some rows, dropna may be dropping those rows. If you can tell me what the error is, I can offer more help.
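
A minimal sketch of the multi-column case discussed in these comments, assuming hypothetical extra columns cc and cd (the names come from the comment above, the values are made up). It restricts dropna to the value column, so a missing value in one of the id columns does not remove the row.

```python
import pandas as pd

df = pd.DataFrame({
    "ca": [1, 2],
    "cc": ["x", "y"],     # hypothetical extra id column
    "cd": [10.0, None],   # hypothetical extra id column with a missing value
    "cb": [
        {"cat1": {"paws": 1, "hair": 2}, "dog1": {"paws": 3, "hair": 5}},
        {"cat2": {"paws": 1, "hair": 2}, "dog2": {"paws": 3, "hair": 5}},
    ],
})
id_cols = ["ca", "cc", "cd"]

# Expand the outer dict: one column per animal key.
wide = pd.concat([df[id_cols], df["cb"].apply(pd.Series)], axis=1)

# Melt the animal columns into rows; drop only rows whose value is missing,
# so NaN in an id column (cd here) does not remove valid rows.
long = (
    wide.melt(id_vars=id_cols, var_name="animal")
        .dropna(subset=["value"])
        .reset_index(drop=True)
)

# Expand the inner dicts (paws, hair) into columns.
inner = pd.DataFrame(long["value"].tolist())
result = pd.concat([long[id_cols + ["animal"]], inner], axis=1)
print(result)
```

With a plain dropna() in place of dropna(subset=["value"]), the rows for ca == 2 would be lost in this example because cd is missing there.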
