-1

I have following dict which I want to convert into pandas. this dict have nested list which can appear for one node but not other.

dis={"companies": [{"object_id": 123,
                           "name": "Abd ",
                           "contact_name": ["xxxx",
                                                                       "yyyy"],
                           "contact_id":[1234,
                                                                     33455]
                           },
                          {"object_id": 654,
                           "name": "DDSPP"},
                          {"object_id": 987,
                           "name": "CCD"}
                          ]}

AS

object_id, name, contact_name, contact_id
123,Abd,xxxx,1234
123,Abd,yyyy,
654,DDSPP,,
987,CCD,,

How can i achive this

I was trying to do like

abc = pd.DataFrame(dis).set_index['object_id','contact_name']

but it says

'method' object is not subscriptable

2
  • 1
    What have you tried so far? Commented Aug 24, 2018 at 9:13
  • Please check the answers provided. Could help. Commented Aug 27, 2018 at 16:35

2 Answers 2

0

This is inspired from @jezrael answer in this link: Splitting multiple columns into rows in pandas dataframe

Use:

s = {"companies": [{"object_id": 123,
                       "name": "Abd ",
                       "contact_name": ["xxxx",
                                                                   "yyyy"],
                       "contact_id":[1234,
                                                                 33455]
                       },
                      {"object_id": 654,
                       "name": "DDSPP"},
                      {"object_id": 987,
                       "name": "CCD"}
                      ]}
df = pd.DataFrame(s) #convert into DF
df = df['companies'].apply(pd.Series) #this splits the internal keys and values into columns
split1 = df.apply(lambda x: pd.Series(x['contact_id']), axis=1).stack().reset_index(level=1, drop=True)
split2 = df.apply(lambda x: pd.Series(x['contact_name']), axis=1).stack().reset_index(level=1, drop=True)
df1 = pd.concat([split1,split2], axis=1, keys=['contact_id','contact_name'])
pd.options.display.float_format = '{:.0f}'.format
print (df.drop(['contact_id','contact_name'], axis=1).join(df1).reset_index(drop=True))

Output with regular index:

    name  object_id  contact_id contact_name
0   Abd         123        1234         xxxx
1   Abd         123       33455         yyyy
2   DDSPP       654         nan          NaN
3   CCD         987         nan          NaN

Is this something you were looking for?

Sign up to request clarification or add additional context in comments.

Comments

0

If you have just only one column needs to convert, then you can use something more shortly, like this:

df = pd.DataFrame(d['companies'])
d = df.loc[0].apply(pd.Series)
d[1].fillna(d[0], inplace=True)
df.drop([0],0).append(d.T)

Otherwise, if you need to do this with more then one raw, you can use it, but it have to be modified.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.