I have a pandas dataframe that looks like this:

import pandas as pd

df_in = pd.DataFrame(data = {'another_col': ['a', 'x', '4'], 'json': [
    [{"Key":"firstkey", "Value": 1.4}, {"Key": "secondkey", "Value": 6}],
    [{"Key":"firstkey", "Value": 5.4}, {"Key": "secondkey", "Value": 11}],
    [{"Key":"firstkey", "Value": 1.6}, {"Key": "secondkey", "Value": 9}]]}
)

which, when printed, looks like:

  another_col                                               json
0           a  [{'Key': 'firstkey', 'Value': 1.4}, {'Key': 's...
1           x  [{'Key': 'firstkey', 'Value': 5.4}, {'Key': 's...
2           4  [{'Key': 'firstkey', 'Value': 1.6}, {'Key': 's...

I need to parse the list in each row of the json column into separate columns, one per Key. I want the resulting dataframe to look like:

  another_col  firstkey  secondkey
0           a       1.4          6
1           x       5.4         11
2           4       1.6          9

How do I do this? I have tried pd.json_normalize with no success. A secondary concern is speed, since I have to apply this to ~5 million rows, but first let's get it working. :-)

1 Answer

You can explode the lists, convert the resulting dicts to a dataframe, unstack on Key, then join back to the original:

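# one dict per row; the original row label is repeated in the index
u = df_in['json'].explode()
# build Key/Value columns, pivot Key into columns, then join back on the index
out = df_in[['another_col']].join(pd.DataFrame(u.tolist(), index=u.index)
                        .set_index('Key', append=True)['Value'].unstack())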

print(out)

  another_col  firstkey  secondkey
0           a       1.4        6.0
1           x       5.4       11.0
2           4       1.6        9.0
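
Since the question also mentions speed on a few million rows, here is a minimal sketch of a possible alternative that skips explode/unstack and builds one flat dict per row in plain Python. It assumes every element of the json column is a list of dicts with exactly the "Key" and "Value" fields shown in the question; the names records, wide and out_alt are only illustrative. Whether it is actually faster depends on the data, so it is worth timing both versions.

```python
import pandas as pd

# Same sample data as in the question
df_in = pd.DataFrame(data={'another_col': ['a', 'x', '4'], 'json': [
    [{"Key": "firstkey", "Value": 1.4}, {"Key": "secondkey", "Value": 6}],
    [{"Key": "firstkey", "Value": 5.4}, {"Key": "secondkey", "Value": 11}],
    [{"Key": "firstkey", "Value": 1.6}, {"Key": "secondkey", "Value": 9}]]})

# Collapse each row's list of {"Key": ..., "Value": ...} dicts into one flat dict
records = [{d["Key"]: d["Value"] for d in row} for row in df_in["json"]]

# Build the wide frame on the original index, then attach the remaining columns
wide = pd.DataFrame(records, index=df_in.index)
out_alt = df_in.drop(columns="json").join(wide)

print(out_alt)
```

Unlike the unstack version, this keeps each new column's own dtype, so a column holding only integers is not converted to float.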
2 Comments

Wow! Great. Thank you!
Nice answer @anky. Liked it.