Parse a JSON column in a df and extract specific key value

Question

I have a pandas DataFrame containing one column with a nested JSON dict. I want to normalize the JSON column ('media') and extract the value for the key 'url' when it is present. The 'media' json payload has three types of possible media objects all included in the example data set. I need to extract from the 'MessageMediaWebPage' object, only.

The typical error (although there is some variation) after using

# Using JSON normalize function

pd.json_normalize(df['media'], max_level=1)

AttributeError: 'str' object has no attribute 'values'

The full error is listed below

Data and code example in a JSON format. The example data set only consists of three records but is too large to post directly. The link is to my git:

# The json is nested
# The full data set consists of several columns and 40K + records. This is just a small slice.

df = pd.read_json('https://raw.githubusercontent.com/whitedl/telegram/main/df_3.json', dtype = {'telegram_id':object}) 

df.info()

pd.json_normalize(df['media'], max_level=1)

As background I have tried the following solutions:

Ynjxsjmh · Accepted Answer · 2022-03-23 17:50:45Z

1

The problem is that value in media column is string type. You can apply ast.literal_eval to media column to convert it value to python dict.

import ast 


pd.json_normalize(df['media'].apply(ast.literal_eval), max_level=1)

answered Mar 23, 2022 at 17:50

Ynjxsjmh

30.3k7 gold badges43 silver badges64 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Larry Over a year ago

That does it thanks.

Collectives™ on Stack Overflow

Parse a JSON column in a df and extract specific key value

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related