2

I have a pandas DataFrame containing one column with a nested JSON dict. I want to normalize the JSON column ('media') and extract the value for the key 'url' when it is present. The 'media' json payload has three types of possible media objects all included in the example data set. I need to extract from the 'MessageMediaWebPage' object, only.

The typical error (although there is some variation) after using

# Using JSON normalize function

pd.json_normalize(df['media'], max_level=1)

AttributeError: 'str' object has no attribute 'values'

The full error is listed below

Data and code example in a JSON format. The example data set only consists of three records but is too large to post directly. The link is to my git:

# The json is nested
# The full data set consists of several columns and 40K + records. This is just a small slice.

df = pd.read_json('https://raw.githubusercontent.com/whitedl/telegram/main/df_3.json', dtype = {'telegram_id':object}) 

df.info()

pd.json_normalize(df['media'], max_level=1)

As background I have tried the following solutions:

1 Answer 1

1

The problem is that value in media column is string type. You can apply ast.literal_eval to media column to convert it value to python dict.

import ast 


pd.json_normalize(df['media'].apply(ast.literal_eval), max_level=1)
Sign up to request clarification or add additional context in comments.

1 Comment

That does it thanks.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.