I'm new to Python and pandas and am trying to get the hang of working with JSON. Any help is appreciated.
I'm pulling nested JSON from an API; its structure is shown below. The fields I'm after are user_id and message under view, plus the user_id and message subfields under the nested replies field. The desired fields are marked below with <<<.
],
"view": [
{
"id": 109205,
"user_id": 6354, # <<< this field
"parent_id": null,
"created_at": "2020-11-03T23:32:49Z",
"updated_at": "2020-11-03T23:32:49Z",
"rating_count": null,
"rating_sum": null,
"message": "message text1", # <<< this field
"replies": [
{
"id": 109298,
"user_id": 5457, # <<< this field
"parent_id": 109205,
"created_at": "2020-11-04T19:42:59Z",
"updated_at": "2020-11-04T19:42:59Z",
"rating_count": null,
"rating_sum": null,
"message": "message text2" # <<< this field
},
{
#json continues
I can successfully pull the top-level fields under view, but I'm having difficulty flattening the nested replies field with json_normalize. Here's my current code:
import pandas as pd
d = r.json() # r is the API response object (request code not shown)
df = pd.json_normalize(d['view'], record_path=['replies'])
print(df)
This raises the following KeyError:
Traceback (most recent call last):
File "C:\Users\danie\AppData\Local\Temp\atom_script_tempfiles\2021720-13268-1xuqx61.3oh2g", line 53, in <module>
df = pd.json_normalize(d['view'], record_path=['replies'])
File "C:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\json\_normalize.py", line 336, in _json_normalize
_recursive_extract(data, record_path, {}, level=0)
File "C:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\json\_normalize.py", line 309, in _recursive_extract
recs = _pull_records(obj, path[0])
File "C:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\json\_normalize.py", line 248, in _pull_records
result = _pull_field(js, spec)
File "C:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\json\_normalize.py", line 239, in _pull_field
result = result[spec]
KeyError: 'replies'
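One thing I have not ruled out is that some entries in view have no replies key at all. The sketch below uses made-up data (sample_view, cleaned, and the view_ prefix are hypothetical names, not part of the real API payload) and, under that assumption, reproduces the same KeyError. It also shows one possible workaround: giving every entry a replies list, empty if need be, before normalizing.

```python
import pandas as pd

# Hypothetical sample data: the second entry has no "replies" key.
sample_view = [
    {
        "id": 109205,
        "user_id": 6354,
        "message": "message text1",
        "replies": [
            {
                "id": 109298,
                "user_id": 5457,
                "parent_id": 109205,
                "message": "message text2",
            },
        ],
    },
    {"id": 109300, "user_id": 7000, "message": "message text3"},
]

# With record_path, every record must contain the key, so this raises.
try:
    pd.json_normalize(sample_view, record_path=["replies"])
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Possible workaround: make sure each entry has a list under "replies"
# (missing or null values become an empty list).
cleaned = [{**item, "replies": item.get("replies") or []} for item in sample_view]

# meta carries the top-level fields onto each reply row; meta_prefix avoids
# name clashes with the reply's own id, user_id, and message columns.
replies_df = pd.json_normalize(
    cleaned,
    record_path=["replies"],
    meta=["id", "user_id", "message"],
    meta_prefix="view_",
)
print(replies_df)
```

Entries with an empty replies list simply contribute no rows, so top-level messages without replies would still need to be pulled separately from view.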
What am I missing here? All suggestions welcome and appreciated.