I have a complex, nested JSON response that I need to transform into a pandas DataFrame (Python). I managed the first part, but I'm struggling with the second.
import requests
import pandas as pd
import json
url = 'url'
headers = {'api-key':'key'}
resp = requests.get(url, headers = headers)
print(resp.status_code)
r = resp.content
r
responses = json.loads(r.decode('utf-8'))
responses
Output (responses)
{'count': 855,
 'requestAt': '2020-07-15T13:13:26.646+00:00',
 'data': {'00b3dc3a-b71e-4547-8910-44691a09cd53': {'registerId': '00b3dc3a-b71e-4547-8910-44691a09cd53',
   'count': 10,
   'milho_germoplasma': {'feedbackScore': 'good',
    'firstVisitAt': '2020-06-11T11:10:42.929-03:00',
    'lastVisitAt': '2020-06-15T15:36:43.027-03:00',
    'videosCompletedAt': '2020-06-11T11:19:58.753-03:00',
    'videosState': [{'completedAt': '2020-06-11T11:19:58.753-03:00',
      'completedCount': 1,
      'duration': 544.811,
      'firstPlayAt': '2020-06-11T11:10:50.170-03:00',
      'percent': 0.281,
      'playCount': 3,
      'seconds': 152.85,
      'updatedAt': '2020-06-15T15:38:13.711-03:00',
      'videoSrc': 'https://vimeo.com/420453289/b7c455699a'}],
    'visitsCount': 3,
    'stationId': 'milho_germoplasma'},
   'milho_plantio': {'feedbackScore': 'good',
    'firstVisitAt': '2020-06-11T10:37:42.509-03:00',
    'lastVisitAt': '2020-06-11T12:28:21.105-03:00',
    'videosCompletedAt': '2020-06-11T10:49:43.082-03:00',
    'videosState': [{'completedAt': '2020-06-11T10:49:43.082-03:00',
      'completedCount': 1,
      'duration': 700.459,
      'firstPlayAt': '2020-06-11T10:37:50.465-03:00',
      'percent': 0.042,
      'playCount': 2,
      'seconds': 29.18,
      'updatedAt': '2020-06-11T10:50:18.717-03:00',
      'videoSrc': 'https://player.vimeo.com/video/412760474'}],
    'visitsCount': 2,
    'stationId': 'milho_plantio'}}}}
I tried adapting some answers from Stack Overflow, but I could only solve part of it without errors:
response_list = []
for id in responses['data']:
    # get the keys of interest
    data = {k: v for k, v in responses['data'][id].items() if k in ['registerId', 'count']}
    response_list.append({**data})
print(pd.DataFrame(response_list))
Output:
+--------------------------------------+-------+
| registerId                           | count |
+--------------------------------------+-------+
| 00b3dc3a-b71e-4547-8910-44691a09cd53 | 10    |
+--------------------------------------+-------+
I need to go one level deeper into this JSON and turn it into a DataFrame: each milho_germoplasma, milho_plantio, or any other entry at that level should become a new row for the same registerId, with the data inside that entry as columns.
Expected Output:
+--------------------------------------+-------+---------------+-------------------------------+----------------------------------+-------------------+
| registerId                           | count | feedbackScore | firstVisitAt                  | lastVisitAt                      | …(last column)    |
+--------------------------------------+-------+---------------+-------------------------------+----------------------------------+-------------------+
| 00b3dc3a-b71e-4547-8910-44691a09cd53 | 10    | good          | 2020-06-11T11:10:42.929-03:00 | 2020-06-15T15:36:43.027-03:00    | milho_germoplasma |
| 00b3dc3a-b71e-4547-8910-44691a09cd53 | 10    | good          | 2020-06-11T10:37:42.509-03:00 | 2020-06-11T12:28:21.105-03:00    | milho_plantio     |
+--------------------------------------+-------+---------------+-------------------------------+----------------------------------+-------------------+
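For reference, here is a minimal sketch of one way this shape might be produced. It is not a definitive solution. It assumes that every dict-valued entry under a register (milho_germoplasma, milho_plantio, ...) is a station record and that every non-dict entry (registerId, count) is shared by all of that register's rows. The `responses` literal below is a trimmed stand-in for the real API response, with the videos fields left out for brevity; the names `shared`, `station`, and `rows` are only illustrative.

```python
import pandas as pd

# Trimmed stand-in for the `responses` dict from the question
# (videosCompletedAt and videosState omitted to keep the sample short).
responses = {
    "count": 855,
    "requestAt": "2020-07-15T13:13:26.646+00:00",
    "data": {
        "00b3dc3a-b71e-4547-8910-44691a09cd53": {
            "registerId": "00b3dc3a-b71e-4547-8910-44691a09cd53",
            "count": 10,
            "milho_germoplasma": {
                "feedbackScore": "good",
                "firstVisitAt": "2020-06-11T11:10:42.929-03:00",
                "lastVisitAt": "2020-06-15T15:36:43.027-03:00",
                "visitsCount": 3,
                "stationId": "milho_germoplasma",
            },
            "milho_plantio": {
                "feedbackScore": "good",
                "firstVisitAt": "2020-06-11T10:37:42.509-03:00",
                "lastVisitAt": "2020-06-11T12:28:21.105-03:00",
                "visitsCount": 2,
                "stationId": "milho_plantio",
            },
        }
    },
}

rows = []
for register in responses["data"].values():
    # Scalar fields (registerId, count) are repeated on every station row.
    shared = {k: v for k, v in register.items() if not isinstance(v, dict)}
    # Assumption: each dict-valued field is one station record.
    for station in register.values():
        if isinstance(station, dict):
            rows.append({**shared, **station})

df = pd.DataFrame(rows)
print(df)
```

With the full response, videosState would stay in its own column as a list of dicts; flattening it further would be a separate step.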