I have a JSON file with many nested dictionaries and lists containing excess information that I do not want to use when creating my data frame. I have either deleted the unnecessary fluff or replaced it with '---'.

    {'ID': 1,
     'SPEC': {'Name': 'STOCK_VAL',
              '---': '---',
              '---': '---',
              'Info': {'---': [{'---': '---', '---': '---', '---': '---'}],
                       '---': [{'---': '---', '---': '---', '---': '---'}]},
              '---': '---',
              'RELEVANT_AFTER_ALL': [{'---': '---',
                                      'Max': 140.00,
                                      'Min': 100.00,
                                      '---': '---',
                                      'Name': 'Calculated',
                                      'Units': 'USD/D',
                                      '---': '---',
                                      'Entries': [{'Timestamp': '2022-03-16T23:00:00Z', 'Value': 100.00},
                                                  {'Timestamp': '2022-03-17T23:00:00Z', 'Value': 120.00},
                                                  {'Timestamp': '2022-03-18T23:00:00Z', 'Value': 140.00}],
                                      '---': '---'},
                                     {'---': '---',
                                      'Max': 160.00,
                                      'Min': 80.00,
                                      '---': '---',
                                      'Name': 'Realised',
                                      'Units': 'USD/D',
                                      '---': '---',
                                      'Entries': [{'Timestamp': '2022-03-16T23:00:00Z', 'Value': 160.00},
                                                  {'Timestamp': '2022-03-17T23:00:00Z', 'Value': 120.00},
                                                  {'Timestamp': '2022-03-18T23:00:00Z', 'Value': 80.00}],
                                      '---': '---'}]}}

From the data above I want to create the following data frame:
| Timestamp | STOCK_VAL Calculated | STOCK_VAL Realised |
|---|---|---|
| 2022-03-16T23:00:00Z | 100.00 | 160.00 |
| 2022-03-17T23:00:00Z | 120.00 | 120.00 |
| 2022-03-18T23:00:00Z | 140.00 | 80.00 |
I have tried using pandas.json_normalize(), but I could not find an efficient way to extract the table in the shape shown above.
Thanks in advance to anyone who knows better!
`json_normalize` does not, in fact, take JSON, but "unserialized JSON objects", so posting a Python structure instead of JSON is not an issue here. The biggest problems with the data you posted are that it is not complete (it is missing `]}}` at the end) and that you anonymised one of the keys that is necessary to access the data you want.
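
For what it's worth, here is a minimal sketch of one way to get from that structure to the table you describe. It assumes the list of series lives under `SPEC` → `RELEVANT_AFTER_ALL`, as in the data shown in the question, and that every series has a `Name` and a list of `Entries`. The `data` literal is a trimmed copy of your structure with the `'---'` placeholders left out, and the variable names (`spec`, `long_df`, `wide_df`) are only illustrative.

```python
import pandas as pd

# Trimmed copy of the structure from the question ('---' placeholders omitted).
data = {
    "ID": 1,
    "SPEC": {
        "Name": "STOCK_VAL",
        "RELEVANT_AFTER_ALL": [
            {
                "Name": "Calculated",
                "Units": "USD/D",
                "Entries": [
                    {"Timestamp": "2022-03-16T23:00:00Z", "Value": 100.00},
                    {"Timestamp": "2022-03-17T23:00:00Z", "Value": 120.00},
                    {"Timestamp": "2022-03-18T23:00:00Z", "Value": 140.00},
                ],
            },
            {
                "Name": "Realised",
                "Units": "USD/D",
                "Entries": [
                    {"Timestamp": "2022-03-16T23:00:00Z", "Value": 160.00},
                    {"Timestamp": "2022-03-17T23:00:00Z", "Value": 120.00},
                    {"Timestamp": "2022-03-18T23:00:00Z", "Value": 80.00},
                ],
            },
        ],
    },
}

spec = data["SPEC"]

# Long format: one row per entry, with the parent series' Name as a meta column.
long_df = pd.json_normalize(
    spec["RELEVANT_AFTER_ALL"],
    record_path="Entries",
    meta=["Name"],
)

# Wide format: one row per timestamp, one column per series.
wide_df = long_df.pivot(index="Timestamp", columns="Name", values="Value")

# Prefix each series column with the top-level name, e.g. "STOCK_VAL Calculated".
wide_df.columns = [f"{spec['Name']} {name}" for name in wide_df.columns]
wide_df = wide_df.reset_index()

print(wide_df)
```

The `record_path`/`meta` pair does the flattening, and the `pivot` step turns the two series into columns. If the real file holds several such objects, the same steps could be applied to each one and the results combined, but that depends on details not shown in the question.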