I have a JSON file with many nested dictionaries and lists of excess information that I do not want to use when creating my data frame. I have either deleted the unnecessary fluff or replaced it with '---'.

{'ID': 1,
 'SPEC': {'Name': 'STOCK_VAL',
  '---': '---',
  '---': '---',
  'Info': {'---': [{'---': '---', '---': '---', '---': '---'}],
   '---': [{'---': '---', '---': '---', '---': '---'}]},
  '---': '---',
  'RELEVANT_AFTER_ALL': [{'---': '---',
    'Max': 140.00,
    'Min': 100.00,
    '---': '---',
    'Name': 'Calculated',
    'Units': 'USD/D',
    '---': '---',
    'Entries': [{'Timestamp': '2022-03-16T23:00:00Z', 'Value': 100.00},
     {'Timestamp': '2022-03-17T23:00:00Z', 'Value': 120.00},
     {'Timestamp': '2022-03-18T23:00:00Z', 'Value': 140.00}],
    '---': '---'},
   {'---': '---',
    'Max': 160.00,
    'Min': 80.00,
    '---': '---',
    'Name': 'Realised',
    'Units': 'USD/D',
    '---': '---',
    'Entries': [{'Timestamp': '2022-03-16T23:00:00Z', 'Value': 160.00},
     {'Timestamp': '2022-03-17T23:00:00Z', 'Value': 120.00},
     {'Timestamp': '2022-03-18T23:00:00Z', 'Value': 80.00}],
    '---': '---'}]}}

From the data above I want to create the following data frame:

Timestamp             STOCK_VAL Calculated  STOCK_VAL Realised
2022-03-16T23:00:00Z                100.00              160.00
2022-03-17T23:00:00Z                120.00              120.00
2022-03-18T23:00:00Z                140.00               80.00

I have tried using pandas.json_normalize(), but I could not find an efficient way to extract the table in the shape I want.

Thanks in advance to anyone who knows better!

  • It looks like the JSON data you shared is not valid. Could you please verify that the data you pasted here is valid JSON? You can use the following website: codebeautify.org/jsonviewer Commented Mar 29, 2022 at 8:26
  • You are correct. This is already a formatted JSON extract. I'll check how to get raw JSON then. Commented Mar 29, 2022 at 8:33
  • The unfortunately named json_normalize does not, in fact, take JSON, but "unserialized JSON objects", so posting a Python structure instead of JSON is not an issue here. The biggest problems with the data you posted are that it is incomplete (it is missing ]}} at the end) and that you anonymised one of the keys needed to access the data you want. Commented Mar 29, 2022 at 8:38

1 Answer

One of the strings you replaced with '---' is relevant after all.

First, we locate the array that holds the data. Each item of this array can be turned into a Series indexed by Timestamp, and together those Series form the columns of the data frame.

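import pandas as pd

# `data` is the parsed structure from the question (a Python dict, not a JSON string).
table_data = data['SPEC']['RELEVANT_AFTER_ALL']

# One Timestamp-indexed Series per item, keyed by the desired column name.
# Selecting 'Value' by name (instead of squeeze()) also works when an item has a single entry.
x = pd.DataFrame({
    f"STOCK_VAL {item['Name']}": pd.DataFrame(item['Entries']).set_index('Timestamp')['Value']
    for item in table_data
})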
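
To check the approach end to end, here is a minimal self-contained sketch. It assumes a trimmed copy of the question's data with the '---' filler keys left out, and it reads the column prefix from data['SPEC']['Name'] rather than hardcoding it. The names prefix and frame are illustrative and not part of the original answer.

```python
import pandas as pd

# Trimmed copy of the question's structure; the '---' filler keys are omitted (assumption).
data = {
    'ID': 1,
    'SPEC': {
        'Name': 'STOCK_VAL',
        'RELEVANT_AFTER_ALL': [
            {'Name': 'Calculated', 'Units': 'USD/D', 'Entries': [
                {'Timestamp': '2022-03-16T23:00:00Z', 'Value': 100.00},
                {'Timestamp': '2022-03-17T23:00:00Z', 'Value': 120.00},
                {'Timestamp': '2022-03-18T23:00:00Z', 'Value': 140.00},
            ]},
            {'Name': 'Realised', 'Units': 'USD/D', 'Entries': [
                {'Timestamp': '2022-03-16T23:00:00Z', 'Value': 160.00},
                {'Timestamp': '2022-03-17T23:00:00Z', 'Value': 120.00},
                {'Timestamp': '2022-03-18T23:00:00Z', 'Value': 80.00},
            ]},
        ],
    },
}

# Illustrative choice: take the prefix from the data instead of hardcoding 'STOCK_VAL'.
prefix = data['SPEC']['Name']

# One Timestamp-indexed Series of values per item; pandas aligns them on the shared index.
frame = pd.DataFrame({
    f"{prefix} {item['Name']}": pd.DataFrame(item['Entries']).set_index('Timestamp')['Value']
    for item in data['SPEC']['RELEVANT_AFTER_ALL']
})

print(frame)
```

Because each column is built as a Series indexed by Timestamp, items whose timestamps do not fully overlap would still line up, with NaN where a series has no entry.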

EDIT: Replaced pd.json_normalize with pd.DataFrame, which suffices in this scenario.

EDIT 2: Added STOCK_VAL to the column names.
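
Since the question started from pandas.json_normalize, here is a rough sketch of how the same table might be reached that way, in case it is useful for comparison. It is an alternative to the approach above, not a replacement for it: record_path and meta flatten the entries into long form, and pivot reshapes them into one column per series. The sample data is a trimmed stand-in for the question's structure, and the names long_df and wide_df are hypothetical.

```python
import pandas as pd

# Trimmed stand-in for data['SPEC']['RELEVANT_AFTER_ALL'] from the question (assumption).
table_data = [
    {'Name': 'Calculated', 'Entries': [
        {'Timestamp': '2022-03-16T23:00:00Z', 'Value': 100.00},
        {'Timestamp': '2022-03-17T23:00:00Z', 'Value': 120.00},
        {'Timestamp': '2022-03-18T23:00:00Z', 'Value': 140.00},
    ]},
    {'Name': 'Realised', 'Entries': [
        {'Timestamp': '2022-03-16T23:00:00Z', 'Value': 160.00},
        {'Timestamp': '2022-03-17T23:00:00Z', 'Value': 120.00},
        {'Timestamp': '2022-03-18T23:00:00Z', 'Value': 80.00},
    ]},
]

# Long form: one row per entry, with the parent item's 'Name' carried along as metadata.
long_df = pd.json_normalize(table_data, record_path='Entries', meta=['Name'])

# Wide form: one row per Timestamp, one column per series name.
wide_df = long_df.pivot(index='Timestamp', columns='Name', values='Value')

# Prefix the column names to match the layout requested in the question.
wide_df.columns = [f"STOCK_VAL {name}" for name in wide_df.columns]

print(wide_df)
```

Note that pivot raises an error if the same Timestamp appears twice within one series, so this variant assumes timestamps are unique per item.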

2 Comments

You are correct. I will restore the key that I foolishly dropped. Do you know, by any chance, how I could also get STOCK_VAL in front of each series name in the column names?
:) Edited, please check.