
I am trying to convert JSON to CSV in Python with pandas, which is supposed to be easy, but the output isn't close to right. Example JSON:

{'energy': {'timeUnit': 'DAY', 'unit': 'Wh', 'measuredBy': 'INVERTER', 'values': [{'date': '2022-01-01 00:00:00', 'value': 322.0}, {'date': '2022-01-02 00:00:00', 'value': 12.0}, {'date': '2022-01-03 00:00:00', 'value': 0.0}]}}

With the following code:

data = r.json()

print(data)

json_object = json.dumps(data)

json_object

with open(r'\\shared\AppDev\Production\data\solaredge\import.json','w') as n:
 n.write(json_object)

df = pd.read_json(r'\\shared\AppDev\Production\data\solaredge\import.json')
df.to_csv(r'\\shared\AppDev\Production\data\solaredge\import.csv', index = None)

Produces

energy INVERTER DAY Wh [{'date': '2022-01-01 00:00:00', 'value': 322.0}, {'date': '2022-01-02 00:00:00', 'value': 12.0}, {'date': '2022-01-03 00:00:00', 'value': 0.0}]

It appears the inner portion of the JSON hasn't been parsed at all. Am I missing something obvious? I am considering stripping most of the content out manually with string functions, but it seems there has to be an easier way.

  • Not an answer to your question, but you can shorten the code by passing the JSON string directly to pandas: df = pd.read_json(json_object). There is no need to save it to a file first. See pandas.pydata.org/pandas-docs/version/1.1.3/reference/api/… for more details. Commented Jan 19, 2022 at 19:39
  • CSV and most DataFrames are flat, two-dimensional views of data. JSON with nested objects introduces more dimensions, which a flat representation cannot express; that is why the nested items appear unprocessed. You need to decide how to flatten this hierarchy. Commented Jan 19, 2022 at 19:47
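
To illustrate both comments, here is a minimal sketch that parses the JSON text in memory instead of going through a file. Wrapping the string in io.StringIO is an assumption on my part for compatibility with newer pandas releases, which warn when given a literal JSON string; the variable names json_text and nested are only illustrative.

```python
import io
import json

import pandas as pd

# Sample payload copied from the question.
data = {'energy': {'timeUnit': 'DAY', 'unit': 'Wh', 'measuredBy': 'INVERTER',
                   'values': [{'date': '2022-01-01 00:00:00', 'value': 322.0},
                              {'date': '2022-01-02 00:00:00', 'value': 12.0},
                              {'date': '2022-01-03 00:00:00', 'value': 0.0}]}}

# Serialize to a JSON string, then parse it without touching the disk.
json_text = json.dumps(data)
df = pd.read_json(io.StringIO(json_text))

# The frame has a single 'energy' column; the nested list is not flattened
# and still sits in one cell.
nested = df.loc['values', 'energy']
print(df.shape)
print(type(nested))
```

The nested list surviving as a single cell is the flattening problem the second comment describes: read_json only unpacks the outer levels.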

1 Answer


Here's an example that expands the values column into separate date and value columns.

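import pandas as pd

data = {'energy': {'timeUnit': 'DAY', 'unit': 'Wh', 'measuredBy': 'INVERTER', 'values': [{'date': '2022-01-01 00:00:00', 'value': 322.0}, {'date': '2022-01-02 00:00:00', 'value': 12.0}, {'date': '2022-01-03 00:00:00', 'value': 0.0}]}}
# Build the frame from the inner 'energy' dict; the scalar fields repeat once per entry in 'values'.
energy = pd.DataFrame(data['energy'])
# Expand each {'date', 'value'} dict into its own columns and swap them in for 'values'.
result = pd.concat((energy.drop('values', axis=1), energy['values'].apply(pd.Series)), axis=1)
print(result)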

I drop the first level by building the DataFrame from the 'energy' key of the dictionary. This produces a frame in which the timeUnit, unit and measuredBy values are repeated for each date/value dictionary.

Next, applying pd.Series to the values column creates a new table with two columns, date and value. Finally, the old values column is dropped and replaced by the date and value columns. All of this is done with pd.concat, drop and apply(pd.Series).
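
The same steps can be written out one at a time, ending with the CSV output the question asks for. This is only a sketch: the intermediate names expanded, flat and csv_text are mine, and the CSV is returned as a string here rather than written to the shared path from the question.

```python
import pandas as pd

# Sample payload copied from the question.
data = {'energy': {'timeUnit': 'DAY', 'unit': 'Wh', 'measuredBy': 'INVERTER',
                   'values': [{'date': '2022-01-01 00:00:00', 'value': 322.0},
                              {'date': '2022-01-02 00:00:00', 'value': 12.0},
                              {'date': '2022-01-03 00:00:00', 'value': 0.0}]}}

# Step 1: one row per entry in 'values'; the scalar fields are repeated on every row.
energy = pd.DataFrame(data['energy'])

# Step 2: turn each {'date': ..., 'value': ...} dict into a row with two columns.
expanded = energy['values'].apply(pd.Series)

# Step 3: drop the original dict column and place the expanded columns beside the rest.
flat = pd.concat([energy.drop(columns='values'), expanded], axis=1)

# Step 4: serialize without the index; pass a file path to to_csv to write to disk instead.
csv_text = flat.to_csv(index=False)
print(csv_text)
```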

The resulting DataFrame should look like this:

    timeUnit    unit    measuredBy  date                    value
0   DAY         Wh      INVERTER    2022-01-01 00:00:00     322.0
1   DAY         Wh      INVERTER    2022-01-02 00:00:00     12.0
2   DAY         Wh      INVERTER    2022-01-03 00:00:00     0.0
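
As an alternative to the approach above, pandas also provides pd.json_normalize, which can flatten a list of records and carry the parent fields along in one call. This is a sketch rather than part of the original answer; note that the record columns (date, value) come first in the result, so the column order differs from the table above.

```python
import pandas as pd

# Sample payload copied from the question.
data = {'energy': {'timeUnit': 'DAY', 'unit': 'Wh', 'measuredBy': 'INVERTER',
                   'values': [{'date': '2022-01-01 00:00:00', 'value': 322.0},
                              {'date': '2022-01-02 00:00:00', 'value': 12.0},
                              {'date': '2022-01-03 00:00:00', 'value': 0.0}]}}

# record_path names the nested list to expand into rows;
# meta lists the parent fields to repeat on each of those rows.
flat = pd.json_normalize(
    data['energy'],
    record_path='values',
    meta=['timeUnit', 'unit', 'measuredBy'],
)

# Serialize without the index; pass a file path to to_csv to write to disk instead.
csv_text = flat.to_csv(index=False)
print(csv_text)
```

If the original column order matters, the frame can be reordered by selecting the columns explicitly before calling to_csv.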