1

My question is similar to this one - Parsing JSON with number as key and this one Parsing nested JSON except that I want to parse my JSON file with pandas normalize. Eventually I want to write this dataframe into an excel file( and I have the code to do that) . The dataframe I want in memory should look like this

Timestamp  BatteryVoltage GridCurrent GridVoltage InverterCurrent InverterVoltage 

....

....

The current code that I have does not help me at all -

import json
import datetime
import pandas as pd
from pandas.io.json import json_normalize

with open('test.json') as data_file:
     data = json.load(data_file)


  

df = pd.json_normalize(data['timestamp'])

I know I have to give something as argument to json_normalize record_path but I am not sure what it is going to be as the value of timestamp keeps changing.

{"timestamp": {
"1636987025": {
  "batteryVoltage": 28.74732,
  "gridCurrent": 3.68084,
  "gridVoltage": 230.64401,
  "inverterCurrent": 2.00471,
  "inverterVoltage": 224.18573,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987008
},
"1636987085": {
  "batteryVoltage": 28.52959,
  "gridCurrent": 3.40046,
  "gridVoltage": 230.41367,
  "inverterCurrent": 1.76206,
  "inverterVoltage": 225.24319,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987136
},
"1636987146": {
  "batteryVoltage": 28.5338,
  "gridCurrent": 3.37573,
  "gridVoltage": 229.27209,
  "inverterCurrent": 2.11128,
  "inverterVoltage": 225.51733,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987136
},
"1636987206": {
  "batteryVoltage": 28.55535,
  "gridCurrent": 3.43365,
  "gridVoltage": 229.47604,
  "inverterCurrent": 1.98594,
  "inverterVoltage": 225.83649,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987264
  }
 }
}

1 Answer 1

1

pandas' json_normalize is designed for json/dicts that lists embedded in them; your data does not have that.

One option is with jmespath:

# pip install jmespath

expression = jmespath.compile("timestamp.*")
pd.DataFrame(expression.search(data)) 
   batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue
0        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008
1        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136
2        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136
3        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264

have a look at the docs for jmespath; it is a wonderful tool that can be handy in wrangling json.

Another option, without jmespath and just pure python:

result = [value for _, value in data['timestamp'].items()]
pd.DataFrame(result)
   batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue
0        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008
1        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136
2        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136
3        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264

To capture the timestamp is easy:

# for python 3.9, you can use | instead for the dictionary merging
result = [{**value, **{'timestamp':key}} for key, value in data['timestamp'].items()]

pd.DataFrame(result)
   batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue   timestamp
0        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008  1636987025
1        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136  1636987085
2        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136  1636987146
3        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264  1636987206

Another option:

pd.DataFrame.from_dict(data['timestamp'], orient='index').rename_axis('timestamp').reset_index()
    timestamp  batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue
0  1636987025        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008
1  1636987085        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136
2  1636987146        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136
3  1636987206        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264
Sign up to request clarification or add additional context in comments.

3 Comments

went ahead and accepted your great answer. Thank you very much. One final query. How can I add the timestamp in the jmespath solution as well ?
Ahhh jmespath is limited in that aspect, that's y I didn't add it. You could work around it, but not worth the effort
Sure, share the link to the new question

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.