I'm processing a data set that originated from JSON, and I want to end up with the format below. This is what I expect:
   category             type                 value  value_type  mandatory
0  business-activities  activities-hot-work  true   boolean     true
1  business-employees   employees-full-time  6      integer     true
What I'm getting is the following from df:
   category             type                 value  validation
0  business-activities  activities-hot-work  true   [{'value_type': 'boolean'}, {'mandatory': True}]
1  business-employees   employees-full-time  6      [{'value_type': 'integer'}, {'mandatory': True}]
Here's my script
import json
import pandas as pd
# broker data received
with open(r'C:\Users\mattl\OneDrive\Everything\Documents\Data files for Python Dev\characteristic_values.json') as data_in:
    data = json.load(data_in)
data_inbound = pd.json_normalize(data, 'characteristics', record_prefix='')
# validation data used to process validation on data received
with open(r'C:\Users\mattl\OneDrive\Everything\Documents\Data files for Python Dev\characteristic_validation.json') as data_val:
    data = json.load(data_val)
data_validation = pd.json_normalize(data, 'characteristics', record_prefix='')
# merge validation with broker data and normalise the data
df = pd.merge(data_inbound, data_validation, on=['category','type'])
print(df)
# Show results
print('Merged and exploded Result')
dfa = df.explode('validation')
print(dfa)
My input files
Characteristic_values.json
{
    "characteristics": [
        {
            "category": "business-activities",
            "type": "activities-hot-work",
            "value": "true"
        },
        {
            "category": "business-employees",
            "type": "employees-full-time",
            "value": "6"
        }
    ]
}
Characteristic_validation.json
{
    "characteristics": [
        {
            "category": "business-activities",
            "type": "activities-hot-work",
            "validation": [
                {"value_type": "boolean"},
                {"mandatory": true}
            ]
        },
        {
            "category": "business-employees",
            "type": "employees-full-time",
            "validation": [
                {"value_type": "integer"},
                {"mandatory ": true}
            ]
        }
    ]
}
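One thing worth checking in the validation file as posted: the second entry's key reads "mandatory " with a trailing space. If that is also in the real file (and not just a typo in this post), it would end up as a separate column from "mandatory". A minimal sketch of cleaning the keys right after loading, assuming a hypothetical helper named strip_keys and an inline copy of the second entry instead of the file on disk:

```python
import json
import pandas as pd

# Inline stand-in for the second entry of characteristic_validation.json,
# including the trailing space in "mandatory " as it appears in the post.
raw = '''
{"characteristics": [
  {"category": "business-employees", "type": "employees-full-time",
   "validation": [{"value_type": "integer"}, {"mandatory ": true}]}
]}
'''

def strip_keys(obj):
    """Recursively strip surrounding whitespace from every dict key (hypothetical helper)."""
    if isinstance(obj, dict):
        return {key.strip(): strip_keys(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [strip_keys(item) for item in obj]
    return obj

data = strip_keys(json.loads(raw))
data_validation = pd.json_normalize(data, "characteristics")
```

If the trailing space is only a typo in the post, this step is unnecessary.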
What have I tried already?
characteristics_data = pd.json_normalize(data=df, record_path='validation', meta=['category', 'type', 'value'])
I adapted this from a tutorial on handling nested JSON, where it works, but here it throws an error that I cannot figure out. I might still be on the right track.
Error Messages
  File "C:\Users\mattl\PycharmProjects\jsonValidation\main.py", line 25, in <module>
    characteristics_data = pd.json_normalize(data=df, record_path='validation',
  File "C:\Users\mattl\PycharmProjects\jsonValidation\venv\lib\site-packages\pandas\io\json\_normalize.py", line 504, in _json_normalize
    _recursive_extract(data, record_path, {}, level=0)
  File "C:\Users\mattl\PycharmProjects\jsonValidation\venv\lib\site-packages\pandas\io\json\_normalize.py", line 477, in _recursive_extract
    recs = _pull_records(obj, path[0])
  File "C:\Users\mattl\PycharmProjects\jsonValidation\venv\lib\site-packages\pandas\io\json\_normalize.py", line 399, in _pull_records
    result = _pull_field(js, spec)
  File "C:\Users\mattl\PycharmProjects\jsonValidation\venv\lib\site-packages\pandas\io\json\_normalize.py", line 390, in _pull_field
    result = result[spec]
TypeError: string indices must be integers
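The traceback looks consistent with pd.json_normalize being handed a DataFrame instead of a dict or a list of dicts. Iterating over a DataFrame yields its column labels, which are strings, so the lookup of 'validation' ends up indexing a string. A minimal sketch of that reading, assuming the merged frame looks like the df shown earlier (rebuilt inline here as a stand-in) and converting it to records first:

```python
import pandas as pd

# Stand-in for the merged df shown in the question (assumption: same shape)
df = pd.DataFrame({
    "category": ["business-activities", "business-employees"],
    "type": ["activities-hot-work", "employees-full-time"],
    "value": ["true", "6"],
    "validation": [
        [{"value_type": "boolean"}, {"mandatory": True}],
        [{"value_type": "integer"}, {"mandatory": True}],
    ],
})

# Iterating a DataFrame gives column labels (strings), not row dicts
labels = list(df)

# json_normalize works on dicts / lists of dicts, so convert the rows first
records = df.to_dict(orient="records")
flat = pd.json_normalize(
    records,
    record_path="validation",
    meta=["category", "type", "value"],
)
```

This should avoid the TypeError, but flat has one row per validation dict (4 rows here), with NaN wherever a dict lacks a key, so the rows would still need collapsing per category and type.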
I hope I have provided enough information to explain my issue - thanks
dfa = df.explode('validation')
but that gives me 4 rows, and the values contained in the objects are still unusable. I left it in just in case that direction ends up being better than normalisation.
result = pd.json_normalize(df, 'validation', ['category', 'type', 'value' ['value_type', 'mandatory']])
but it errors with
TypeError: string indices must be integers
again.
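One possible way to get from the merged df to the expected layout, sketched under the assumption that every validation entry is a list of single-key dicts as in the output above: merge each list into one dict, then expand those dicts into columns. The helper name merge_dicts and the inline df are illustrative stand-ins, not part of the original script.

```python
import pandas as pd

# Stand-in for the merged df shown in the question (assumption: same shape)
df = pd.DataFrame({
    "category": ["business-activities", "business-employees"],
    "type": ["activities-hot-work", "employees-full-time"],
    "value": ["true", "6"],
    "validation": [
        [{"value_type": "boolean"}, {"mandatory": True}],
        [{"value_type": "integer"}, {"mandatory": True}],
    ],
})

def merge_dicts(items):
    """Combine a list of small dicts into a single dict (hypothetical helper)."""
    merged = {}
    for item in items:
        merged.update(item)
    return merged

# One dict per row, expanded into one column per key
validation = pd.DataFrame(
    df["validation"].map(merge_dicts).tolist(),
    index=df.index,
)

# Replace the nested column with the expanded columns
result = df.drop(columns="validation").join(validation)
print(result)
```

This keeps one row per characteristic, unlike explode, which produces one row per validation dict.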