
I'm processing a data set that originated from JSON and want to end up with the following format. This is my expected result:

              category                 type value  value_type  mandatory
0  business-activities  activities-hot-work  true     boolean       true
1   business-employees  employees-full-time     6     integer       true

What I'm getting is the following from df:

              category                 type value                                       validation
0  business-activities  activities-hot-work  true  [{'value_type': 'boolean'}, {'mandatory': True}]
1   business-employees  employees-full-time     6  [{'value_type': 'integer'}, {'mandatory': True}]

Here's my script

import json
import pandas as pd

# broker data received
with open(r'C:\Users\mattl\OneDrive\Everything\Documents\Data files for Python Dev\characteristic_values.json') as data_in:
    data = json.load(data_in)
data_inbound = pd.json_normalize(data, 'characteristics',  record_prefix='')

# validation data used to process validation on data received
with open(r'C:\Users\mattl\OneDrive\Everything\Documents\Data files for Python Dev\characteristic_validation.json') as data_val:
    data = json.load(data_val)
data_validation = pd.json_normalize(data, 'characteristics',  record_prefix='')

# merge validation with broker data and normalise the data
df = pd.merge(data_inbound, data_validation, on=['category','type'])

print(df)

# Show results
print('Merged and exploded Result')
dfa = df.explode('validation')
print(dfa)

My input files

Characteristic_values.json

{
 "characteristics" :[
     {
         "category" : "business-activities",
         "type" : "activities-hot-work",
         "value" : "true"
     },
     {
         "category" : "business-employees",
         "type" : "employees-full-time",
         "value" : "6"
     }
     ]
}

Characteristic_validation.json

 {
    "characteristics": [{
          "category": "business-activities",
          "type": "activities-hot-work",
          "validation": [{
          "value_type": "boolean"
           }, {
                "mandatory": true
            }]
        },
        {
          "category": "business-employees",
          "type": "employees-full-time",
          "validation": [{
                "value_type": "integer"
             },
             {
                 "mandatory": true
             }
          ]
       }
    ]
 }

What have I tried already?

I adapted a call that works in a tutorial on handling nested JSON. It throws an error that I cannot figure out, though I might be on the right track:

characteristics_data = pd.json_normalize(data=df, record_path='validation', meta=['category', 'type', 'value'])

Error Messages

  File "C:\Users\mattl\PycharmProjects\jsonValidation\main.py", line 25, in <module>
    characteristics_data = pd.json_normalize(data=df, record_path='validation',
  File "C:\Users\mattl\PycharmProjects\jsonValidation\venv\lib\site-packages\pandas\io\json\_normalize.py", line 504, in _json_normalize
    _recursive_extract(data, record_path, {}, level=0)
  File "C:\Users\mattl\PycharmProjects\jsonValidation\venv\lib\site-packages\pandas\io\json\_normalize.py", line 477, in _recursive_extract
    recs = _pull_records(obj, path[0])
  File "C:\Users\mattl\PycharmProjects\jsonValidation\venv\lib\site-packages\pandas\io\json\_normalize.py", line 399, in _pull_records
    result = _pull_field(js, spec)
  File "C:\Users\mattl\PycharmProjects\jsonValidation\venv\lib\site-packages\pandas\io\json\_normalize.py", line 390, in _pull_field
    result = result[spec]
TypeError: string indices must be integers

I hope I have provided enough information to explain my issue - thanks

  • You'll note I also tried dfa = df.explode('validation'), but that gives me 4 rows and the values contained in the objects are still unusable. I left it in just in case that direction ends up being better than normalisation. Commented Dec 3, 2021 at 3:44
  • I've also tried result = pd.json_normalize(df, 'validation', ['category', 'type', 'value' ['value_type', 'mandatory']]), but it fails again with TypeError: string indices must be integers. Commented Dec 3, 2021 at 3:54

1 Answer

With your dataframe df

              category                 type value                                       validation
0  business-activities  activities-hot-work  true  [{'value_type': 'boolean'}, {'mandatory': True}]
1   business-employees  employees-full-time     6  [{'value_type': 'integer'}, {'mandatory': True}]

you could do

df = pd.concat(
    [
        df.drop(columns="validation"),
        pd.DataFrame({**l[0], **l[1]} for l in df.validation)
    ],
    axis=1
)

or a bit more generic

from itertools import chain

df = pd.concat(
    [
        df.drop(columns="validation"),
        pd.DataFrame(dict(chain.from_iterable(d.items() for d in l)) for l in df.validation)
    ],
    axis=1
)

Result:

              category                 type value value_type  mandatory
0  business-activities  activities-hot-work  true    boolean       True
1   business-employees  employees-full-time     6    integer       True
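
For anyone who wants to try this without the two JSON files, here is a minimal end-to-end sketch of the generic approach. The in-memory dicts `values` and `validation` are stand-ins I made up to mirror the files in the question, and `flatten_validation` is a hypothetical helper name, not something pandas provides. The sketch also passes `index=frame.index` when building the flattened frame, on the assumption that `df` may not always have the default 0..n-1 index (for example after filtering); `pd.concat(..., axis=1)` aligns on the index, so a mismatch would otherwise produce NaN rows.

```python
from itertools import chain

import pandas as pd

# In-memory stand-ins for the two JSON files from the question (assumed content)
values = {
    "characteristics": [
        {"category": "business-activities", "type": "activities-hot-work", "value": "true"},
        {"category": "business-employees", "type": "employees-full-time", "value": "6"},
    ]
}
validation = {
    "characteristics": [
        {
            "category": "business-activities",
            "type": "activities-hot-work",
            "validation": [{"value_type": "boolean"}, {"mandatory": True}],
        },
        {
            "category": "business-employees",
            "type": "employees-full-time",
            "validation": [{"value_type": "integer"}, {"mandatory": True}],
        },
    ]
}

# Same normalise-and-merge steps as in the question
data_inbound = pd.json_normalize(values, "characteristics")
data_validation = pd.json_normalize(validation, "characteristics")
df = pd.merge(data_inbound, data_validation, on=["category", "type"])


def flatten_validation(frame):
    """Hypothetical helper: turn each row's list of dicts into ordinary columns."""
    # Merge all dicts of one row into a single dict; later keys overwrite earlier ones
    merged_rows = [
        dict(chain.from_iterable(d.items() for d in row))
        for row in frame["validation"]
    ]
    # Reuse the original index so concat lines the rows up correctly
    flat = pd.DataFrame(merged_rows, index=frame.index)
    return pd.concat([frame.drop(columns="validation"), flat], axis=1)


result = flatten_validation(df)
print(result)
```

If two dicts in the same validation list ever share a key, the later one silently wins, so this is only a reasonable fit when each key appears once per row, as in the sample data.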

2 Comments

That sounds awesome, I'll try it out later this evening and let you know. Based on the expected result, it looks perfect, thanks.
thank you, works perfectly, I used the more generic approach
