Normalizing a Nested JSON in Python and Converting it to a Pandas Dataframe

Question

Here is a simplified version of some JSON data I've been working with:

[
    {
        "id": 1,
        "city": "Philadelphia",
        "Retaillocations": {
            "subLocation": [
                {
                    "address": "1235 Passyunk Ave",
                    "district": "South"
                },
                {
                    "address": "900 Market St",
                    "district": "Center City"
                },
                {
                    "address": "2300 Roosevelt Blvd",
                    "district": "North"
                }
            ]
        },
        "distributionLocations": {
            "subLocation": [
                {
                    "address": "3000 Broad St",
                    "district": "North"
                },
                {
                    "address": "3000 Essington Blvd",
                    "district": "Cargo City"
                },
                {
                    "address": "4300 City Ave",
                    "district": "West"
                }
            ]
        }
    }
]

My goal is to normalize this into a data frame (yes, the JSON above will only produce one row, but I'm hoping to get the steps down and then generalize them to a larger set).

First, I loaded the data with jsob_obj = json.loads(inputData), which turns it into a list of dictionaries. The problem is that some of the dictionaries contain lists and are nested oddly, as shown above. When I try pd.json_normalize(jsob_obj, record_path='retailLocations'), I get a TypeError saying that list indices must be integers or slices, not str. How can I handle the above JSON and convert it into a single record in a pandas data frame?
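The error message itself is plain Python, which hints at what is going on (a guess, since the full traceback isn't shown): because the top level of the JSON is an array, json.loads returns a list, and indexing a list with a string raises exactly that TypeError. The records also sit two levels down, under Retaillocations and then subLocation, and dict keys are case-sensitive, so "retailLocations" does not match "Retaillocations". A quick check on a trimmed copy of the data:

```python
import json

# A trimmed copy of the question's data: the top level is a JSON array.
raw = '[{"id": 1, "city": "Philadelphia", "Retaillocations": {"subLocation": [{"address": "1235 Passyunk Ave", "district": "South"}]}}]'
data = json.loads(raw)
print(type(data).__name__)  # list

try:
    data["Retaillocations"]  # string key on a list
except TypeError as exc:
    print(exc)  # list indices must be integers or slices, not str

# Index into the list first, then follow both keys down to the records.
locations = data[0]["Retaillocations"]["subLocation"]
print(locations[0]["address"])  # 1235 Passyunk Ave
```

That two-step path, ["Retaillocations", "subLocation"], is what record_path needs, written as a list.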

This will differ depending on the output you want, but the easiest solution would be to flatten the dict, then just call pd.DataFrame() and work on renaming from there.
– rayad, Commented Feb 23, 2023 at 14:55
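rayad's flatten-then-DataFrame suggestion can be sketched roughly as follows. The flatten helper is hypothetical, written for this example rather than taken from pandas; it assumes keys are joined with "." and list positions become numeric path segments. Unlike pd.json_normalize without a record_path, which leaves list values in a single cell, this also expands the lists, so each top-level object becomes exactly one row, matching the "single record" goal.

```python
import json

import pandas as pd


def flatten(obj, prefix=""):
    """Hypothetical helper: recursively flatten nested dicts/lists into one
    {"dotted.key": value} dict. Dict keys and list indices both become path
    segments, e.g. "Retaillocations.subLocation.0.address"."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}  # scalar leaf: stop recursing
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


# A trimmed copy of the question's data (one retail and one distribution location).
raw = """[{"id": 1, "city": "Philadelphia",
  "Retaillocations": {"subLocation": [{"address": "1235 Passyunk Ave", "district": "South"}]},
  "distributionLocations": {"subLocation": [{"address": "3000 Broad St", "district": "North"}]}}]"""

records = json.loads(raw)
df = pd.DataFrame([flatten(rec) for rec in records])  # one row per top-level object
print(df.columns.tolist())
```

With more locations per object the column count grows quickly, which is where the renaming step from the comment, or the long format used in the answer, becomes more practical.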

Jason Baker · Accepted Answer · 2023-02-23 15:58:00Z

Guessing at the desired output, here is one way that uses pd.json_normalize() to flatten each list of sub-locations:

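import pandas as pd

# jsob_obj is the list returned by json.loads(inputData) in the question.
# record_path walks Retaillocations -> subLocation; meta copies id and city onto every row.
retail = pd.json_normalize(
    data=jsob_obj,
    meta=["id", "city"],
    record_path=["Retaillocations", "subLocation"]
).assign(source="retail")

distribution = pd.json_normalize(
    data=jsob_obj,
    meta=["id", "city"],
    record_path=["distributionLocations", "subLocation"]
).assign(source="distribution")

# Stack the two frames and renumber the rows 0..n-1
final = pd.concat([retail, distribution]).reset_index(drop=True)
print(final)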

Output:

               address     district id          city        source
0    1235 Passyunk Ave        South  1  Philadelphia        retail
1        900 Market St  Center City  1  Philadelphia        retail
2  2300 Roosevelt Blvd        North  1  Philadelphia        retail
3        3000 Broad St        North  1  Philadelphia  distribution
4  3000 Essington Blvd   Cargo City  1  Philadelphia  distribution
5        4300 City Ave         West  1  Philadelphia  distribution
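The two json_normalize calls above can be generalized to any number of location keys with a loop, and, since the question asked for one record per top-level object, the long result can be pivoted back to one row per id. This is a sketch under assumptions: LOCATION_KEYS, locations_long and locations_wide are hypothetical names, every listed key is assumed to hold a {"subLocation": [...]} object, and pivot with list-valued index/columns needs pandas 1.1 or later.

```python
import json

import pandas as pd

# Hypothetical mapping: label for the "source" column -> key that holds {"subLocation": [...]}
LOCATION_KEYS = {"retail": "Retaillocations", "distribution": "distributionLocations"}


def locations_long(records):
    """One row per sub-location, with id/city copied from the parent object."""
    frames = [
        pd.json_normalize(records, record_path=[key, "subLocation"], meta=["id", "city"])
        .assign(source=label)
        for label, key in LOCATION_KEYS.items()
    ]
    return pd.concat(frames, ignore_index=True)


def locations_wide(long_df):
    """One row per (id, city), with columns such as 'retail_address_0'."""
    # Number the sub-locations 0, 1, 2, ... within each id/source pair
    df = long_df.assign(n=long_df.groupby(["id", "source"]).cumcount())
    wide = df.pivot(index=["id", "city"], columns=["source", "n"], values=["address", "district"])
    # Collapse the (field, source, n) column MultiIndex into flat names
    wide.columns = [f"{source}_{field}_{n}" for field, source, n in wide.columns]
    return wide.reset_index()


sample = json.loads("""[{"id": 1, "city": "Philadelphia",
  "Retaillocations": {"subLocation": [
    {"address": "1235 Passyunk Ave", "district": "South"},
    {"address": "900 Market St", "district": "Center City"},
    {"address": "2300 Roosevelt Blvd", "district": "North"}]},
  "distributionLocations": {"subLocation": [
    {"address": "3000 Broad St", "district": "North"},
    {"address": "3000 Essington Blvd", "district": "Cargo City"},
    {"address": "4300 City Ave", "district": "West"}]}}]""")

long_df = locations_long(sample)
wide = locations_wide(long_df)
print(wide.shape)
```

The long frame is usually easier to filter and group; the wide frame is only worth building if downstream code really needs a single row per city. If objects have different numbers of sub-locations, the missing slots in the wide frame come out as NaN.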
