I have created a simpler version of some JSON data I've been working with here:

[
    {
        "id": 1,
        "city": "Philadelphia",
        "Retaillocations": {
            "subLocation": [
                {
                    "address": "1235 Passyunk Ave",
                    "district": "South"
                },
                {
                    "address": "900 Market St",
                    "district": "Center City"
                },
                {
                    "address": "2300 Roosevelt Blvd",
                    "district": "North"
                }
            ]
        },
        "distributionLocations": {
            "subLocation": [
                {
                    "address": "3000 Broad St",
                    "district": "North"
                },
                {
                    "address": "3000 Essington Blvd",
                    "district": "Cargo City"
                },
                {
                    "address": "4300 City Ave",
                    "district": "West"
                }
            ]
        }
    }
]

My goal is to normalize this into a data frame (yes, the JSON above will only produce one row, but I am hoping to get the steps down and then generalize them to a larger set).

First, I loaded the file with jsob_obj = json.loads(inputData), which turns it into a Python object (here, a list containing one dictionary). The problem is that some of the dictionaries contain lists and are nested oddly, as shown above. When I try pd.json_normalize(jsob_obj, record_path='retailLocations'), I get a TypeError saying that list indices must be integers or slices, not str. How can I handle the above JSON and convert it into a single record in a pandas data frame?
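To see what the parser actually returns, here is a minimal sketch. It assumes the input is the JSON text shown above, abbreviated to one sub-location per group, and `input_data` is a stand-in name for the question's inputData. Because the top level of the JSON is an array, json.loads returns a list, and calling pd.json_normalize without record_path gives one row in which each nested list stays packed inside a single cell. It is also worth checking that record_path matches the key's spelling exactly; in this sample the key is "Retaillocations".

```python
import json

import pandas as pd

# Abbreviated copy of the sample JSON (one sub-location per group).
input_data = """
[
    {
        "id": 1,
        "city": "Philadelphia",
        "Retaillocations": {"subLocation": [
            {"address": "1235 Passyunk Ave", "district": "South"}
        ]},
        "distributionLocations": {"subLocation": [
            {"address": "3000 Broad St", "district": "North"}
        ]}
    }
]
"""

jsob_obj = json.loads(input_data)
print(type(jsob_obj).__name__)     # the top-level JSON array becomes a list
print(type(jsob_obj[0]).__name__)  # each element of that list is a dict

# Without record_path, nested dicts are flattened into dotted column names,
# but each list of sub-locations remains a single cell.
flat = pd.json_normalize(jsob_obj)
print(flat.columns.tolist())
```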

  • This will differ depending on the output you want, but the easiest solution would be to flatten the dict, then just call pd.DataFrame() and work on renaming from there. Commented Feb 23, 2023 at 14:55
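The flatten-then-DataFrame idea from the comment could be sketched roughly as follows. The `flatten` helper is hypothetical (it is not a pandas function), the dotted column names with list positions are just one possible naming scheme, and the record is an abbreviated copy of the sample data. This produces a single wide row per record, which seems closest to the "single record" asked for.

```python
import pandas as pd


def flatten(obj, prefix=""):
    """Recursively flatten nested dicts and lists into one flat dict.

    Hypothetical helper: keys are joined with "." and list items are
    identified by their position. Empty dicts and lists yield no columns.
    """
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing separator from the accumulated key.
        items[prefix[:-1]] = obj
    return items


# Abbreviated copy of the sample record from the question.
record = {
    "id": 1,
    "city": "Philadelphia",
    "Retaillocations": {"subLocation": [
        {"address": "1235 Passyunk Ave", "district": "South"},
        {"address": "900 Market St", "district": "Center City"},
    ]},
    "distributionLocations": {"subLocation": [
        {"address": "3000 Broad St", "district": "North"},
    ]},
}

# One flat dict per record gives one row per record.
wide = pd.DataFrame([flatten(rec) for rec in [record]])
print(wide.T)  # transposed so the long column names are readable
```

The trade-off is that records with different numbers of sub-locations produce different sets of columns, so a larger data set would end up with many sparse columns.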

1 Answer


Guessing at the desired output, you can use pd.json_normalize() to flatten each nested list:

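import pandas as pd

# jsob_obj is the list returned by json.loads(inputData) in the question.

# One row per retail sub-location, with id and city carried along as metadata.
retail = pd.json_normalize(
    data=jsob_obj,
    meta=["id", "city"],
    record_path=["Retaillocations", "subLocation"]
).assign(source="retail")

# Same for the distribution sub-locations.
distribution = pd.json_normalize(
    data=jsob_obj,
    meta=["id", "city"],
    record_path=["distributionLocations", "subLocation"]
).assign(source="distribution")

# Stack both frames and renumber the rows.
final = pd.concat([retail, distribution]).reset_index(drop=True)
print(final)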

Output:

               address     district id          city        source
0    1235 Passyunk Ave        South  1  Philadelphia        retail
1        900 Market St  Center City  1  Philadelphia        retail
2  2300 Roosevelt Blvd        North  1  Philadelphia        retail
3        3000 Broad St        North  1  Philadelphia  distribution
4  3000 Essington Blvd   Cargo City  1  Philadelphia  distribution
5        4300 City Ave         West  1  Philadelphia  distribution
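To generalize this to a larger set, the two near-identical calls can be folded into a loop. The following is only a sketch: `normalize_locations` and its `groups` argument are hypothetical names, the records are an abbreviated copy of the sample data, and it assumes every record contains each listed key with a "subLocation" list plus the "id" and "city" fields.

```python
import pandas as pd


def normalize_locations(records, groups):
    """Build one long frame with a row per sub-location.

    Hypothetical helper: `groups` maps each top-level key in a record
    to the label stored in the `source` column.
    """
    frames = [
        pd.json_normalize(
            data=records,
            record_path=[key, "subLocation"],
            meta=["id", "city"],
        ).assign(source=label)
        for key, label in groups.items()
    ]
    return pd.concat(frames, ignore_index=True)


# Abbreviated copy of the sample data from the question.
records = [
    {
        "id": 1,
        "city": "Philadelphia",
        "Retaillocations": {"subLocation": [
            {"address": "1235 Passyunk Ave", "district": "South"},
            {"address": "900 Market St", "district": "Center City"},
        ]},
        "distributionLocations": {"subLocation": [
            {"address": "3000 Broad St", "district": "North"},
        ]},
    }
]

final = normalize_locations(
    records,
    {"Retaillocations": "retail", "distributionLocations": "distribution"},
)
print(final)
```

Adding another location type then only means adding one more entry to the mapping.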