
I have a JSON response like this:

report_json = {
    "orders": [
        {
            "id": '1',
            "email": "[email protected]",
            "location_id": 9,
            "line_items": [
                {
                    "id": 5,
                    "product_id": 6,
                }, {
                    "id": 7,
                    "product_id": 8,
                }
            ]
        }, {
            "id": '2',
            "email": "[email protected]",
            "location_id": 10,
            "line_items": {
                "id": 3,
                "product_id": 4,
            }
        },
    ]
}

I want the output to look like this:

id  email               location_id  line_items_id  line_items_product_id
1   [email protected]  9            5              6
1   [email protected]  9            7              8
2   [email protected]  10           3              4

I want one row per object in line_items. My approach is to use pandas' json_normalize, and I can split the rows if I name the parent columns explicitly in the code, as shown below.

import pandas as pd
pd.json_normalize(report_json['orders'], ['line_items'], ['id', 'email'], record_prefix='line_items_')
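To show what each argument in that call does, here is a minimal sketch on a hypothetical `orders` list in which every `line_items` value is already a list (the emails are placeholders). `record_path` picks the nested list that becomes the rows, `meta` lists the parent fields copied onto every row, and `record_prefix` renames the record columns so the two `id` fields do not clash. It assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`.

```python
import pandas as pd

# Hypothetical sample data: every order's line_items is already a list
orders = [
    {"id": "1", "email": "a@example.com", "location_id": 9,
     "line_items": [{"id": 5, "product_id": 6}, {"id": 7, "product_id": 8}]},
    {"id": "2", "email": "b@example.com", "location_id": 10,
     "line_items": [{"id": 3, "product_id": 4}]},
]

# One row per line item; the parent fields named in meta are repeated on each row
df = pd.json_normalize(
    orders,
    record_path=["line_items"],
    meta=["id", "email"],
    record_prefix="line_items_",  # without this, the record "id" collides with the meta "id"
)
print(df)
```

Note that location_id is missing from the result because it was not listed in `meta`, which is exactly the problem described next.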

However, there may be other columns besides id and email. I want this to be dynamic, i.e. it should work with any number of keys without my having to list them explicitly. Any help is highly appreciated.

Answer


First, wrap every line_items value that is a single dictionary in a one-element list, and collect all the other keys of the order dictionaries:

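L = []      # copies of each order with line_items always a list
keys = []   # every key except line_items, collected across all orders
for x in report_json['orders']:
    d = {}
    for k, v in x.items():
        # a single line item arrives as a dict; wrap it in a list
        if isinstance(v, dict) and k == 'line_items':
            d[k] = [v]
        else:
            d[k] = v
        if k != 'line_items':
            keys.append(k)
    L.append(d)

print(L)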

[
    {
        "id": '1',
        "email": "[email protected]",
        "location_id": 9,
        "line_items": [
            {
                "id": 5,
                "product_id": 6,
            }, {
                "id": 7,
                "product_id": 8,
            }
        ]
    }, {
        "id": '2',
        "email": "[email protected]",
        "location_id": 10,
        "line_items": [{
            "id": 3,
            "product_id": 4,
        }]
    }
]
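The same step can be packaged as a small reusable function. This is a sketch: `normalize_orders` is a hypothetical helper name, it copies each order instead of building a new dict key by key, and it collects the keys in a dict used as an ordered set, so they come out in first-seen order rather than the arbitrary order that `set` gives.

```python
def normalize_orders(orders, record_key="line_items"):
    """Wrap a single-dict record_key value in a list and collect the other keys.

    Returns (normalized_orders, meta_keys); meta_keys keeps first-seen order.
    """
    normalized = []
    seen = {}  # dict used as an insertion-ordered set
    for order in orders:
        d = dict(order)  # shallow copy so the input list is not modified
        if isinstance(d.get(record_key), dict):
            d[record_key] = [d[record_key]]
        normalized.append(d)
        for k in order:
            if k != record_key:
                seen[k] = None
    return normalized, list(seen)


# Hypothetical sample: the second order has a single line item as a bare dict
orders = [
    {"id": "1", "location_id": 9, "line_items": [{"id": 5, "product_id": 6}]},
    {"id": "2", "location_id": 10, "line_items": {"id": 3, "product_id": 4}},
]
normalized, meta_keys = normalize_orders(orders)
print(meta_keys)                     # ['id', 'location_id']
print(normalized[1]["line_items"])   # [{'id': 3, 'product_id': 4}]
```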

from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

# get the unique keys and pass them to json_normalize as meta
# (set order is arbitrary, so the column order can differ between runs)
L1 = list(set(keys))
print(L1)
['location_id', 'id', 'email']

df = json_normalize(L, ['line_items'], L1, record_prefix='line_items_')
print(df)
   line_items_id  line_items_product_id  location_id id       email
0              5                      6            9  1  [email protected]
1              7                      8            9  1  [email protected]
2              3                      4           10  2  [email protected]
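On pandas 1.0 and later, the wrapping, the key collection and the `json_normalize` call can be combined into one function. This is a sketch under two assumptions: the nested field is always named `line_items`, and its value is either a list or a single dict. `flatten_orders` is a hypothetical name and the emails are placeholders. `errors="ignore"` makes pandas fill a meta column with NaN for orders that lack that key instead of raising `KeyError`.

```python
import pandas as pd


def flatten_orders(report, record_key="line_items"):
    """Return one row per line item, with every other order field repeated."""
    orders = []
    for order in report["orders"]:
        items = order.get(record_key, [])
        # A single line item may arrive as a bare dict; wrap it in a list
        if isinstance(items, dict):
            items = [items]
        orders.append({**order, record_key: items})

    # Every key except the nested one becomes meta, in first-seen order
    meta = list(dict.fromkeys(k for o in orders for k in o if k != record_key))

    return pd.json_normalize(
        orders,
        record_path=[record_key],
        meta=meta,
        record_prefix=f"{record_key}_",
        errors="ignore",  # missing meta keys become NaN instead of raising KeyError
    )


report_json = {
    "orders": [
        {"id": "1", "email": "a@example.com", "location_id": 9,
         "line_items": [{"id": 5, "product_id": 6}, {"id": 7, "product_id": 8}]},
        {"id": "2", "email": "b@example.com", "location_id": 10,
         "line_items": {"id": 3, "product_id": 4}},
    ]
}

df = flatten_orders(report_json)
print(df)
```

Because `meta` is computed from the data, new top-level fields in the response show up as columns without any code change.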

Comments:

That is exactly my question: I don't want to define the column names (id, email) explicitly. I want all the columns present in the JSON response, and there may be 1000 of them, which is why I don't want to list them in the code.
@NikhilGupta - Can you check the edit? You can collect all the keys dynamically into a list and pass it to json_normalize.
Yes, it worked, thanks. One last question: what if line_items is sometimes a dictionary instead of a list? How should I handle that?
@NikhilGupta - So it is a list when there are multiple values and a dict when there is only one, as in the question, and not as in my data?
Thanks buddy, it worked. Just a small correction: the keys variable was undefined. It needs to be initialized, and k should be appended to it in the for loop. Thank you so much!
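A different technique from the one in the answer, which never needs a list of meta keys at all: build a DataFrame of the orders, `explode` the `line_items` column into one row per item, then flatten the item dicts with `json_normalize` and join them back. This is a sketch assuming pandas 1.1 or later (for `ignore_index=True` in `explode`) and that every order has at least one line item; single-dict values are wrapped in a list first so every value is a list that `explode` can expand. The sample data and emails are placeholders.

```python
import pandas as pd

orders = [
    {"id": "1", "email": "a@example.com", "location_id": 9,
     "line_items": [{"id": 5, "product_id": 6}, {"id": 7, "product_id": 8}]},
    {"id": "2", "email": "b@example.com", "location_id": 10,
     "line_items": {"id": 3, "product_id": 4}},
]

# Make every line_items value a list (modifies the orders in place)
for order in orders:
    if isinstance(order["line_items"], dict):
        order["line_items"] = [order["line_items"]]

# One row per line item; all other order columns are repeated automatically
exploded = pd.DataFrame(orders).explode("line_items", ignore_index=True)

# Flatten each line-item dict into its own prefixed columns
items = pd.json_normalize(exploded["line_items"].tolist()).add_prefix("line_items_")

# Both frames share a 0..n-1 index, so a column-wise concat lines the rows up
df = pd.concat([exploded.drop(columns="line_items"), items], axis=1)
print(df)
```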
