2

I have a task to transform data from csv file to nested JSON file using python for web usage. I tried the Python code in this article. The desired output would be one member_id shows one time in the JSON file, same for the tag_name under one member_id. The problem is when I tried groupby with only member_id, the tag_name 'm1' would show multiple times under 'abc123' for example. If I tried groupby with both member_id and tag_name, 'abc123' would show up two times for tag 'm1' and 'm2'. I've been googling for a while but most resolutions only for one nested(not sure if I use the right term). Please let me know if any possible way to do this.

Sample Code:

import json
import pandas as pd
df = pd.read_csv('../detail.csv', sep=',', header=0
                 , index_col=False
                 , dtype = {'member_id':str,'tag_name':str,'detail_name':str,'detail_value':str} )
group = df.groupby(['member_id','tag_name'])

finalList, finalDict = [], {}
for key, value in group:
    dictionary, dictionary1, dictList, dictList1 = {}, {}, [], []
    j = group.get_group(key).reset_index(drop=True)
    dictionary['member_id'] = j.at[0,'member_id']
    dictionary1['tag_name'] = j.at[0,'tag_name']
    
    for i in j.index:
        anotherDict = {}
        anotherDict['detail_name'] = j.at[i,'detail_name']
        anotherDict['detail_value'] = j.at[i,'detail_value']
        dictList1.append(anotherDict.copy())
        dictionary1['detail'] = dictList1 
     
    dictList.append(dictionary1)
    dictionary['tag'] = dictList
    finalList.append(dictionary)

json.dumps(finalList,ensure_ascii = False)

detail.csv:

member_id, tag_name, detail_name, detail_value
-------------------------------------------------------
abc123, m1, Service_A, 20
abc123, m1, Service_B, 20
abc123, m2, Service_C, 10
xyz456, m3, Service A, 5
xyz456, m3, Service A, 10

Desired Output JSON:

{   "member_id": "abc123",
    "tag":[ {"tag_name": "m1",
            "detail":[{ "detail_name": "Service_A",
                        "detail_value": "20"},
                    {   "detail_name": "Service_B",
                        "detail_value": "20"}]},
            {"tag_name": "m2",
            "detail":[{ "detail_name": "Service_C",
                        "detail_value": "10"}]}]},
{   "member_id": "xyz456",
    "tag":[{"tag_name": "m3",
            "detail":[{ "detail_name": "Service_A",
                        "detail_value": "5"},
                      { "detail_name": "Service_A",
                        "detail_value": "10"}]}]}
0

1 Answer 1

1

I'm not aware of pandas functions allowing to do achieve that directly. Moreover, you introduce keys that are not part of your initial dataframe (tag, detail). So implementing a generic solution seems difficult.

However, if you don't have more columns than stated in your question, you can go through your dataframe, grouping column after column:

result = []

for member_id, member_df in df.groupby('member_id'):
    member_dict = {'member_id': member_id}
    member_dict['tag'] = []
    for tag_name, tag_df in member_df.groupby('tag_name'):
        tag_dict = {'tag_name': tag_name}
        tag_dict['detail'] = []
        for detail_name, detail_df in tag_df.groupby('detail_name'):
            detail_dict = {'detail_name': detail_name}
            detail_dict['detail_value'] = detail_df.detail_value.mean() # should be only one value, taking 'mean' just in case
            tag_dict['detail'].append(detail_dict)
        member_dict['tag'].append(tag_dict)
    result.append(member_dict)

print(json.dumps(result, indent=4))

Output:

[
    {
        "member_id": "abc123",
        "tag": [
            {
                "tag_name": "m1",
                "detail": [
                    {
                        "detail_name": "Service_A",
                        "detail_value": 20.0
                    },
                    {
                        "detail_name": "Service_B",
                        "detail_value": 20.0
                    }
                ]
            },
            {
                "tag_name": "m2",
                "detail": [
                    {
                        "detail_name": "Service_C",
                        "detail_value": 10.0
                    }
                ]
            }
        ]
    },
    {
        "member_id": "xyz456",
        "tag": [
            {
                "tag_name": "m3",
                "detail": [
                    {
                        "detail_name": "Service A",
                        "detail_value": 5.0
                    }
                ]
            }
        ]
    }
]
Edit: if you don't want unique detail names in your lists it's shorter:
result = []

for member_id, member_df in df.groupby('member_id'):
    member_dict = {'member_id': member_id}
    member_dict['tag'] = []
    for tag_name, tag_df in member_df.groupby('tag_name'):
        tag_dict = {'tag_name': tag_name}
        tag_dict['detail'] = tag_df[['detail_name', 'detail_value']].to_dict(orient='records')
        member_dict['tag'].append(tag_dict)
    result.append(member_dict)

print(json.dumps(result, indent=4))
Sign up to request clarification or add additional context in comments.

7 Comments

Tranbi, is there any way to improve the performance? the csv might have ten millions rows probably.
I can't think of a way to greatly improve performance. But do you really need that big of a json? you could save each member_dict to its own json file. That would be easier on memory.
Understood. just one more thing, if I have a new row like 'xyz456, m3, Service A, 10', and I want them show just in different dict like Service A, Service B under (abc123, m1), how should I modify the code?
What do you mean? xyz456 is already in a different dict
I updated the sample csv and output above, it should be clear. I don't want the detail be grouped by. so I can see all the detail even they're under same detail_name.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.