
I am attempting to create a nested JSON file from a CSV using Pandas.

CSV File

This is just an excerpt; sorry for the long table, but I wanted to show the structure clearly.

Month,Type,Subtype,ItemName
December,ObjectTypeA,SubType A1,Item 1
December,ObjectTypeA,SubType A1,Item 2
December,ObjectTypeA,SubType A2,Item 3
December,ObjectTypeA,SubType A2,Item 4
December,ObjectTypeA,SubType A2,Item 5
December,ObjectTypeA,SubType A3,Item 6
December,ObjectTypeA,SubType A3,Item 7
December,ObjectTypeA,SubType A4,Item 8
December,ObjectTypeA,SubType A4,Item 9
December,ObjectTypeA,SubType A4,Item 10
December,ObjectTypeA,SubType A4,Item 11
December,ObjectTypeA,SubType A4,Item 12
December,ObjectTypeA,SubType A5,Item 13
December,ObjectTypeA,SubType A5,Item 14
December,ObjectTypeA,SubType A5,Item 15
December,ObjectTypeB,SubType B1,Item 16
December,ObjectTypeB,SubType B1,Item 17
December,ObjectTypeB,SubType B2,Item 18
December,ObjectTypeB,SubType B2,Item 19
December,ObjectTypeB,SubType B2,Item 20
December,ObjectTypeB,SubType B3,Item 21
December,ObjectTypeB,SubType B3,Item 22
March,ObjectTypeA,SubType A1,Item 23
March,ObjectTypeA,SubType A1,Item 24
March,ObjectTypeA,SubType A2,Item 25
March,ObjectTypeA,SubType A2,Item 26
March,ObjectTypeA,SubType A2,Item 27
March,ObjectTypeA,SubType A3,Item 28
March,ObjectTypeA,SubType A3,Item 29
March,ObjectTypeA,SubType A4,Item 30
March,ObjectTypeA,SubType A4,Item 31
March,ObjectTypeA,SubType A4,Item 32
March,ObjectTypeA,SubType A4,Item 33
March,ObjectTypeA,SubType A4,Item 34
March,ObjectTypeC,SubType C1,Item 35
March,ObjectTypeC,SubType C1,Item 36
March,ObjectTypeC,SubType C2,Item 37
March,ObjectTypeC,SubType C2,Item 38
March,ObjectTypeC,SubType C3,Item 39

Required Output

allobjects: {
"December": {
    "Object Type A": {
        "Subtype A1": ["Item1","Item2"],
        "Subtype A2": ["Item3","Item4","Item5"],
        "Subtype A3": ["Item6","Item7"],
        "Subtype A4": ["Item8","Item9"],
        "Subtype A5": ["Item10","Item11","Item12"]
        },
                
    "Object Type B": {
        "Subtype B1": ["Item13","Item14"],
        "Subtype B2": ["Item16","Item15","Item17","Item18"],
        "Subtype B3": ["Item19","Item20"],
        "Subtype B4": ["Item21","Item22"],
        "Subtype B5": ["Item23","Item24","Item25"]
        },
    "Object Type C": {
        "Subtype C1": ["Item26", "Item27"],
        "Subtype C2": ["Item28", "Item29"],
        "Subtype C3": ["Item30", "Item31"]
        }},
"March": {
    "Object Type A": {
        "Subtype A1": ["Item32","Item33"],
        "Subtype A2": ["Item34","Item35"],
        "Subtype A3": ["Item36","Item37"],
        "Subtype A4": ["Item38","Item39","Item40"],
        "Subtype A5": ["Item41","Item42","Item44"]
        },
                
    "Object Type C": {
        "Subtype C1": ["Item45", "Item46"],
        "Subtype C2": ["Item47", "Item48"],
        "Subtype C3": ["Item49", "Item50"]
        }},
    },

Current Code

import json
import pandas as pd

df = pd.read_csv("Book4.csv", dtype={
            "Month" : str,
            "Type" : str,
            "Subtype" : str,
            "ItemName": str,
        })


compiled = []

for (month, type, subtype), bag in df.groupby(["Month", "Type", "Subtype"]):
    contents = bag.drop(["Month", "Type","Subtype"], axis=1)
    allitems = [list(row) for i,row in contents.items()]
    compiled.append(dict([(month, {}),
                        (type, {}),
                        (subtype, allitems),
                         ]))
with open("Book4_pandas.json", 'w') as outfile:
    outfile.write(json.dumps(compiled, sort_keys=False, indent=2, separators=(',', ': ') ))

Output from Current Code

[
  {
    "December": {},
    "ObjectTypeA": {},
    "Subtype A1": [
       [ "Item1",
             "Item2"
           ]
    ]
  },
  {
    "December": {},
    "ObjectTypeA": {},
    "Subtype A2": [
       [ "Item3",
             "Item4",
         "Item5"
           ]
    ]
  },

... this continues for the rest of December, and then:

  {
    "March": {},
    "ObjectTypeA": {},
    "Subtype A1": [
       [ "Item23",
             "Item24"
           ]
    ]
  },
  {
    "March": {},
    "ObjectTypeA": {},
    "Subtype A2": [
       [ "Item25",
             "Item26",
         "Item27"
           ]
    ]
  }
]

I appreciate that the JSON format is non-standard; however, I figured that writing a dict would be one "easy" approach. I believe the error is in the way the for loop is structured.

Many thanks in advance!

2 Answers


You can first aggregate the item names into a Series of lists, and then build the expected output with a nested dict comprehension:

# The column is named "Subtype" in the CSV header ("SubType" only appears in the values)
s = df.groupby(["Month", "Type", "Subtype"], sort=False)['ItemName'].agg(list)

compiled = {i: {j[1]: h[j].to_dict() 
                for j, h in g.groupby(level=[0,1], sort=False)}
                for i, g in s.groupby(level=0, sort=False)}

print(compiled)

{
    'December': {
        'ObjectTypeA': {
            'SubType A1': ['Item 1', 'Item 2'],
            'SubType A2': ['Item 3', 'Item 4', 'Item 5'],
            'SubType A3': ['Item 6', 'Item 7'],
            'SubType A4': ['Item 8', 'Item 9', 'Item 10', 'Item 11', 'Item 12'],
            'SubType A5': ['Item 13', 'Item 14', 'Item 15']
        },
        'ObjectTypeB': {
            'SubType B1': ['Item 16', 'Item 17'],
            'SubType B2': ['Item 18', 'Item 19', 'Item 20'],
            'SubType B3': ['Item 21', 'Item 22']
        }
    },
    'March': {
        'ObjectTypeA': {
            'SubType A1': ['Item 23', 'Item 24'],
            'SubType A2': ['Item 25', 'Item 26', 'Item 27'],
            'SubType A3': ['Item 28', 'Item 29'],
            'SubType A4': ['Item 30', 'Item 31', 'Item 32', 'Item 33', 'Item 34']
        },
        'ObjectTypeC': {
            'SubType C1': ['Item 35', 'Item 36'],
            'SubType C2': ['Item 37', 'Item 38'],
            'SubType C3': ['Item 39']
        }
    }
}
    

with open("Book4_pandas.json", 'w') as outfile:
    outfile.write(json.dumps(compiled, sort_keys=False,
                             indent=2, separators=(',', ': ')))
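
For comparison, a plain loop that stays closer to the structure in the question is sketched below. It is only a sketch under assumptions: the small inline CSV stands in for Book4.csv, and `build_nested` is a hypothetical helper name, not something pandas provides. It uses `dict.setdefault` to create each month and type level the first time it is seen, so every group lands in one shared nested dict instead of becoming a separate list entry.

```python
import io
import json

import pandas as pd

# Small inline sample standing in for Book4.csv (assumed to be comma-separated).
SAMPLE_CSV = """Month,Type,Subtype,ItemName
December,ObjectTypeA,SubType A1,Item 1
December,ObjectTypeA,SubType A1,Item 2
December,ObjectTypeB,SubType B1,Item 16
March,ObjectTypeA,SubType A1,Item 23
March,ObjectTypeC,SubType C3,Item 39
"""


def build_nested(df):
    """Hypothetical helper: nest ItemName lists under Month -> Type -> Subtype."""
    nested = {}
    groups = df.groupby(["Month", "Type", "Subtype"], sort=False)["ItemName"]
    for (month, obj_type, subtype), items in groups:
        # setdefault creates the month/type level only the first time it is seen
        nested.setdefault(month, {}).setdefault(obj_type, {})[subtype] = list(items)
    return nested


df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str)
compiled = build_nested(df)
print(json.dumps(compiled, indent=2))
```

Naming the loop variable `obj_type` rather than `type` also avoids shadowing the Python built-in.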

Thanks for your question. You can change your code as follows:

import pandas as pd
import json

df = pd.read_csv("Book4.csv", dtype={
    "Month": str,
    "Type": str,
    "Subtype": str,
    "ItemName": str,
})


compiled = []

s = df.groupby(["Month", "Type", "Subtype"])['ItemName'].agg(list)

compiled = {level: {le: s.xs((level, le), level=[0, 1]).to_dict()
                    for le in s.index.levels[1]}
            for level in s.index.levels[0]}

with open("Book4_pandas.json", 'w') as outfile:
    outfile.write(json.dumps(compiled, sort_keys=False,
                             indent=2, separators=(',', ': ')))

1 Comment

I removed this solution from my answer after checking it, because it gives the wrong output. If you had also copied the JSON output, you would see it.
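
One likely reason for the mismatch, sketched here on an assumed inline sample rather than the real Book4.csv: `s.index.levels[1]` lists every Type that occurs anywhere in the index, not only the ones recorded for a given month. Looping over it therefore pairs December with ObjectTypeC even though no such rows exist. Reading the observed (Month, Type) pairs from the index itself avoids that.

```python
import io

import pandas as pd

# Assumed inline sample; the real data would come from Book4.csv.
SAMPLE_CSV = """Month,Type,Subtype,ItemName
December,ObjectTypeA,SubType A1,Item 1
December,ObjectTypeB,SubType B1,Item 16
March,ObjectTypeA,SubType A1,Item 23
March,ObjectTypeC,SubType C1,Item 35
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str)
s = df.groupby(["Month", "Type", "Subtype"])["ItemName"].agg(list)

# levels[1] holds every Type in the whole index, regardless of month
all_types = list(s.index.levels[1])

# (Month, Type) pairs that actually occur in the data
observed_pairs = sorted(set(zip(s.index.get_level_values(0),
                                s.index.get_level_values(1))))

print(all_types)
print(observed_pairs)
```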
