I am attempting to create a nested JSON file from a CSV using Pandas.
CSV File
This is just an excerpt. Sorry for the long table, but I wanted to show the structure clearly.
| Month | Type | Subtype | ItemName | |
|---|---|---|---|---|
| December | ObjectTypeA | SubType A1 | Item 1 | |
| December | ObjectTypeA | SubType A1 | Item 2 | |
| December | ObjectTypeA | SubType A2 | Item 3 | |
| December | ObjectTypeA | SubType A2 | Item 4 | |
| December | ObjectTypeA | SubType A2 | Item 5 | |
| December | ObjectTypeA | SubType A3 | Item 6 | |
| December | ObjectTypeA | SubType A3 | Item 7 | |
| December | ObjectTypeA | SubType A4 | Item 8 | |
| December | ObjectTypeA | SubType A4 | Item 9 | |
| December | ObjectTypeA | SubType A4 | Item 10 | |
| December | ObjectTypeA | SubType A4 | Item 11 | |
| December | ObjectTypeA | SubType A4 | Item 12 | |
| December | ObjectTypeA | SubType A5 | Item 13 | |
| December | ObjectTypeA | SubType A5 | Item 14 | |
| December | ObjectTypeA | SubType A5 | Item 15 | |
| December | ObjectTypeB | SubType B1 | Item 16 | |
| December | ObjectTypeB | SubType B1 | Item 17 | |
| December | ObjectTypeB | SubType B2 | Item 18 | |
| December | ObjectTypeB | SubType B2 | Item 19 | |
| December | ObjectTypeB | SubType B2 | Item 20 | |
| December | ObjectTypeB | SubType B3 | Item 21 | |
| December | ObjectTypeB | SubType B3 | Item 22 | |
| March | ObjectTypeA | SubType A1 | Item 23 | |
| March | ObjectTypeA | SubType A1 | Item 24 | |
| March | ObjectTypeA | SubType A2 | Item 25 | |
| March | ObjectTypeA | SubType A2 | Item 26 | |
| March | ObjectTypeA | SubType A2 | Item 27 | |
| March | ObjectTypeA | SubType A3 | Item 28 | |
| March | ObjectTypeA | SubType A3 | Item 29 | |
| March | ObjectTypeA | SubType A4 | Item 30 | |
| March | ObjectTypeA | SubType A4 | Item 31 | |
| March | ObjectTypeA | SubType A4 | Item 32 | |
| March | ObjectTypeA | SubType A4 | Item 33 | |
| March | ObjectTypeA | SubType A4 | Item 34 | |
| March | ObjectTypeC | SubType C1 | Item 35 | |
| March | ObjectTypeC | SubType C1 | Item 36 | |
| March | ObjectTypeC | SubType C2 | Item 37 | |
| March | ObjectTypeC | SubType C2 | Item 38 | |
| March | ObjectTypeC | SubType C3 | Item 39 | |
Required Output
allobjects: {
"December": {
"Object Type A": {
"Subtype A1": ["Item1","Item2"],
"Subtype A2": ["Item3","Item4","Item5"],
"Subtype A3": ["Item6","Item7"],
"Subtype A4": ["Item8","Item9"],
"Subtype A5": ["Item10","Item11","Item12"]
},
"Object Type B": {
"Subtype B1": ["Item13","Item14"],
"Subtype B2": ["Item16","Item15","Item17","Item18"],
"Subtype B3": ["Item19","Item20"],
"Subtype B4": ["Item21","Item22"],
"Subtype B5": ["Item23","Item24","Item25"]
},
"Object Type C": {
"Subtype C1": ["Item26", "Item27"],
"Subtype C2": ["Item28", "Item29"],
"Subtype C3": ["Item30", "Item31"]
}},
"March": {
"Object Type A": {
"Subtype A1": ["Item32","Item33"],
"Subtype A2": ["Item34","Item35"],
"Subtype A3": ["Item36","Item37"],
"Subtype A4": ["Item38","Item39","Item40"],
"Subtype A5": ["Item41","Item42","Item44"]
},
"Object Type C": {
"Subtype C1": ["Item45", "Item46"],
"Subtype C2": ["Item47", "Item48"],
"Subtype C3": ["Item49", "Item50"]
}},
},
Current Code
import json

import pandas as pd

df = pd.read_csv("Book4.csv", dtype={
    "Month": str,
    "Type": str,
    "Subtype": str,
    "ItemName": str,
})
compiled = []
# One iteration per (Month, Type, Subtype) combination
for (month, obj_type, subtype), bag in df.groupby(["Month", "Type", "Subtype"]):
    contents = bag.drop(["Month", "Type", "Subtype"], axis=1)
    allitems = [list(row) for i, row in contents.items()]
    compiled.append(dict([(month, {}),
                          (obj_type, {}),
                          (subtype, allitems),
                          ]))
with open("Book4_pandas.json", "w") as outfile:
    outfile.write(json.dumps(compiled, sort_keys=False, indent=2, separators=(',', ': ')))
Output from Current Code
[
{
"December": {},
"ObjectTypeA": {},
"Subtype A1": [
[ "Item1",
"Item2"
]
]
},
{
"December": {},
"ObjectTypeA": {},
"Subtype A2": [
[ "Item3",
"Item4",
"Item5"
]
]
},
... (this continues for the rest of December, and then)
{
"March": {},
"ObjectTypeA": {},
"Subtype A1": [
[ "Item23",
"Item24"
]
]
},
{
"March": {},
"ObjectTypeA": {},
"Subtype A2": [
[ "Item25",
"Item26",
"Item27"
]
]
}
]
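As far as I can tell, the extra level of list nesting in the output above comes from `DataFrame.items()`, which iterates over columns rather than rows. A minimal sketch on a hypothetical two-row frame (the frame and the names `per_column` and `flat` are only for illustration, not part of my real script):

```python
import pandas as pd

# Hypothetical two-row frame standing in for one group's `contents`.
contents = pd.DataFrame({"ItemName": ["Item 1", "Item 2"]})

# DataFrame.items() yields (column_name, Series) pairs, one per column,
# so this builds one inner list per column, hence the extra nesting.
per_column = [list(col) for _, col in contents.items()]

# Selecting the single column directly gives a flat list of items.
flat = contents["ItemName"].tolist()

print(per_column)  # [['Item 1', 'Item 2']]
print(flat)        # ['Item 1', 'Item 2']
```

So the flat list per subtype that I want is probably closer to `bag["ItemName"].tolist()` than to what the loop currently builds.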
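I appreciate that the required format is not strictly valid JSON as written; however, I figured that building a dict and dumping it would be one "easy" approach. I believe the error is in the way the for loop is structured: each group is appended as its own flat dict, with the month and type as empty sibling keys, instead of being nested under its month and type. How should I restructure it?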
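To make the target concrete, here is a rough sketch of the nesting I have in mind, built with `dict.setdefault` inside the same `groupby` loop. It is only a sketch under assumptions: the inline CSV is a shortened stand-in for Book4.csv, the top-level `"allobjects"` key is my guess at how to wrap the result as valid JSON, and `obj_type` is just an illustrative variable name.

```python
import io
import json

import pandas as pd

# Shortened inline sample standing in for Book4.csv (assumption: same four columns).
csv_text = """Month,Type,Subtype,ItemName
December,ObjectTypeA,SubType A1,Item 1
December,ObjectTypeA,SubType A1,Item 2
December,ObjectTypeA,SubType A2,Item 3
December,ObjectTypeB,SubType B1,Item 16
March,ObjectTypeA,SubType A1,Item 23
"""
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

allobjects = {}
# sort=False keeps the groups in order of first appearance in the file.
for (month, obj_type, subtype), bag in df.groupby(
    ["Month", "Type", "Subtype"], sort=False
):
    # Create the month and type levels on first sight, then attach the flat item list.
    allobjects.setdefault(month, {}).setdefault(obj_type, {})[subtype] = (
        bag["ItemName"].tolist()
    )

# Wrapping in a top-level "allobjects" key is an assumption about the desired file.
result = json.dumps({"allobjects": allobjects}, indent=2)
print(result)
```

This builds one dict for the whole file rather than one dict per group, which is the main difference from my current loop.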
Many thanks in advance!