
I am creating a nested JSON structure and storing it in a list object. Here is my code, which produces the hierarchical JSON as intended.

Sample Data:


datasource,datasource_cnt,category,category_cnt,subcategory,subcategory_cnt
Bureau of Labor Statistics,44,Employment and wages,44,Employment and wages,44

import pandas as pd

df = pd.read_csv('queryhive16273.csv')

def split_df(df):
    # Top level: one entry per (datasource, datasource_cnt) pair
    for (vendor, count), df_vendor in df.groupby(["datasource", "datasource_cnt"]):
        yield {
            "vendor_name": vendor,
            "count": count,
            "categories": list(split_category(df_vendor)),
        }

def split_category(df_vendor):
    # Second level: categories within a datasource
    for (category, count), df_category in df_vendor.groupby(
        ["category", "category_cnt"]
    ):
        yield {
            "name": category,
            "count": count,
            "subCategories": list(split_subcategory(df_category)),
        }

def split_subcategory(df_category):
    # Third level: subcategories within a category
    for (subcategory, count), df_subcategory in df_category.groupby(
        ["subcategory", "subcategory_cnt"]
    ):
        yield {
            "count": count,
            "name": subcategory,
        }

abc = list(split_df(df))

abc contains the data shown below, which is the intended result.

[{
    'count': 44,
    'vendor_name': 'Bureau of Labor Statistics',
    'categories': [{
        'count': 44,
        'name': 'Employment and wages',
        'subCategories': [{
            'count': 44,
            'name': 'Employment and wages'
        }]
    }]
}]

Now I am trying to store it in a JSON file.

with open('your_file2.json', 'w') as f:
    for item in abc:
        f.write("%s\n" % item)
        # f.write(abc)

Here comes the issue. This writes the data in the fashion shown below, which is not valid JSON. If I try to use json.dump instead, it gives a JSON serialization error.

Could you please help me out here?

{
    'count': 44,
    'vendor_name': 'Bureau of Labor Statistics',
    'categories': [{
        'count': 44,
        'name': 'Employment and wages',
        'subCategories': [{
            'count': 44,
            'name': 'Employment and wages'
        }]
    }]
}
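For reference, the json.dump version of the write looks roughly like this (the exact call may differ slightly from what I tried); it is what raises the serialization error:

import json

with open('your_file2.json', 'w') as f:
    # raises TypeError: Object of type 'int64' is not JSON serializable
    json.dump(abc, f, indent=2)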

Expected Result:

[{
    "count": 44,
    "vendor_name": "Bureau of Labor Statistics",
    "categories": [{
        "count": 44,
        "name": "Employment and wages",
        "subCategories": [{
            "count": 44,
            "name": "Employment and wages"
        }]
    }]
}]
  • Do not write JSON by yourself (don't reinvent the wheel); it is not a good idea, use an encoder instead. In this case the output is invalid because Python's repr prints single quotes instead of the double quotes JSON requires. Commented Dec 5, 2018 at 7:35
  • Is there an answer that fits your request? If so, you should mark it as accepted. Commented Dec 5, 2018 at 10:11

2 Answers


Using your data with the standard library json module gives me:

TypeError: Object of type 'int64' is not JSON serializable

This just means a numpy object is living in your nested structure, and the json encoder does not know how to serialize it.
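As a quick check (a sketch; it assumes the abc list built by the question's split_df), the grouped counts come back as numpy scalars rather than plain Python ints:

import numpy as np

# The groupby keys carry the dtype of the *_cnt columns, so the counts are
# numpy scalars, which the standard json encoder cannot serialize.
print(type(abc[0]["count"]))                    # <class 'numpy.int64'>
print(isinstance(abc[0]["count"], np.integer))  # True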

Forcing the encoder to fall back to string conversion for objects it cannot handle itself is enough to make your code work:

import io
import json
import pandas as pd

# Rebuild the sample data from the question
d = io.StringIO(
    "datasource,datasource_cnt,category,category_cnt,subcategory,subcategory_cnt\n"
    "Bureau of Labor Statistics,44,Employment and wages,44,Employment and wages,44"
)
df = pd.read_csv(d)

abc = list(split_df(df))  # split_df as defined in the question

# default=str stringifies anything the encoder cannot handle itself
json.dumps(abc, default=str)

It returns valid JSON (but with the ints converted to strings):

'[{"vendor_name": "Bureau of Labor Statistics", "count": "44", "categories": [{"name": "Employment and wages", "count": "44", "subCategories": [{"count": "44", "name": "Employment and wages"}]}]}]'

If that does not suit your needs, use a dedicated encoder:

import numpy as np
class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        return json.JSONEncoder.default(self, obj)

json.dumps(abc, cls=MyEncoder)

This returns the requested JSON:

'[{"vendor_name": "Bureau of Labor Statistics", "count": 44, "categories": [{"name": "Employment and wages", "count": 44, "subCategories": [{"count": 44, "name": "Employment and wages"}]}]}]'

Another option is to directly convert your data before encoding:

def split_category(df_vendor):
    for (category, count), df_category in df_vendor.groupby(
        ["category", "category_cnt"]
    ):
        yield {
            "name": category,
            "count": int(count),  # cast here, before encoding
            "subCategories": list(split_subcategory(df_category)),
        }
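If you want the result in a file rather than a string, json.dump takes the same arguments (a sketch reusing the MyEncoder class above and the file name from the question):

with open('your_file2.json', 'w') as f:
    # json.dump accepts the same keyword arguments as json.dumps
    json.dump(abc, f, cls=MyEncoder, indent=2)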

2 Comments

How do I write this to a text file? Could you please add that json.dump line to your answer?
@ShankarPanda Just use dump instead of dumps, as @Buran did in their answer.
import json

data = [{
    'count': 44,
    'vendor_name': 'Bureau of Labor Statistics',
    'categories': [{
        'count': 44,
        'name': 'Employment and wages',
        'subCategories': [{
            'count': 44,
            'name': 'Employment and wages'
        }]
    }]
}]

with open('your_file2.json', 'w') as f:
    json.dump(data, f, indent=2)

produces a valid JSON file:

[
  {
    "count": 44,
    "vendor_name": "Bureau of Labor Statistics",
    "categories": [
      {
        "count": 44,
        "name": "Employment and wages",
        "subCategories": [
          {
            "count": 44,
            "name": "Employment and wages"
          }
        ]
      }
    ]
  }
]

5 Comments

This will not work with the actual dataset, because it contains numpy.int64 rather than int within its structure. You skipped that part by writing the data as a plain Python structure instead of reading it from the CSV.
@jlandercy I am using what the OP provided as the abc value in their post. They say "abc contains the data shown below, which is the intended result." I don't see where you get any other dataset. Clearly their problem is that they iterate over the list and write each element as text to a plain text file, producing invalid JSON.
Look at my answer: I did find the relevant data in the OP. Your solution will not work with their dataset; copy-paste the StringIO snippet and you will be able to reproduce the issue. This does not mean your answer is wrong, it just will not solve the OP's issue.
@jlandercy, I see it now: it's because they use pandas to read the CSV into a DataFrame. They can convert count to int when yielding, which should solve the issue.
Yes, that is where the numpy.int64 comes from, as already suggested. Have a good day.
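For completeness, a sketch of that last suggestion (a hypothetical tweak to the question's generators, not code from either answer): cast the numpy counts to plain int while yielding, and the plain json.dump(data, f, indent=2) call from this answer then works unchanged.

def split_df(df):
    for (vendor, count), df_vendor in df.groupby(["datasource", "datasource_cnt"]):
        yield {
            "vendor_name": vendor,
            "count": int(count),  # plain int, so the standard json encoder can handle it
            "categories": list(split_category(df_vendor)),
        }

# apply the same int(count) cast in split_category and split_subcategory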
