
I have a very large JSON array that I need to split into smaller arrays, writing each smaller array to its own file.

Sample Data

raw = '[{"id":"1","num":"2182","count":-17}{"id":"111","num":"3182","count":-202}{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'

Desired Output (In this example, split the data in half)

output_file1.json = [{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202}]

output_file2.json = [{"id":"222","num":"4182","count":12}{"id":"33333","num":"5182","count":12}]

Current Code

import pandas as pd
import itertools
import json
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

    raw = '[{"id":"1","num":"2182","count":-17}{"id":"111","num":"3182","count":-202}{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'

#split the data into manageable chunks + write to files

for i, group in enumerate(grouper(raw, 4)):
    with open('outputbatch_{}.json'.format(i), 'w') as outputfile:
        json.dump(list(group), outputfile)

Current Output of first file "outputbatch_0.json"

["[", "{", "\"", "s"]

I feel like I'm making this much harder than it needs to be.

    Your raw string isn't valid JSON (missing commas between objects). Is this the case with your real data or just a typo in the question? Commented Sep 18, 2018 at 14:14

3 Answers


Assuming raw is meant to be a valid JSON string (I added the missing commas), here is a simple working solution.

import json

raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
json_data = json.loads(raw)

def split_in_files(json_data, amount):
    step = len(json_data) // amount
    pos = 0
    for i in range(amount - 1):
        with open('output_file{}.json'.format(i+1), 'w') as file:
            json.dump(json_data[pos:pos+step], file)
            pos += step
    # last one
    with open('output_file{}.json'.format(amount), 'w') as file:
        json.dump(json_data[pos:], file)

split_in_files(json_data, 2)
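If you would rather fix the number of items per file than the number of files (as the question's grouper(raw, 4) attempt suggests), a minimal sketch could look like the following. The helper names chunked and write_chunks and the out_dir and prefix parameters are hypothetical, chosen only for illustration. The key point is that the slicing runs over the parsed list, not over the raw string.

```python
import json
from pathlib import Path


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_chunks(items, size, out_dir=".", prefix="outputbatch_"):
    """Write each chunk to its own JSON file and return the paths written.

    `out_dir` and `prefix` are illustrative parameters, not part of the question.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunked(items, size)):
        path = out_dir / f"{prefix}{i}.json"
        with open(path, "w") as f:
            json.dump(chunk, f)
        paths.append(path)
    return paths
```

Calling write_chunks(json.loads(raw), 2) on the corrected sample data should produce two files with two objects each. The last file simply holds fewer items when the length is not a multiple of size.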

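If raw is valid JSON, you can parse it and pair up consecutive items. The saving part is only sketched here.

import json

raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'

# Parse with json.loads rather than eval, which would execute arbitrary code.
raw_list = json.loads(raw)
# Pair up consecutive items: (item0, item1), (item2, item3), ...
# Note: with an odd number of items, zip drops the last one.
raw_zipped = list(zip(raw_list[0::2], raw_list[1::2]))

# Write each pair to its own file so later pairs don't overwrite earlier ones.
for i, item in enumerate(raw_zipped, start=1):
    with open('output_file{}.json'.format(i), 'w') as f:
        json.dump(item, f)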


If you need exactly half of the data, you can use slicing:

import json

raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
json_data = json.loads(raw)

# Integer division keeps the index an int on both Python 2 and Python 3.
size_of_half = len(json_data) // 2

print(json_data[:size_of_half])
print(json_data[size_of_half:])

This code doesn't handle edge cases such as an odd number of items. In short, you can do anything here that you can do with a list.
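As one possible way to cover the odd-length case, the sketch below splits a list into a given number of parts whose sizes differ by at most one. The function name split_evenly is hypothetical, and it is a generalisation of the halving above rather than anything from the original answer.

```python
def split_evenly(items, parts):
    """Split `items` into `parts` consecutive slices whose lengths differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    # `base` items go into every part; the first `extra` parts get one more.
    base, extra = divmod(len(items), parts)
    result = []
    start = 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        result.append(items[start:end])
        start = end
    return result
```

With five items and two parts, the first part gets three items and the second gets two. If there are more parts than items, the trailing parts come back as empty lists.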
