
This may be redundant, but after reading previous posts and answers I still have not gotten my code to work. I have a very large file containing multiple JSON objects that are not separated by any delimiter:

{"_index": "1234", "_type": "11", "_id": "1234", "_score": 0.0, "fields": {"c_u": ["url.com"], "tawgs.id": ["p6427"]}}{"_index": "1234", "_type": "11", "_id": "786fd4ad2415aa7b", "_score": 0.0, "fields": {"c_u": ["url2.com"], "tawgs.id": ["p12519"]}}{"_index": "1234", "_type": "11", "_id": "5826e7cbd92d951a", "_score": 0.0, "fields": {"tawgs.id": ["p8453", "p8458"]}}

I've read that this is what concatenated (streamed) JSON is supposed to look like, but I still can't open/parse the file to create a dataframe in Python.

I tried something along these lines:

import json

def iter_objects(s):
    d = json.JSONDecoder()
    i = 0
    while True:
        try:
            # raw_decode returns the object and the index just past it
            obj, i = d.raw_decode(s, i)
        except ValueError:
            return
        yield obj

but it didn't work.

I've also tried a basic:

with open('output.json','r') as f:
    data = json.load(f)

but was thrown the error:

JSONDecodeError: Extra data: line 1 column 184 (char 183) 

Trying json.loads() line by line with append didn't work either; data came back empty ([]):

data = []
with open('es-output.json', 'r') as f:
    for line in f:
        try:
            data.append(json.loads(line))
        except json.decoder.JSONDecodeError:
            pass # skip this line 

2 Answers


This will try to decode the JSON stream inside s iteratively:

s = '''{"_index": "1234", "_type": "11", "_id": "1234", "_score": 0.0, "fields": {"c_u": ["url.com"], "tawgs.id": ["p6427"]}}{"_index": "1234", "_type": "11", "_id": "786fd4ad2415aa7b", "_score": 0.0, "fields": {"c_u": ["url2.com"], "tawgs.id": ["p12519"]}}{"_index": "1234", "_type": "11", "_id": "5826e7cbd92d951a", "_score": 0.0, "fields": {"tawgs.id": ["p8453", "p8458"]}}'''

import json

d = json.JSONDecoder()

idx = 0
while idx < len(s):
    # raw_decode returns the decoded object and how many characters it consumed
    data, consumed = d.raw_decode(s[idx:])
    idx += consumed
    print(data)
    print('*' * 80)

Prints:

{'_index': '1234', '_type': '11', '_id': '1234', '_score': 0.0, 'fields': {'c_u': ['url.com'], 'tawgs.id': ['p6427']}}
********************************************************************************
{'_index': '1234', '_type': '11', '_id': '786fd4ad2415aa7b', '_score': 0.0, 'fields': {'c_u': ['url2.com'], 'tawgs.id': ['p12519']}}
********************************************************************************
{'_index': '1234', '_type': '11', '_id': '5826e7cbd92d951a', '_score': 0.0, 'fields': {'tawgs.id': ['p8453', 'p8458']}}
********************************************************************************
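If the objects live in a file rather than a string literal, a minimal sketch of the same idea (the file name data.json and the generator name iter_concatenated_json are assumptions, not from the original answer) wraps the raw_decode loop in a generator and collects the results:

import json

def iter_concatenated_json(text):
    """Yield each JSON object found back-to-back in text."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # skip any whitespace between objects before decoding the next one
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

# With a file: text = open('data.json', 'r').read()
# Here, a shortened two-object sample in the same shape as the question's data:
s = ('{"_index": "1234", "_id": "1234", "fields": {"c_u": ["url.com"]}}'
     '{"_index": "1234", "_id": "786fd4ad2415aa7b", "fields": {"c_u": ["url2.com"]}}')
records = list(iter_concatenated_json(s))

The resulting list of dicts can then be handed to pandas, e.g. pd.DataFrame(records) or pd.json_normalize(records) to flatten the nested fields key.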

5 Comments

so if my "s" value comes from a JSON file rather than a string, because I used json.dump() to write the file, how would I convert my file into a JSON string? When I try to use json.dumps() to write my file, I get back an empty set
@aesthetics Just load the content of the file with JSON objects inside s: s = open('your_file.txt', 'r').read()
@aesthetics I don't have any experience with elasticsearch, but you either already have some string containing JSON values or you will need to load that string from a file.
as a next step, and just for clarification, would there be a way to flatten the json if the type is a string to get it into a dataframe?
@aesthetics That's a question for Pandas/NumPy specialists, but I bet there are methods for loading data directly from JSON. You may open another question; these comments are not suitable for it.

The problem is in the data itself: it contains three values but no keys.

The first one is :

{"_index".... ["p6427"]}}

The second one is :

{"_index".... ["p12519"]}}

The third one is :

{"_index".... ["p8458"]}}

You should assign a key to each value, for example:

{
  "k1": {"_index": "1234", "_type": "11", "_id": "1234", "_score": 0.0, "fields": {"c_u": ["url.com"], "tawgs.id": ["p6427"]}},
  "k2": {"_index": "1234", "_type": "11", "_id": "786fd4ad2415aa7b", "_score": 0.0, "fields": {"c_u": ["url2.com"], "tawgs.id": ["p12519"]}},
  "k3": {"_index": "11_20190714_184325_01", "_type": "11", "_id": "5826e7cbd92d951a", "_score": 0.0, "fields": {"tawgs.id": ["p8453", "p8458"]}}
}

This way everything will parse correctly and the data will load.
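With keys added, the file becomes one valid top-level JSON object, so the plain json.load approach from the question succeeds. A minimal sketch of that, using the keyed structure inlined as a string rather than read from a file (the names keyed and rows are illustrative):

import json

keyed = '''{
  "k1": {"_index": "1234", "_id": "1234", "fields": {"tawgs.id": ["p6427"]}},
  "k2": {"_index": "1234", "_id": "786fd4ad2415aa7b", "fields": {"tawgs.id": ["p12519"]}}
}'''

data = json.loads(keyed)      # one top-level object, so this succeeds
rows = list(data.values())    # the inner objects, ready for further processing

With a real file this would be data = json.load(open('output.json')) instead of json.loads on a string.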

3 Comments

hmm, I am pulling my data down with elasticsearch-py, so I am not sure how to manipulate it and introduce keys? Also I'm very new to integrating Python and Elasticsearch :/
Try to identify some unique characteristics to create keys!
how am I able to create keys in the JSON if I am unable to initially read/open the file that contains the data?
