Python cannot decode json file, although it seems valid

Question

I am trying to load and read a json file with this code:

try:
    json_data = open('sample3.json')
    data = load(json_data)
    json_data.close()
    insert_data(data)
except Exception as e:
    print "Finished with error %s" % (repr(e))

This is the JSON file:

{"competitions":
    [
    {"name":"Premiership","nation":"ENG","id":32711,"matches": 
        [
        {"id":7245940,"when":"28.02.2015 12:45",
            "home_team": {"id":430934, "name":"West Ham"},
            "away_team": {"id":430936, "name":"Crystal Palace"},
            "played":1,
            "play_off":0,
            "round":27
                ,"score":{"t1_score":1,"t2_score":3 },
            "score_ht":{"t1_score":0,"t2_score":1}
        }
        ]
    }
    ]
}

and this is the error I am getting: Finished with error ValueError('No JSON object could be decoded',)

I ran the file through JSONLint, and it says it is valid.

What am I doing wrong?

UPDATE: this is the output of print repr(json_data.read())

'\xef\xbb\xbf{"competitions":\n    [\n    {"name":"Premiership","nation":"ENG","id":32711,"matches": \n        [\n        {"id":7245940,"when":"28.02.2015 12:45",\n            "home_team": {"id":430934, "name":"West Ham"},\n            "away_team": {"id":430936, "name":"Crystal Palace"},\n            "played":1,\n            "play_off":0,\n            "round":27\n                ,"score":{"t1_score":1,"t2_score":3 },\n            "score_ht":{"t1_score":0,"t2_score":1}\n        }\n        ]\n    }\n    ]\n}\n'
Finished with error ValueError('No JSON object could be decoded',)

Are you 100% certain you are opening the correct file? What does print repr(json_data.read()) produce? – Martijn Pieters, commented Mar 12, 2015 at 12:00
I can't find a way. I opened the file in vim and it doesn't show anything. The :set list command just shows a $ at the end of the last line. – xpanta, commented Mar 12, 2015 at 12:17

Martijn Pieters · Accepted Answer · 2015-03-12 13:17:35Z


Your JSON file starts with a UTF-8 BOM (Byte Order Mark): the bytes \xef\xbb\xbf at the very start of your repr() output. JSON generators must not add one, and Python's json module refuses to parse it. It is usually added by Microsoft tools (such as Notepad) to mark the file's encoding, but in UTF-8 it carries no meaning, since UTF-8 has no byte order variation.
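To see why the parser gives up, here is a small Python 3 sketch (the question uses Python 2, where the error message reads differently). It assumes a tiny stand-in document rather than the real sample3.json, and shows that the BOM is the single code point U+FEFF encoded as three bytes, that plain utf-8 decoding keeps it, and that the utf-8-sig codec drops it.

```python
import codecs
import json

# The UTF-8 BOM is the code point U+FEFF encoded as three bytes.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8

# A small stand-in for the start of sample3.json, BOM included.
raw = codecs.BOM_UTF8 + b'{"competitions": []}'

# Plain utf-8 decoding keeps the BOM as a leading U+FEFF character ...
text = raw.decode("utf-8")
print(repr(text[0]))

# ... and the json module rejects it (JSONDecodeError subclasses ValueError).
try:
    json.loads(text)
except ValueError as exc:
    error = exc
else:
    error = None
print("rejected:", error)

# The utf-8-sig codec strips the BOM while decoding, so parsing succeeds.
data = json.loads(raw.decode("utf-8-sig"))
print(data)
```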

You'll have to skip these bytes yourself, as passing encoding='utf-8-sig' to json.load() doesn't help here.

You can use codecs.BOM_UTF8 to detect it:

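import codecs
import json

# Binary mode, so read(3) returns raw bytes to compare with codecs.BOM_UTF8
with open('sample3.json', 'rb') as json_data:
    bom_maybe = json_data.read(3)
    if bom_maybe != codecs.BOM_UTF8:
        # no BOM at the start, rewind
        json_data.seek(0)
    data = json.load(json_data)
insert_data(data)  # insert_data() is your own function from the question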
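The same skip-the-BOM check can be wrapped in a reusable function. This is a Python 3 sketch; load_json_skipping_bom() is a hypothetical helper name, and it assumes the file is UTF-8. It decodes the remaining bytes explicitly, so it does not depend on how a particular Python version's json module treats bytes input. The usage part writes the same document with and without a BOM to a temporary directory and loads both.

```python
import codecs
import json
import os
import tempfile


def load_json_skipping_bom(path):
    """Parse a UTF-8 JSON file, skipping a leading UTF-8 BOM if there is one."""
    # Binary mode, so read() returns bytes that can be compared with BOM_UTF8.
    with open(path, "rb") as json_file:
        if json_file.read(len(codecs.BOM_UTF8)) != codecs.BOM_UTF8:
            json_file.seek(0)  # no BOM at the start: rewind to byte 0
        return json.loads(json_file.read().decode("utf-8"))


# Usage: the same document saved with and without a BOM.
payload = b'{"competitions": [{"name": "Premiership", "nation": "ENG"}]}'
with tempfile.TemporaryDirectory() as tmp_dir:
    with_bom_path = os.path.join(tmp_dir, "with_bom.json")
    plain_path = os.path.join(tmp_dir, "plain.json")
    with open(with_bom_path, "wb") as f:
        f.write(codecs.BOM_UTF8 + payload)
    with open(plain_path, "wb") as f:
        f.write(payload)
    with_bom = load_json_skipping_bom(with_bom_path)
    plain = load_json_skipping_bom(plain_path)

print(with_bom == plain)
```

Rewinding when the first three bytes are not the BOM also covers files shorter than three bytes, because read() then simply returns fewer bytes.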

Alternatively, use io.open() with the utf-8-sig encoding, which strips the BOM while decoding, and pass the decoded text to json.loads():

import io
import json

with io.open('sample3.json', encoding='utf-8-sig') as json_data:
    data = json.loads(json_data.read())
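On Python 3 the built-in open() takes the encoding argument directly (io.open is the same function there), so the utf-8-sig approach can be sketched as below. load_json_utf8_sig() is a hypothetical helper name, and the payload is a made-up fragment of the question's data. The usage part checks that a file without a BOM decodes identically.

```python
import codecs
import json
import os
import tempfile


def load_json_utf8_sig(path):
    """Parse a JSON file decoded as utf-8-sig (a leading BOM, if any, is dropped)."""
    with open(path, encoding="utf-8-sig") as json_file:
        return json.load(json_file)


# Usage: one file with a BOM, one without; both should give the same result.
payload = b'{"played": 1, "round": 27}'
results = []
with tempfile.TemporaryDirectory() as tmp_dir:
    for name, prefix in (("with_bom.json", codecs.BOM_UTF8), ("plain.json", b"")):
        path = os.path.join(tmp_dir, name)
        with open(path, "wb") as f:
            f.write(prefix + payload)
        results.append(load_json_utf8_sig(path))

print(results)
```

Decoding with utf-8-sig is harmless when no BOM is present, so no branching is needed. On Python 3.6 and later, json.loads() also accepts bytes and detects a leading UTF-8 BOM itself, so calling json.load() on a file opened in 'rb' mode works too.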

Demo:

>>> import codecs
>>> import io
>>> import json
>>> open('/tmp/test.json', 'wb').write('\xef\xbb\xbf{"competitions":\n    [\n    {"name":"Premiership","nation":"ENG","id":32711,"matches": \n        [\n        {"id":7245940,"when":"28.02.2015 12:45",\n            "home_team": {"id":430934, "name":"West Ham"},\n            "away_team": {"id":430936, "name":"Crystal Palace"},\n            "played":1,\n            "play_off":0,\n            "round":27\n                ,"score":{"t1_score":1,"t2_score":3 },\n            "score_ht":{"t1_score":0,"t2_score":1}\n        }\n        ]\n    }\n    ]\n}\n')
>>> with open('/tmp/test.json') as json_data:
...     bom_maybe = json_data.read(3)
...     if bom_maybe != codecs.BOM_UTF8:
...         json_data.seek(0)
...     data = json.load(json_data)
... 
>>> data
{u'competitions': [{u'id': 32711, u'matches': [{u'score_ht': {u't2_score': 1, u't1_score': 0}, u'home_team': {u'id': 430934, u'name': u'West Ham'}, u'away_team': {u'id': 430936, u'name': u'Crystal Palace'}, u'played': 1, u'when': u'28.02.2015 12:45', u'round': 27, u'score': {u't2_score': 3, u't1_score': 1}, u'play_off': 0, u'id': 7245940}], u'name': u'Premiership', u'nation': u'ENG'}]}
>>> with io.open('/tmp/test.json', encoding='utf-8-sig') as json_data:
...     data = json.loads(json_data.read())
... 
>>> data
{u'competitions': [{u'id': 32711, u'matches': [{u'score_ht': {u't2_score': 1, u't1_score': 0}, u'home_team': {u'id': 430934, u'name': u'West Ham'}, u'away_team': {u'id': 430936, u'name': u'Crystal Palace'}, u'played': 1, u'when': u'28.02.2015 12:45', u'round': 27, u'score': {u't2_score': 3, u't1_score': 1}, u'play_off': 0, u'id': 7245940}], u'name': u'Premiership', u'nation': u'ENG'}]}

edited Mar 12, 2015 at 13:17

answered Mar 12, 2015 at 12:10

Martijn Pieters


4 Comments

xpanta Over a year ago

Thanks. Now the junk is removed. I still get the same error message.

Martijn Pieters Over a year ago

@xpanta: how did you remove the bytes? The editor that you are using is probably still including them when saving the file! Instead, detect if the first 3 bytes form the BOM so you can skip them.
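If the goal is to remove the BOM from the file itself rather than skip it when loading, a Python 3 sketch like the following can confirm whether the saved file still starts with the three BOM bytes and rewrite it without them. The helper names has_utf8_bom() and strip_utf8_bom() are hypothetical, and the rewrite assumes the whole file fits in memory. As the comment above points out, an editor may add the BOM back on the next save, so skipping it at load time remains the more robust fix.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file's first three bytes are the UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_bom(path):
    """Rewrite the file without a leading UTF-8 BOM; return True if one was removed."""
    with open(path, "rb") as f:
        content = f.read()
    if not content.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(content[len(codecs.BOM_UTF8):])
    return True


# Usage on a throwaway copy that starts with a BOM.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample3.json")
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + b'{"competitions": []}')
    before = has_utf8_bom(path)
    removed = strip_utf8_bom(path)
    after = has_utf8_bom(path)
    removed_again = strip_utf8_bom(path)

print(before, removed, after, removed_again)
```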

Nikos M. Over a year ago

@xpanta, the BOM characters are there in your message as well: "\xef\xbb\xbf". I did not notice them myself.

Martijn Pieters Over a year ago

@NikosM.: I asked the OP to add the output of repr(json_data.read()) to their question, because I suspected a BOM might be involved.
