Python load json file with UTF-8 BOM header

Question

I needed to parse files generated by another tool, which unconditionally outputs JSON files with a UTF-8 BOM header (EF BB BF). I soon found that this was the problem, as the Python 2.7 json module can't seem to parse it:

>>> import json
>>> data = json.load(open('sample.json'))

ValueError: No JSON object could be decoded

Removing the BOM solves it, but I wonder if there is another way of parsing a JSON file with a BOM header?

Related: Python: How to fix Unexpected UTF-8 BOM error when using json.loads (Grijesh Chauhan, commented Nov 27, 2019 at 9:43)
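For comparison, Python 3's json module reports this failure more explicitly than Python 2.7 does. The minimal sketch below assumes the BOM has survived into a `str` (for example after decoding the file with plain `utf-8`), where it appears as the single character U+FEFF. In that case `json.loads` raises an error whose message points at `utf-8-sig`:

```python
import json

# Decoding BOM-prefixed bytes with plain 'utf-8' keeps the BOM as U+FEFF.
text = b'\xef\xbb\xbf{"a": 1}'.decode('utf-8')

message = None
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    # Python 3 rejects a str that starts with U+FEFF and names the fix.
    message = exc.msg

print(message)
```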

Pavel Anossov · Accepted Answer · 2012-10-31 11:25:36Z

101

You can open with codecs:

import json
import codecs

json.load(codecs.open('sample.json', 'r', 'utf-8-sig'))

or decode with utf-8-sig yourself and pass to loads:

json.loads(open('sample.json', 'rb').read().decode('utf-8-sig'))
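Both variants rely on the `utf-8-sig` codec. It drops a leading BOM if one is present and otherwise behaves like plain `utf-8`, so it is safe to use even when only some input files carry a BOM. The byte strings in this quick check are made-up samples:

```python
import json

with_bom = b'\xef\xbb\xbf{"name": "caf\xc3\xa9"}'
without_bom = b'{"name": "caf\xc3\xa9"}'

# 'utf-8-sig' strips the BOM when present and is a no-op otherwise.
decoded_with = with_bom.decode('utf-8-sig')
decoded_without = without_bom.decode('utf-8-sig')

data_with = json.loads(decoded_with)
data_without = json.loads(decoded_without)
print(data_with == data_without)
```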

edited Oct 31, 2012 at 11:25

answered Oct 31, 2012 at 11:20

Pavel Anossov

2 Comments

Martijn Pieters Over a year ago

I strongly recommend using io.open() over codecs.open(): json.load(io.open('sample.json', 'r', encoding='utf-8-sig')). The io module is more robust and faster.
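Here is a sketch of the comment's suggestion. It first writes a throwaway BOM-prefixed file to a temporary directory, and the file name `sample.json` is only illustrative. On Python 3 the built-in `open` is the same object as `io.open`, so the plain `open(..., encoding='utf-8-sig')` form used in later answers is equivalent:

```python
import io
import json
import os
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'sample.json')

# Simulate the other tool's output: a UTF-8 BOM followed by JSON text.
with open(path, 'wb') as f:
    f.write(b'\xef\xbb\xbf{"ok": true}')

# io.open exists on Python 2.6+ and 3; 'utf-8-sig' removes the BOM on read.
with io.open(path, 'r', encoding='utf-8-sig') as f:
    data = json.load(f)

print(data)
print(io.open is open)  # True on Python 3
```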

Bdoserror Over a year ago

@MartijnPieters: Thanks for that comment, good to know. I found this discussion of the differences that might be useful: groups.google.com/forum/#!topic/comp.lang.python/s_eIyt3KoLE

John R Perry · Accepted Answer · 2021-04-10 07:21:15Z

46

Simple! On Python 3 you don't even need to import codecs, because the built-in open() accepts an encoding argument:

import json

with open('sample.json', encoding='utf-8-sig') as f:
    data = json.load(f)
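There is another route on Python 3.6 and later. It depends on the `json` module's encoding detection for `bytes` input: if the file is opened in binary mode, `json.load` receives bytes and recognises a leading UTF-8 BOM on its own. The temporary file below only stands in for the real input:

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.json')
with open(path, 'wb') as f:
    f.write(b'\xef\xbb\xbf[1, 2, 3]')

# Binary mode: json.load() gets bytes and detects the UTF-8 BOM itself
# (bytes input to json.load/json.loads requires Python 3.6+).
with open(path, 'rb') as f:
    data = json.load(f)

print(data)
```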

edited Apr 10, 2021 at 7:21

John R Perry

answered Jun 6, 2019 at 22:38

aerin

Ray Hulha · Accepted Answer · 2019-05-04 06:47:42Z

7

You can also do it with the with statement:

import codecs
import json

with codecs.open('sample.json', 'r', 'utf-8-sig') as json_file:
    data = json.load(json_file)

or better:

import io
import json

with io.open('sample.json', 'r', encoding='utf-8-sig') as json_file:
    data = json.load(json_file)

edited May 4, 2019 at 6:47

Ray Hulha

answered Mar 30, 2019 at 15:24

Mohamed Ali Mimouni

newtover · Accepted Answer · 2012-10-31 11:21:33Z

6

Since json.load(stream) uses json.loads(stream.read()) under the hood, it isn't hard to write a small helper function that strips a leading BOM:

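import json
from codecs import BOM_UTF8

def lstrip_bom(str_, bom=BOM_UTF8):
    # Drop the BOM only if it is an exact prefix of the string
    if str_.startswith(bom):
        return str_[len(bom):]
    else:
        return str_

# Python 2: read() on a file opened in text mode returns a byte string
json.loads(lstrip_bom(open('sample.json').read()))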
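Under Python 3, `open(...).read()` in text mode returns `str` while `codecs.BOM_UTF8` is `bytes`, so the helper above needs a BOM of the matching type. The variant below is one possible Python 3 version. The name `lstrip_bom` is kept from the answer, and the bytes/str dispatch is an addition for illustration:

```python
import json
from codecs import BOM_UTF8

def lstrip_bom(data):
    """Remove one leading UTF-8 BOM from bytes or str, if present."""
    # In decoded text the BOM is the single character U+FEFF.
    bom = BOM_UTF8 if isinstance(data, bytes) else BOM_UTF8.decode('utf-8')
    if data.startswith(bom):
        return data[len(bom):]
    return data

raw = b'\xef\xbb\xbf{"x": 1}'
print(json.loads(lstrip_bom(raw)))                  # bytes in (Python 3.6+)
print(json.loads(lstrip_bom(raw.decode('utf-8'))))  # str in
```

On Python 3.9+, `data.removeprefix(bom)` does the same job as the `startswith` check and slice.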

In other situations, where you need to wrap a stream and fix it somehow, you could look at subclassing codecs.StreamReader.

answered Oct 31, 2012 at 11:21

newtover

4 Comments

Sam Stoelinga Over a year ago

Why not use the string strip function?

newtover Over a year ago

@SamStoelinga, because strip() takes not a prefix but a set of characters to remove. That is, you need to either decode the byte string into unicode or use the approach above to be sure you left-strip only the UTF-8 BOM.
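To illustrate the point about `strip()`: its argument is treated as a set of bytes (or characters), not as a prefix. On raw bytes it can therefore remove more than a single BOM. The byte strings are contrived samples:

```python
from codecs import BOM_UTF8

# lstrip() removes any leading run of the bytes 0xEF, 0xBB, 0xBF, in any order.
stray = b'\xbb\xef' + b'{}'
print(stray.lstrip(BOM_UTF8))   # b'{}' : bytes that were never a BOM are gone

double = BOM_UTF8 + BOM_UTF8 + b'{}'
print(double.lstrip(BOM_UTF8))  # b'{}' : both BOMs removed

# After decoding, the BOM is one character, so lstrip('\ufeff') only removes BOMs.
text = (BOM_UTF8 + b'{}').decode('utf-8')
print(text.lstrip('\ufeff'))    # {}
```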

Zypps987 Over a year ago

I'm getting an error that says TypeError: expected str,bytes or os.Pathlike object, not _io.TextIOWrapper

newtover Over a year ago

@Zypps987, the snippet assumes Python 2, where read() returns bytes. To make it work in Python 3 you would need to decode BOM_UTF8 (BOM_UTF8.decode('utf-8') is '\ufeff') or open the file in binary mode. But you don't need any of this when you have the utf-8-sig encoding.

Mike N · Accepted Answer · 2017-12-04 08:51:45Z

0

If this is a one-off, a very simple super high-tech solution that worked for me...

Open the JSON file in your favorite text editor.
Select-all
Create a new file
Paste
Save.

BOOM, BOM header gone!

answered Dec 4, 2017 at 8:51

Mike N

Rick · Accepted Answer · 2020-03-24 01:01:19Z

0

I removed the BOM manually with a Linux command.

First I check whether the file starts with the bytes ef bb bf, with head i_have_BOM.json | xxd.

Then I run dd bs=1 skip=3 if=i_have_BOM.json of=I_dont_have_BOM.json.

bs=1 processes 1 byte at a time, and skip=3 skips the first 3 bytes.
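The same check-and-skip can also be done from Python without dd. This sketch assumes the whole file fits in memory. `strip_bom_file` is a hypothetical helper name, and like the dd command it writes to a separate output file:

```python
import codecs
import os
import tempfile

def strip_bom_file(src, dst):
    """Copy src to dst, dropping a leading UTF-8 BOM. Returns True if one was found."""
    with open(src, 'rb') as f:
        raw = f.read()
    had_bom = raw.startswith(codecs.BOM_UTF8)  # the check done with head | xxd
    if had_bom:
        raw = raw[len(codecs.BOM_UTF8):]       # the skip=3 done with dd
    with open(dst, 'wb') as f:
        f.write(raw)
    return had_bom

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'i_have_BOM.json')
dst = os.path.join(tmpdir, 'I_dont_have_BOM.json')
with open(src, 'wb') as f:
    f.write(b'\xef\xbb\xbf{"k": "v"}')

found = strip_bom_file(src, dst)
with open(dst, 'rb') as f:
    result = f.read()
print(found, result)
```

Unlike a bare `dd skip=3`, this only drops the first three bytes when they actually are a BOM, so running it on a file without one leaves the content unchanged.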

answered Mar 24, 2020 at 1:01

Rick

Rodrigo Grossi · Accepted Answer · 2020-06-10 11:20:43Z

0

I'm using utf-8-sig with just import json:

import json

with open('estados.json', encoding='utf-8-sig') as json_file:
    data = json.load(json_file)
    print(data)

answered Jun 10, 2020 at 11:20

Rodrigo Grossi
