Python: UnicodeDecodeError: 'utf8' codec can't decode byte 0x91 [closed]

Question

Closed. This question needs debugging details. It is not currently accepting answers.

Closed 9 years ago.

I'm parsing a CSV as follows:

with open(args.csv, 'rU') as csvfile:
    try:
        reader = csv.DictReader(csvfile, dialect=csv.QUOTE_NONE)
        for row in reader:
            ...

where args.csv is the name of my file. One of the rows in my file contains an e with two dots on top (ë). My script breaks when it encounters this.

I get the following stack trace:

File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 244, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)

and the following error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x91 in position 5: invalid start byte

FWIW, I'm running Python 2.7 and upgrading isn't an option (for a few reasons).

I'm pretty lost about how to fix this so any help is much appreciated.

Thanks!

What if you try with open(args.csv, 'rU', encoding='utf-8') as csvfile: ?
– DeepSpace, Commented Jun 24, 2016 at 17:53
You could add some data from the CSV file, maybe as a hexdump. Could it be that the file is not meaningfully interpretable as UTF-8 because it was written in a Windows or some other encoding?
– Dilettant, Commented Jun 24, 2016 at 17:57
The error doesn't come from the code shown; it comes from the call to json.dumps.
– Antti Haapala, Commented Jun 24, 2016 at 20:27
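
To illustrate the json.dumps point: on Python 2.7, json.dumps treats byte strings as UTF-8 by default, so a row read from a non-UTF-8 file can fail at that step rather than while reading. A minimal sketch of one workaround is to decode every key and value to text first. The helper name decode_row, the sample row, and the Windows-1252 default are assumptions for illustration, not part of the original script.

```python
import json


def decode_row(row, encoding="windows-1252"):
    """Return a copy of row with byte-string keys and values decoded to text.

    Hypothetical helper; the default encoding is an assumption about the file.
    """
    def to_text(value):
        # On Python 2.7, bytes is an alias of str, which is what csv yields there.
        # Non-bytes values (None, text, lists) are passed through unchanged.
        return value.decode(encoding) if isinstance(value, bytes) else value

    return dict((to_text(k), to_text(v)) for k, v in row.items())


# Made-up row shaped like what csv.DictReader yields on Python 2.7.
row = {b"name": b"Zo\xeb", b"note": b"\x91quoted\x92"}
payload = json.dumps(decode_row(row))
```

Because every string is already text by the time json.dumps sees it, the encoder no longer has to guess an encoding.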

C. K. Young · Accepted Answer · 2016-06-24 17:57:06Z

11

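Byte 0x91 is a "smart" opening single quote in the Windows-1252 encoding, so that is most likely the encoding your file uses, not UTF-8. So, use open(args.csv, 'rU', encoding='windows-1252') (the encoding argument is accepted by Python 3's open, and by io.open on Python 2.7, but not by the built-in open on Python 2.7).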
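
A quick way to check this claim is to decode a few bytes both ways. The sample bytes below are made up for illustration; only 0x91 comes from the error message in the question.

```python
# Made-up sample: smart quotes (0x91, 0x92) and an e with diaeresis (0xEB).
raw = b"\x91quoted\x92 Zo\xeb"

try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    # 0x91 has the bit pattern of a UTF-8 continuation byte,
    # so it cannot start a character ("invalid start byte").
    decoded_as_utf8 = False

# The same bytes decode cleanly as Windows-1252.
text = raw.decode("windows-1252")
```

If the decoded text looks right (curly quotes where you expect them), that is reasonable evidence for Windows-1252, though other single-byte encodings would also decode without error.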

3 Comments

anon_swe Over a year ago

When I follow your answer, I get: "TypeError: 'encoding' is an invalid keyword argument for this function". Fwiw, I'm running Python 2.7 and (for a few reasons) can't change that.

DeepSpace Over a year ago

@bclayman It is preferable that you mention that in your question, even though it is mentioned in the stacktrace.

Boris Treukhov Over a year ago

Great answer! I managed to convert a file in the Uzbek language to UTF-8 with iconv -t UTF-8 -f Windows-1252 in.xml. Otherwise I would've spent a lot of time guessing what the 0x91 and 0x92 characters mean.
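
For anyone stuck on Python 2.7, where the built-in open raises the TypeError mentioned above, the same kind of conversion can be sketched with io.open, which accepts an encoding argument on both Python 2.7 and Python 3. The function name convert_to_utf8 and its parameters are hypothetical, and the source encoding is assumed to be Windows-1252.

```python
import io


def convert_to_utf8(src_path, dst_path, src_encoding="windows-1252"):
    """Re-encode a text file as UTF-8 (hypothetical helper)."""
    # newline="" leaves line endings untouched on both read and write.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Once the file is UTF-8, the byte strings the Python 2.7 csv module produces are already what json.dumps expects by default. Note that this sketch reads the whole file into memory, which is fine for small files but not for very large ones.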
