What encoding does open() use by default?

Question

I tried using some code like this to read a JSON file (encoded using UTF-8):

input = open("json/world_bank.json")
i = 0
for l in input:
    i += 1
print(i)

But I got a UnicodeDecodeError. However, it started working once I tried explicitly specifying an encoding:

input = open("json/world_bank.json", encoding="utf8")

I thought the open function would use "utf8" as the default encoding? Why does it need to be specified?

What does sys.getfilesystemencoding() return on your system?
– marcelm, Commented Mar 30, 2016 at 9:47
Ah hmm, that doesn't tell me too much; could you check open("json/world_bank.json").encoding as well?
– marcelm, Commented Mar 30, 2016 at 11:29

Alastair McCormack · Accepted Answer · 2023-03-16 18:05:43Z

90

Python 3's UTF-8 default applies only to conversions between the bytes and str types, such as str.encode() and bytes.decode(). open() instead picks a default encoding from the environment. As the documentation for open() puts it:

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
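To see what that default is on a particular machine, you can ask Python directly. The following is only a minimal sketch, not part of the quoted documentation; the helper name describe_default_encoding and the throwaway temporary file are illustrative assumptions.

```python
import locale
import os
import tempfile


def describe_default_encoding():
    """Return (locale's preferred encoding, encoding open() picks by default)."""
    preferred = locale.getpreferredencoding(False)

    # Create a throwaway file so there is something to open in text mode.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path) as f:  # no encoding argument: the default applies
            chosen = f.encoding
    finally:
        os.remove(path)
    return preferred, chosen


if __name__ == "__main__":
    preferred, chosen = describe_default_encoding()
    print("locale.getpreferredencoding(False):", preferred)
    print("open() default encoding:", chosen)
```

If the second value is not some spelling of UTF-8, a UTF-8 file opened without an explicit encoding may fail to decode or decode to the wrong text.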

For example, a Windows machine with a Western Europe/North America locale will normally use the 8-bit Windows-1252 character set (Python calls this encoding 'cp1252').
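This also suggests why the error in the question can appear. The sketch below is an illustration under assumptions, not a reproduction of the asker's file: the sample characters and the helper try_decode are made up for the example. It decodes UTF-8 bytes with Python's cp1252 codec, which leaves a few byte values (0x8D among them) unassigned.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# "\u00cd" is the letter I with an acute accent; UTF-8 encodes it as bytes C3 8D.
utf8_bytes = "\u00cd".encode("utf-8")

ok = try_decode(utf8_bytes, "utf-8")       # round-trips cleanly
failed = try_decode(utf8_bytes, "cp1252")  # None: byte 0x8D is unassigned in cp1252

# "\u00e9" (e with an acute accent) becomes bytes C3 A9. cp1252 accepts both
# bytes but maps them to two different characters, so the text is silently wrong.
garbled = try_decode("\u00e9".encode("utf-8"), "cp1252")
```

So a UTF-8 file read as cp1252 either raises UnicodeDecodeError, as in the question, or quietly produces mojibake, depending on which bytes it contains.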

edited Mar 16, 2023 at 18:05

answered Mar 30, 2016 at 18:02

Alastair McCormack


7 Comments

Jeyekomon Over a year ago

Fortunately there are recent attempts to end this madness... someday.

Stefan Berger Over a year ago

3.9 is installed on my machine and it's still using Windows 1252 encoding. PEP 597 linked by @Jeyekomon now says Python 3.10.

zumalifeguard Over a year ago

This is a good example of a very bad decision that was made a long time ago. Why give the pretense of being cross-platform when things that do not have to be cross-platform are not cross-platform by default?

Jan Derk Over a year ago

The madness will finally be ended in Python 3.15. PEP 686: Make UTF-8 mode default has been accepted.

NeilG Over a year ago

The madness will finally be ended with the demise of Windows.

sys.getfilesystemencoding()    # 'utf-8'
locale.getpreferredencoding()  # 'cp1252'


walkslowly · Answer · 2022-12-20 (edited by Karl Knechtel, 2023-03-16 01:36:50Z)

8

Following the advice here, the problem can also be solved by setting the environment variable PYTHONUTF8=1. This makes open() use UTF-8 by default instead of the platform's default encoding.
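One way to check the effect without changing your own shell is to start a child interpreter with the variable set. This is a rough sketch assuming Python 3.7 or later; the names CHILD_CODE and run_with_utf8_mode are hypothetical and exist only for this example.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: report the UTF-8 mode flag and the
# encoding that open() picks when no encoding argument is given.
CHILD_CODE = (
    "import sys, os\n"
    "with open(os.devnull) as f:\n"
    "    print(sys.flags.utf8_mode, f.encoding.lower())\n"
)


def run_with_utf8_mode():
    """Run a child Python with PYTHONUTF8=1 and return its report as a list."""
    env = dict(os.environ, PYTHONUTF8="1")
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    print(run_with_utf8_mode())
```

Starting the interpreter with the -X utf8 command-line option enables the same mode for a single run.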

edited Mar 16, 2023 at 1:36

Karl Knechtel


answered Dec 20, 2022 at 13:32

walkslowly


3 Comments

Alastair McCormack Over a year ago

This is called "UTF-8 Mode", which forces Python to ignore local environment locales. See docs.python.org/3/library/os.html#utf8-mode. As always, it's better to fix the root cause by setting the correct locale, which should lead to a healthier system.

Karl Knechtel Over a year ago

@AlastairMcCormack I would say it's better to fix the root cause by specifying the encoding in the program. There are any number of reasons why the file the program needs to read would be in a different encoding from the one described in the "correct locale". As they say, explicit is better than implicit.

Alastair McCormack Over a year ago

@KarlKnechtel ah...well...yes and no 😀. I'm not saying that you shouldn't explicitly set the encoding when opening a known file type (as the OP did). My point was that overriding Python's locale detection by using "utf8-mode" is unwise as you'll lose two important features:
1) Terminal/console encoding detection. This was particularly important on Windows consoles.
2) A sensible default open encoding. On Windows machines, this is more important, where text files written by local MS apps will be encoded using the local 8-bit codepage.
