Reading unicode files in python [duplicate]

Question

I have a file containing the Unicode character "û". However, it is not read correctly, as the following test case shows:

print("û")
with open(r"testfile.txt") as f:
    for line in f:
        print(line)

Which outputs:

û
Ã»

The IDE displays the character correctly, yet reading it from the file produces different characters. If I run the code in the debugger, I see that f has cp1252 as its "encoding", not a Unicode encoding.

So how would I "fix" this?

Opening the file in Notepad++ tells me the file really is UTF-8. If I manually convert the file to Windows code page 1252, it seems to work, but that's not really what I want.
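
The output above is consistent with UTF-8 bytes being decoded as cp1252. A minimal sketch that reproduces the effect without any file, using only the character from the question (the variable names are illustrative, not from the original post):

```python
# "û" (U+00FB) is stored as two bytes in UTF-8.
utf8_bytes = "û".encode("utf-8")
print(utf8_bytes)

# Decoding those bytes as cp1252 turns each byte into its own character.
misread = utf8_bytes.decode("cp1252")
print(misread)

# Decoding with the encoding the bytes were written in recovers the original.
correct = utf8_bytes.decode("utf-8")
print(correct)
```

So the file itself is fine; only the codec used to decode it is wrong.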

By default, open uses the encoding returned by locale.getpreferredencoding, which is cp1252 on many Windows installations (for example, Western European locales).
– syntonym, Commented Oct 13, 2017 at 13:57
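
To see which encoding your own system would pick, something like the following should work. It is only a quick check, and the printed value depends on the platform and locale settings:

```python
import locale

# Encoding that open() uses in text mode when no encoding argument is passed.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

If this prints cp1252 while the file is UTF-8, the mismatch described in the question is expected.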

Ben · Accepted Answer · 2017-10-13 13:50:15Z

3

You can specify the encoding when opening the file:

with open(r"testfile.txt", encoding='utf-8') as f:
    print(f.read())  # decoded as UTF-8 regardless of the locale default
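
A self-contained sketch of the same idea, assuming Python 3. The temporary file is a hypothetical stand-in for testfile.txt so the example can run anywhere:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for testfile.txt, written explicitly as UTF-8.
    path = os.path.join(tmp_dir, "testfile.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("û\n")

    # Passing encoding explicitly makes the result independent of the locale default.
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

print(lines)
```

Passing encoding on both the write and the read side keeps the behaviour the same across machines with different locale defaults.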


5 Comments

tklodd Over a year ago

TypeError: 'encoding' is an invalid keyword argument for this function. Can I ask what version of Python you are using? Is this a Python 3 thing?

Ben Over a year ago

yes I think so.

tklodd Over a year ago

Okay, thanks. I guess I'll have to use the codecs library then.

Ben Over a year ago

:D Python3 is calling you...

tklodd Over a year ago

Lol. The newer sites that I maintain use it. Just not the one that I am working on now. Eventually, the day will come.
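
For the Python 2 case raised in the comments, one alternative to the codecs library is io.open, which accepts an encoding argument from Python 2.6 onward and is the same function as the built-in open in Python 3. A minimal sketch, with a temporary file as a hypothetical stand-in for testfile.txt:

```python
import io
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
try:
    path = os.path.join(tmp_dir, "testfile.txt")

    # Write the raw UTF-8 bytes for U+00FB so the example does not depend
    # on how this source file itself is encoded.
    with io.open(path, "wb") as f:
        f.write(b"\xc3\xbb\n")

    # io.open takes the same encoding argument on Python 2.6+ and Python 3.
    with io.open(path, encoding="utf-8") as f:
        text = f.read()
finally:
    # Clean up the temporary directory.
    shutil.rmtree(tmp_dir)

print(repr(text))
```

On Python 2 the value read back is a unicode object rather than a byte string, so the surrounding code may need to handle that.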

Chetan_Vasudevan · Accepted Answer · 2017-10-13 13:51:17Z

1

You need to set the encoding parameter to "utf-8" when opening the file. It is passed to open() in the with statement, as shown below. You may want to read up on this further.

  encoding='utf-8'
