
I have a file containing the Unicode character "û", but it is not read correctly, as the following test case shows:

print("û")
with open(r"testfile.txt") as f:
    for line in f:
        print(line)

Which outputs:

û
Ã»

The IDE displays the character correctly, yet reading it from the file shows something different. When I run it in the debugger, I see that f.encoding is cp1252, not UTF-8.
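The mismatch can be reproduced without a file. This is a minimal sketch, assuming the file stores "û" as the single code point U+00FB encoded as UTF-8 (what Notepad++ reports). Decoding those UTF-8 bytes with cp1252, the debugger's encoding, turns one character into two:

```python
# "û" (U+00FB) is stored in a UTF-8 file as two bytes: 0xC3 0xBB.
raw = "\u00fb".encode("utf-8")
print(raw)  # b'\xc3\xbb'

# cp1252 maps each byte to its own character: 0xC3 -> "Ã", 0xBB -> "»".
garbled = raw.decode("cp1252")
print(garbled)

# Decoding with the encoding the file was actually saved in restores "û".
restored = raw.decode("utf-8")
print(restored)
```

The bytes on disk are fine; only the decoding step chooses the wrong table.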

So how would I "fix" this?

Opening the file in Notepad++ confirms that it really is UTF-8. If I manually convert the file to Windows code page 1252, it seems to work, but that's not really what I want.

  • By default, open() uses the encoding returned by locale.getpreferredencoding(), which on Windows is usually cp1252. Commented Oct 13, 2017 at 13:57
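To see which default applies on a given machine, a small sketch like the following may help. It wraps in-memory bytes with io.TextIOWrapper instead of opening testfile.txt, on the assumption that a wrapper with no encoding picks the same locale default that open() does (open() builds the same kind of wrapper internally). The variable names are illustrative only.

```python
import io
import locale

# Encoding used when no encoding= is given. False means "don't call
# setlocale()". On many Windows setups this prints 'cp1252'.
default_encoding = locale.getpreferredencoding(False)
print("locale default:", default_encoding)

# The bytes a UTF-8 editor writes for "û" plus a newline.
utf8_bytes = b"\xc3\xbb\n"

# No encoding given: the wrapper falls back to the locale default,
# which corresponds to the f.encoding value seen in the debugger.
implicit = io.TextIOWrapper(io.BytesIO(utf8_bytes))
print("implicit:", implicit.encoding)

# Explicit encoding: the locale default plays no role.
explicit = io.TextIOWrapper(io.BytesIO(utf8_bytes), encoding="utf-8")
decoded = explicit.read()
print("explicit:", explicit.encoding, repr(decoded))
```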

2 Answers


You can specify the encoding when opening the file:

with open(r"testfile.txt", encoding='utf-8') as f:
    print(f.read())
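Applied to the loop from the question, the fix is just the extra keyword argument. This self-contained sketch (Python 3) writes a temporary UTF-8 file as a hypothetical stand-in for testfile.txt, then reads it back line by line:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for testfile.txt, saved as UTF-8.
    path = os.path.join(tmp, "testfile.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\u00fb\n")

    lines = []
    # Same loop as in the question, with the encoding stated explicitly so
    # the result no longer depends on locale.getpreferredencoding().
    with open(path, encoding="utf-8") as f:
        for line in f:
            lines.append(line.rstrip("\n"))
            print(line, end="")  # line already ends with "\n"

print(lines)  # ['û']
```

Passing end="" also avoids the blank line that print(line) would add after each line.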

5 Comments

I get TypeError: 'encoding' is an invalid keyword argument for this function. Can I ask what version of Python you are using? Is this a Python 3 thing?
Yes. The built-in open() only accepts an encoding argument in Python 3.
Okay, thanks. I guess I'll have to use the codecs library then.
:D Python 3 is calling you...
Lol. The newer sites that I maintain use it. Just not the one that I am working on now. Eventually, the day will come.
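For code that must still run on Python 2, io.open is one alternative to the codecs module: it accepts encoding= on Python 2.6+ and is the same function as the built-in open on Python 3. A sketch, using a hypothetical temporary file in place of testfile.txt:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io
import os
import tempfile

# Hypothetical temp file standing in for testfile.txt, written as raw UTF-8 bytes.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write(b"\xc3\xbb\n")  # "û" followed by a newline, UTF-8 encoded

try:
    # io.open takes encoding= on both Python 2 and Python 3.
    with io.open(path, encoding="utf-8") as f:
        via_io = f.read()
finally:
    os.remove(path)

print(repr(via_io))
```

On Python 2 the result is a unicode object; on Python 3 it is a str.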

You need to pass the encoding parameter as "utf-8" when opening the file, as part of the with open() call, like below. You may want to read more about this here

  with open(r"testfile.txt", encoding='utf-8') as f:
      print(f.read())
