Your string is already in utf-8. You need to decode it to Unicode in order to use it inside Python:
print 'Schutzt\xc3\xbcren'.decode("utf-8")
But you have a bigger problem: you are clearly using Python 2. Switch to Python 3 immediately; there is no reason to drive yourself crazy trying to understand the Python 2 approach to handling character encodings. Switch to Python 3 and you will not have to bang your head against your desk several times a day. (Note that although you were calling the encode() method, you got a UnicodeDecodeError; the explanation below shows why.)
A simple explanation:
- In Python, unicode and utf-8 are different things. A str in Python 2 might hold bytes in the utf-8 encoding; unicode objects have no encoding at all.
- If you try to use a str for something that requires unicode (e.g., to encode() it), or vice versa, Python 2 will try to implicitly convert it first. Except it doesn't know the encoding of your strings, so it guesses (ascii, in your case). Oops. See the snippet after this list.
- Python 2 has a lot of implicit conversions.
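To see the last two points in action, here is a minimal Python 2 snippet using the byte string from your question:

    s = 'Schutzt\xc3\xbcren'   # a str (bytes), already utf-8 encoded
    u = s.decode('utf-8')      # explicit decode -> unicode object, works fine

    # Calling encode() on a str makes Python 2 decode it first with the
    # default 'ascii' codec, which chokes on the byte 0xc3:
    s.encode('utf-8')          # raises UnicodeDecodeError, not UnicodeEncodeError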
But really the reason is simple: You are not using Python 3.
Edit: Since Python 3 is not an option, here is some practical advice:
Unicode sandwich: convert all text to unicode as soon as it is read in, work with unicode strings internally, and encode back to a utf-8 str only when you write it out again.
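A minimal sketch of the sandwich at the string level (the replace step is just a stand-in for whatever processing you do):

    raw = 'Schutzt\xc3\xbcren'           # bytes coming in (utf-8 encoded)
    text = raw.decode('utf-8')           # bottom slice: decode once, on the way in
    text = text.replace(u'\xfc', u'ue')  # the middle: work only with unicode
    out = text.encode('utf-8')           # top slice: encode only when writing out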
Pandas should still support the encoding argument to to_csv(), even on Python 2. Use it to write your files in utf8.
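For example (the DataFrame contents and file name here are made up; the point is the encoding argument):

    # -*- coding: utf-8 -*-
    import pandas as pd

    df = pd.DataFrame({'part': [u'Schutztüren']})          # unicode inside the frame
    df.to_csv('parts.csv', index=False, encoding='utf-8')  # utf-8 bytes on disk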
For reading a file directly, use codecs.open() instead of plain open(); it accepts an encoding= argument and gives you unicode strings.
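For instance, assuming a utf-8 encoded file called parts.csv:

    import codecs

    with codecs.open('parts.csv', 'r', encoding='utf-8') as f:
        lines = [line.strip() for line in f]   # each line is a unicode object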
A final note: 'Schutzt\xc3\xbcren' is not ASCII! ASCII codes must be in the range 0-127, and this is the utf-8 encoded byte string for 'Schutztüren'. The direction to remember: you encode a unicode string to a byte string (choosing an encoding), and you decode a byte string back to a unicode string. If to_csv() is giving you these ugly strings on disk, pass encoding="utf8" both to df.to_csv() and when you read your file back. Everything will Just Work.
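Again the file names are just for illustration; the key is to use the same encoding in both directions:

    import pandas as pd

    df = pd.read_csv('parts.csv', encoding='utf-8')            # bytes on disk -> unicode in the frame
    df.to_csv('parts_out.csv', index=False, encoding='utf-8')  # unicode -> utf-8 bytes again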