
I'm trying to understand how encoding works in Python, and I think I have nearly got it. So here is some code, followed by my explanation, and I would like you to verify my thinking :)

text = line.decode( encoding )
print "type(text) = %s" % type(text)
iso_8859_1 = text.encode('latin1')
print "type(iso_8859_1) = %s" % type(iso_8859_1)
unicodeStr = text.encode('utf-8')
print "type(unicodeStr) = %s" % type(unicodeStr)

So the first line

text = line.decode( encoding )

transforms a string encoded in the encoding "encoding" into Python's unicode text type. Therefore the output is

type(text) = <type 'unicode'>
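A minimal, self-contained sketch of that decoding step. It assumes a hypothetical byte string `raw` holding the UTF-8 bytes of the word "café", standing in for `line` and `encoding` from the code above. It uses `u""`/`b""` literals so it runs on both Python 2.6+ and Python 3.3+. On Python 2 the text type is `unicode` and the byte type is `str`. On Python 3 they are `str` and `bytes`.

```python
# Hypothetical input: the UTF-8 encoded bytes of u"café" (é is U+00E9).
raw = b"caf\xc3\xa9"

# decode(): bytes in a known encoding -> text (unicode on Py2, str on Py3)
text = raw.decode("utf-8")

print("type(raw)  = %s" % type(raw))
print("type(text) = %s" % type(text))

# Five bytes went in, but the resulting text has only four characters.
print("%d bytes -> %d characters" % (len(raw), len(text)))
```

The length difference is the point: `é` takes two bytes in UTF-8 but is a single character once decoded.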

So now I am using the original text from my file in UTF-8 encoding, and for the rest of my code "text" is UTF-8 text.

Now I want to transform (for whatever reason) the UTF-8 text into something else, e.g. latin1, which is done by "text.encode('latin1')". The output of my code in that case is

type(iso_8859_1) = <type 'str'>
type(unicodeStr) = <type 'str'>
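A self-contained sketch of those two `encode()` calls, using the same hypothetical "café" text (runs on Python 2.6+ and 3.3+). The UTF-8 result is named `utf8_bytes` here rather than `unicodeStr`, an illustrative rename. Both calls return the byte type (`str` on Python 2, `bytes` on Python 3). Only the bytes themselves differ.

```python
text = u"caf\u00e9"  # the text "café"; é is the single code point U+00E9

iso_8859_1 = text.encode("latin1")  # é -> one byte, 0xE9
utf8_bytes = text.encode("utf-8")   # é -> two bytes, 0xC3 0xA9

# Both results have the same type: a plain sequence of bytes.
print("type(iso_8859_1) = %s" % type(iso_8859_1))
print("type(utf8_bytes) = %s" % type(utf8_bytes))
print("same type: %s" % (type(iso_8859_1) is type(utf8_bytes)))

# ...but different contents and lengths.
print("latin1: %r (%d bytes)" % (iso_8859_1, len(iso_8859_1)))
print("utf-8:  %r (%d bytes)" % (utf8_bytes, len(utf8_bytes)))
```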

Now, the only question that remains for me is: why is the type in the two latter cases 'str' and not 'latin1' or 'unicode'? That's what's still unclear to me.

Are the latter strings "iso_8859_1" and "unicodeStr" not encoded in "latin1" and "unicode" respectively?

1 Answer


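First, utf8 != unicode.
str is basically a sequence of bytes, an encoding is a method of interpreting that sequence, and unicode is, well, unicode: text made of code points, independent of any particular encoding. So encode() always gives you a str of bytes, and the type does not record which encoding produced them.
Joel has a great post on this subject: http://www.joelonsoftware.com/articles/Unicode.html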
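A short sketch of that distinction, again assuming the hypothetical "café" example (runs on Python 2.6+ and 3.3+). A byte string carries no label saying which encoding it uses, so the same bytes decode to different text depending on the encoding you assume. Conversely, decoding the Latin-1 bytes and the UTF-8 bytes of the same word gives equal unicode values. Once decoded, the text is no longer "UTF-8 text" or "Latin-1 text".

```python
latin1_bytes = b"caf\xe9"     # "café" encoded as Latin-1
utf8_bytes = b"caf\xc3\xa9"   # "café" encoded as UTF-8

# Decoding each with its matching encoding yields equal text values.
from_latin1 = latin1_bytes.decode("latin1")
from_utf8 = utf8_bytes.decode("utf-8")
print("equal text: %s" % (from_latin1 == from_utf8))

# The bytes themselves carry no encoding label: decoding the UTF-8 bytes
# as Latin-1 does not raise, but yields the wrong text (mojibake).
wrong = utf8_bytes.decode("latin1")
print("wrong decode: %r" % wrong)
print("equal to the real text: %s" % (wrong == from_latin1))
```

Latin-1 maps every byte value to a character, which is why the wrong decode succeeds silently instead of raising an error.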


2 Comments

After reading the linked article you should know enough to figure out the rest. Start accepting and upvoting people who help you.
Thanks for the link. I now fully understand what's going on!
