5

I have the following:

u'\x96'

I want to convert it to the following:

'\x96'

Is there any way to do this? str() doesn't work, and every .encode(...) call I've tried changes the byte values instead of keeping \x96 as it is. My main goal is the result below, so any shortcut that gets there would also be accepted:

>>> '\x96'.decode("cp1252")
u'\u2013'

In other words, I have u'\x96' and I want u'\u2013'. Any help would be appreciated.

I'm using Python 2.7.

  • Maybe you could decode to ascii rather than a specific ANSI codepage. Commented Aug 10, 2011 at 9:54
  • @David: But then you can't use anything above \x7f. Commented Aug 10, 2011 at 10:06
  • @Ignacio There won't be anything above \x7f! Commented Aug 10, 2011 at 10:08
  • @David: Did you see the \x96 in the question? Commented Aug 10, 2011 at 10:09
  • @Ignacio I mean after the string has been unicode escaped as per mouad's answer. Commented Aug 10, 2011 at 10:14

2 Answers

6
u'\x96'.encode('raw_unicode_escape').decode("cp1252")

1 Comment

This is a bit of a roundabout way. It would have a side-effect of turning any non-ISO-8859-1 character into a \u escape. For example, u'\u00FF\u0100' becomes u'\xff\\u0100'. Maybe you want that; I'd prefer the UnicodeEncodeError, I think.
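
To make the side effect described above concrete, here is a minimal sketch comparing the two routes on the same input. The variable names (`text`, `escaped`, `latin1_failed`) are only illustrative, and the `u''` literals are written so the snippet should run on Python 2.7 as well as Python 3.3+.

```python
# Input contains one character inside Latin-1 (U+00FF) and one outside it (U+0100).
text = u'\u00ff\u0100'

# Route 1: raw_unicode_escape never fails; characters above U+00FF are
# written out as a literal backslash escape in the resulting bytes.
escaped = text.encode('raw_unicode_escape').decode('cp1252')
# U+0100 is now six separate characters: backslash, 'u', '0', '1', '0', '0'.

# Route 2: latin-1 refuses to encode anything above U+00FF.
try:
    text.encode('latin-1')
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True
```

Which behaviour is preferable depends on the data: silent escaping keeps the conversion from failing but hides unexpected input, while the exception surfaces it immediately.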
3

Latin-1 is the encoding that maps the first 256 characters of Unicode (U+0000 to U+00FF) directly to the byte values 0x00 to 0xFF, so encoding with it gives you the byte '\x96' unchanged.

>>> u'\x96'.encode('latin-1').decode("cp1252")
u'\u2013'
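
If this conversion is needed in more than one place, the two steps can be wrapped in a small helper. This is only a sketch: the function name `reinterpret_as_cp1252` is hypothetical, and it assumes every character in the input is at or below U+00FF.

```python
def reinterpret_as_cp1252(text):
    """Treat each code point of `text` as a byte and decode those bytes as cp1252.

    Hypothetical helper for illustration. Raises UnicodeEncodeError if `text`
    contains a character above U+00FF, and UnicodeDecodeError if a byte has
    no mapping in Python's cp1252 codec.
    """
    raw = text.encode('latin-1')   # code points 0-255 become bytes 0x00-0xFF
    return raw.decode('cp1252')    # reinterpret those bytes as Windows-1252


dash = reinterpret_as_cp1252(u'\x96')
```

Plain ASCII text passes through unchanged, since Latin-1 and cp1252 agree on that range. A few byte values, such as 0x81, are left undefined by Python's cp1252 codec, so an input like u'\x81' raises UnicodeDecodeError rather than being converted.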

1 Comment

This would be the usual idiom. latin-1 is an alias for ISO-8859-1.
