
For example, in your Python shell (IDLE):

>>> a = "\x3cdiv\x3e"
>>> print a

The result you get is:

<div>

But if a is an ASCII-encoded string that contains the backslashes literally:

>>> a = "\\x3cdiv\\x3e"  # the literal text \x3cdiv\x3e, as you would get by reading it from a file
>>> print a

The result you get is:

\x3cdiv\x3e

Now what I really want from a is <div>, so I did this:

>>> b = a.decode("ascii")
>>> print b

But surprisingly, I did not get the result I wanted. It's still:

\x3cdiv\x3e

So basically, how do I convert a, which is \x3cdiv\x3e, to b, which should be <div>?

Thanks

  • Where are you getting the string "a" from, and how? I suspect something about how you're getting the input is confused. "Decode" in Python refers to converting from 8-bit bytes to full Unicode; it has nothing to do with language-specific escape sequences like backslashes and such. Commented May 11, 2013 at 3:07
  • @LeeDanielCrocker: I read it from an HTML source file. Commented May 11, 2013 at 3:24
  • That's still not enough information. Where's the code that read it, and where's the input file, and how did the input file get created? There's really no reason to have the backslash-encoded strings in a string that way unless you're doing something unusual. Commented May 11, 2013 at 3:28
  • @LeeDanielCrocker: It's everywhere. It's mostly used in JavaScript, encoded to hide an iframe. In case you are interested: ddecode.com/hexdecoder/… Commented May 11, 2013 at 3:36
  • That page you point to is using JavaScript's "unescape" method, which claims to use URL-encoding, but URL-encoding doesn't use the backslash codes. So it's some format unique to JavaScript. I can't find it documented anywhere, and in fact some resources I found specifically don't work with the \x notation. You'll still have to be more specific about where you're getting your input. Commented May 11, 2013 at 4:15

2 Answers

>>> a = rb"\x3cdiv\x3e"
>>> a.decode('unicode_escape')
'<div>'

Also check out some interesting codecs.
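
If the text arrives as a Python 3 str (for example, read from a file) rather than as a literal, the same codec can be applied after encoding to bytes. This is only a sketch: the helper names decode_escapes and decode_hex_escapes are made up for illustration, and the first one assumes ASCII-only input, because unicode_escape treats unescaped bytes as Latin-1. The second helper is an alternative that rewrites only \xNN sequences with a regular expression and leaves everything else alone.

```python
import re


def decode_escapes(text: str) -> str:
    # Hypothetical helper: interpret all Python-style backslash escapes.
    # Assumes ASCII-only input; encode() raises UnicodeEncodeError otherwise.
    return text.encode("ascii").decode("unicode_escape")


def decode_hex_escapes(text: str) -> str:
    # Hypothetical helper: replace only \xNN sequences and keep every
    # other character (including non-ASCII ones) unchanged.
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


raw = r"\x3cdiv\x3e"  # literal backslashes, as if read from a file

print(decode_escapes(raw))      # <div>
print(decode_hex_escapes(raw))  # <div>
```

The regex version may be the safer choice when the surrounding HTML can contain non-ASCII characters.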


With Python 3.x, you would adapt Kabie's answer to

a = b"\x3cdiv\x3e"
a.decode('unicode_escape')

or

a = b"\x3cdiv\x3e"
a.decode('ascii')

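Both calls return the str '<div>'. Note that the non-raw literal has already resolved the escapes when it was parsed, so a itself is:

>>> a
b'<div>'

For data read from a file, where the backslashes are literal, use the raw form rb"\x3cdiv\x3e" together with 'unicode_escape', as in Kabie's answer.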

What is the b prefix for?

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
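
To make the difference between the two kinds of bytes literal concrete, here is a small Python 3 sketch. The variable names plain and raw are only illustrative.

```python
plain = b"\x3cdiv\x3e"   # escapes are resolved when the literal is parsed
raw = rb"\x3cdiv\x3e"    # backslashes are kept, as in data read from a file

print(plain)                         # b'<div>'
print(raw)                           # b'\\x3cdiv\\x3e'
print(raw.decode("ascii"))           # \x3cdiv\x3e
print(raw.decode("unicode_escape"))  # <div>
```

Only the raw value reproduces the situation in the question, and only 'unicode_escape' turns it into <div>; decoding as 'ascii' leaves the backslash sequences in place.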
